bug-sed

bug#73546: sed 4.9 UTF-8 SMP mismatch on Cygwin


From: Brian Inglis
Subject: bug#73546: sed 4.9 UTF-8 SMP mismatch on Cygwin
Date: Sun, 29 Sep 2024 00:50:01 -0600
User-agent: Mozilla Thunderbird

Hi folks,

I was just trying to compare compose key sequences from RFC1345, as provided in vim "digraphs", X11 in xterm, and also mintty.

While trying to convert X11 Compose Multi_key sequences into di-/tri-/quad-graphs comparable to vim's, I found that I could not match some UTF-8 Supplementary Multilingual Plane (SMP) codepoints > U+FFFF, specifically those > U+1F000, using a negated bracket expression as in '"[^"]\+"', whereas '".\+"' worked, since no other '"' appears in any line.
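As a point of reference for the expected behaviour, here is a minimal sketch using Python's re module rather than sed. The sample line is hypothetical, written in Compose style and not copied from the real data file, and the helper name quoted is made up for illustration. With only one quoted string on the line, both patterns should select the same text:

```python
import re

# Hypothetical sample line in X11 Compose style (an assumption, not taken
# from the actual libX11 data file).
LINE = '<Multi_key> <parenleft> <c> <parenright> : "\U0001F12F" U1F12F # COPYLEFT SYMBOL'

# Python equivalents of the two sed BREs '"[^"]\+"' and '".\+"'.
NEGATED = re.compile(r'"[^"]+"')
DOT = re.compile(r'".+"')


def quoted(pattern, line):
    """Return the first quoted string matched by pattern, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(quoted(NEGATED, LINE))
    print(quoted(DOT, LINE))
```

This only shows what a codepoint-aware regex engine does with such a line; it says nothing about where the sed mismatch comes from.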

I wondered whether this is a known issue on platforms like Cygwin and others (SunOS?, AIX?), where the library internally represents SMP codepoints as high/low surrogate pairs because sizeof(wchar_t) == sizeof(char16_t) != sizeof(wint_t) == sizeof(char32_t), or whether it is a bug?
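To make the surrogate representation concrete, here is a small sketch in Python of how a codepoint above U+FFFF splits into a UTF-16 high/low surrogate pair, which is what a 16-bit wchar_t has to hold as two units. The function name utf16_surrogates is made up for illustration; the arithmetic is the standard UTF-16 encoding rule:

```python
def utf16_surrogates(cp):
    """Split a supplementary-plane codepoint into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("codepoint is not in a supplementary plane")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    cp = 0x1F12F  # COPYLEFT SYMBOL
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")       # U+1F12F -> D83C DD2F
    print(chr(cp).encode("utf-8").hex(" ").upper())    # F0 9F 84 AF
```

So a single four-byte UTF-8 character such as U+1F12F occupies two 16-bit units, while it fits in a single 32-bit wint_t.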

The attached shell script and log demonstrate the issue, using the commonly installed libX11/-common Multi_key Compose sequences data file, the "🄯" U+1F12F COPYLEFT SYMBOL, and mostly standard utilities installed in standard paths; they also provide some related ancillary information about the data and environment.

The Cygwin environment is up to date as of 2024-09-20 including unifont 16 and last-resort-font 16 with Unicode 16 glyphs.

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry

Attachment: sed-4.9-UTF-8-SMP-mismatch-Cygwin.sh
Description: Text document

Attachment: sed-4.9-UTF-8-SMP-mismatch-Cygwin.log
Description: Text document

