bug#73546: sed 4.9 UTF-8 SMP mismatch on Cygwin
From: Brian Inglis
Subject: bug#73546: sed 4.9 UTF-8 SMP mismatch on Cygwin
Date: Sun, 29 Sep 2024 00:50:01 -0600
User-agent: Mozilla Thunderbird
Hi folks,
I was comparing compose key sequences from RFC 1345, as provided by vim
"digraphs", X11 in xterm, and mintty.
While converting X11 Compose Multi_key sequences into di-/tri-/quad-graphs
comparable to vim's, I found that I could not match some UTF-8 Supplementary
Multilingual Plane (SMP) code points > U+FFFF, specifically those > U+1F000,
using a negated bracket expression as in '"[^"]\+"', although '".\+"' worked,
as no other '"' appears in any line.
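The comparison can be sketched roughly as follows. This is not the attached script: the input line is made up, only loosely modeled on a Compose entry, and the helper names extract_negated and extract_dot are hypothetical. It assumes GNU sed (for '\+') and that a UTF-8 locale is already in effect.

```shell
#!/bin/sh
# Sketch only: the sample line is invented, loosely modeled on a Compose entry.
# \360\237\204\257 is the UTF-8 encoding of U+1F12F COPYLEFT SYMBOL.
sample=$(printf '<Multi_key> <a> <b> : "\360\237\204\257" U1F12F')

# Extract the quoted string using a negated bracket expression.
extract_negated() {
    printf '%s\n' "$1" | sed -n 's/.*\("[^"]\+"\).*/\1/p'
}

# Extract the quoted string using a plain dot; equivalent here because
# the line contains exactly one quoted string.
extract_dot() {
    printf '%s\n' "$1" | sed -n 's/.*\(".\+"\).*/\1/p'
}

negated=$(extract_negated "$sample")
dot=$(extract_dot "$sample")

# Report whether the two patterns extracted the same text.
if [ "$negated" = "$dot" ]; then
    echo "same"
else
    echo "differ"
fi
```

Where the behaviour described above occurs, the negated form would be expected to come back empty and the script to print "differ"; elsewhere both forms should extract the same quoted string.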
I wondered whether this is a known issue on platforms like Cygwin and others
(SunOS?, AIX?) where SMP code points are represented internally in the library
as high/low surrogate pairs, with
sizeof(wchar_t) == sizeof(char16_t) != sizeof(wint_t) == sizeof(char32_t),
or a bug?
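For reference, the surrogate split for a code point above U+FFFF can be worked out with plain shell arithmetic. This only illustrates the UTF-16 encoding; it makes no claim about what sed or the C library does internally, and utf16_surrogates is a hypothetical helper name.

```shell
#!/bin/sh
# Split a supplementary-plane code point (U+10000..U+10FFFF) into its
# UTF-16 high and low surrogates. Hypothetical helper, for illustration only.
utf16_surrogates() {
    cp=$(( $1 ))
    v=$(( cp - 0x10000 ))              # 20-bit offset into the supplementary planes
    high=$(( 0xD800 + (v >> 10) ))     # top 10 bits go into the high surrogate
    low=$(( 0xDC00 + (v & 0x3FF) ))    # bottom 10 bits go into the low surrogate
    printf '%04X %04X\n' "$high" "$low"
}

utf16_surrogates 0x1F12F   # prints: D83C DD2F
```

So with a 16-bit wchar_t, U+1F12F occupies two wide-character units rather than one.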
The attached shell script and log demonstrate the issue, using the commonly
installed libX11/-common Multi_key Compose sequences data file, the "🄯" U+1F12F
COPYLEFT SYMBOL, and mostly standard utilities installed in standard paths;
they also provide some ancillary information about the data and environment.
The Cygwin environment is up to date as of 2024-09-20 including unifont 16 and
last-resort-font 16 with Unicode 16 glyphs.
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut
-- Antoine de Saint-Exupéry
sed-4.9-UTF-8-SMP-mismatch-Cygwin.sh
Description: Text document
sed-4.9-UTF-8-SMP-mismatch-Cygwin.log
Description: Text document