bug-gnulib

Re: From wchar_t to char32_t


From: Paul Eggert
Subject: Re: From wchar_t to char32_t
Date: Sun, 2 Jul 2023 09:55:01 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 2023-07-02 06:33, Bruno Haible wrote:
+                    else if (bytes == (size_t) -3)
+                      bytes = 0;

Why is this sort of thing needed? I thought that (size_t) -3 was possible only after a low surrogate, which is possible when decoding valid UTF-16 to Unicode, but not when decoding valid UTF-8 to Unicode. When can we get (size_t) -3 in a real-world system?
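For context, here is a minimal sketch (not the gnulib or diffutils code; decode_buffer is a made-up name) of an mbrtoc32 loop that handles every documented return value, including (size_t) -3, which means "another char32_t was produced from input consumed by a previous call; no new bytes were consumed":

#include <uchar.h>
#include <string.h>

/* Hypothetical example: count the char32_t values in a buffer,
   handling all documented mbrtoc32 return values.  */
size_t
decode_buffer (const char *p, size_t len)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t n_chars = 0;

  while (len > 0)
    {
      char32_t c;
      size_t bytes = mbrtoc32 (&c, p, len, &state);

      if (bytes == (size_t) -2)        /* incomplete sequence at end of buffer */
        break;
      if (bytes == (size_t) -1)        /* encoding error: skip one byte, resync */
        {
          memset (&state, 0, sizeof state);
          p++;
          len--;
          n_chars++;                   /* design choice: count the bad byte as one character */
        }
      else if (bytes == (size_t) -3)   /* extra char32_t from earlier input; zero bytes consumed */
        n_chars++;
      else
        {
          n_chars++;
          if (bytes == 0)              /* c is the null character; assume it occupies one byte */
            bytes = 1;
          p += bytes;
          len -= bytes;
        }
    }
  return n_chars;
}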

If (size_t) -3 is possible, I suppose I should change diffutils to take this into account, as bleeding-edge diffutils/src/side.c treats (size_t) -3 as meaning the next input byte is an encoding error, which is obviously wrong. The simplest way to fix this would be for diffutils to go back to using wchar_t, although I don't know what the downsides of that would be (diffutils doesn't care about Unicode; all it cares about is character classes and print widths).
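If diffutils did go back to wchar_t, the loop would look roughly like this (a hypothetical sketch; count_wide_chars is a made-up name). mbrtowc has no (size_t) -3 case at all, since POSIX specifies only 0, a byte count, (size_t) -1, and (size_t) -2, though the trade-off is that wchar_t is only 16 bits wide on some platforms:

#include <wchar.h>
#include <string.h>

/* Hypothetical example: count the wide characters in a buffer
   using mbrtowc, which never returns (size_t) -3.  */
size_t
count_wide_chars (const char *p, size_t len)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t n_chars = 0;

  while (len > 0)
    {
      wchar_t wc;
      size_t bytes = mbrtowc (&wc, p, len, &state);

      if (bytes == (size_t) -2)        /* incomplete sequence at end of buffer */
        break;
      if (bytes == (size_t) -1)        /* encoding error: skip one byte, resync */
        {
          memset (&state, 0, sizeof state);
          bytes = 1;
        }
      else if (bytes == 0)             /* wc is the null character; assume one byte */
        bytes = 1;
      p += bytes;
      len -= bytes;
      n_chars++;
    }
  return n_chars;
}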


