From: Paul Eggert
Subject: Re: From wchar_t to char32_t
Date: Sun, 2 Jul 2023 09:55:01 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 2023-07-02 06:33, Bruno Haible wrote:
+      else if (bytes == (size_t) -3)
+        bytes = 0;
Why is this sort of thing needed? I thought that (size_t) -3 was possible only after a low surrogate, which is possible when decoding valid UTF-16 to Unicode, but not when decoding valid UTF-8 to Unicode. When can we get (size_t) -3 in a real-world system?
If (size_t) -3 is possible, I suppose I should change diffutils to take this into account, as bleeding-edge diffutils/src/side.c treats (size_t) -3 as meaning the next input byte is an encoding error, which is obviously wrong. The simplest way to fix this would be for diffutils to go back to using wchar_t, although I don't know what the downsides of that would be (diffutils doesn't care about Unicode; all it cares about is character classes and print widths).
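For reference, if (size_t) -3 really can occur, a decode loop would have to look something like the following sketch. This is not diffutils code; process_char and process_error_byte are hypothetical callbacks standing in for whatever the caller does with each character.

  #include <stddef.h>
  #include <uchar.h>

  /* Hypothetical callbacks, not from diffutils or gnulib.  */
  extern void process_char (char32_t c);
  extern void process_error_byte (char b);

  /* Decode the N bytes at S, handling every documented
     mbrtoc32 return value.  */
  static void
  decode (const char *s, size_t n)
  {
    mbstate_t state = { 0 };

    while (n > 0)
      {
        char32_t c;
        size_t bytes = mbrtoc32 (&c, s, n, &state);

        if (bytes == (size_t) -1)
          {
            /* Encoding error: consume one byte and start over.  */
            process_error_byte (*s);
            state = (mbstate_t) { 0 };
            s++, n--;
          }
        else if (bytes == (size_t) -2)
          /* The remaining bytes form an incomplete character.  */
          break;
        else if (bytes == (size_t) -3)
          /* C was produced from input consumed by a previous call;
             this call consumed no bytes.  */
          process_char (c);
        else
          {
            /* BYTES == 0 means a null character was decoded,
               consuming at least one byte.  */
            size_t consumed = bytes ? bytes : 1;
            process_char (c);
            s += consumed;
            n -= consumed;
          }
      }
  }

That "no bytes consumed" case is presumably what the quoted `bytes = 0` assignment above is meant to express.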