Re: LC_COLLATE in the C locale
From: Paul Eggert
Subject: Re: LC_COLLATE in the C locale
Date: Wed, 18 Dec 2019 08:27:02 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2
On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
>
>> I do have a qualm in that coreutils (and I assume others) interpret
>> !hard_locale (LC_COLLATE) as meaning that the locale is unibyte and uses
>> native byte comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html> ?
I don't see where that section requires unibyte.
>> As I recall on some platforms (macOS maybe?), the C locale uses
>> UTF-8 so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and
> codepoint-per-codepoint comparison are equivalent.
True, so the code that assumes strcmp == strcoll should work. But I think some
code specifically assumes unibyte. Presumably that code should also check
MB_CUR_MAX, which should be enough in practice (even though it doesn't suffice
in theory).
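
A minimal sketch of the two checks discussed above, kept separate so that code which only needs strcmp == strcoll does not also demand a unibyte locale. The helper names (sketch_hard_locale, collation_is_bytewise, collation_is_unibyte_bytewise) are hypothetical, and sketch_hard_locale is only an assumed stand-in for gnulib's hard_locale, approximated here by comparing the locale name against "C" and "POSIX"; it is not the gnulib implementation.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's hard_locale (): report true unless
   the locale name for CATEGORY is "C" or "POSIX".  If the name cannot
   be queried, conservatively treat the locale as hard.  */
static bool
sketch_hard_locale (int category)
{
  const char *name = setlocale (category, NULL);
  return !(name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0));
}

/* True if collation is assumed to be plain byte comparison, so that
   strcoll can be replaced by strcmp.  This says nothing about whether
   characters are single bytes.  */
static bool
collation_is_bytewise (void)
{
  return !sketch_hard_locale (LC_COLLATE);
}

/* True if, in addition, no character in the current locale needs more
   than one byte.  MB_CUR_MAX follows LC_CTYPE, not LC_COLLATE, which is
   why it has to be tested separately.  */
static bool
collation_is_unibyte_bytewise (void)
{
  return collation_is_bytewise () && MB_CUR_MAX == 1;
}
```

Code that merely sorts or compares whole strings could call collation_is_bytewise (), while code that indexes individual characters by byte offset would want collation_is_unibyte_bytewise (). As noted above, the MB_CUR_MAX test is a practical heuristic rather than a guarantee.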