grep-devel

Re: Locale aware range expressions?


From: Paul Eggert
Subject: Re: Locale aware range expressions?
Date: Sun, 28 Jan 2024 23:07:47 -0800
User-agent: Mozilla Thunderbird

On 2024-01-28 21:00, Ronan Pigott wrote:
> it sounds
> like the collation sequence referred to by that document, which defines the
> sort order for sort, strcoll, strxfrm etc., is the same one referred to by the
> grep(1) manual.

They're not the same, as sort/strcoll/etc. use weights and collation sequences do not. Weighting explains why 'B' sorts between 'a' and 'c' even though 'B' is not between 'a' and 'c' in the collation sequence. (Here I'm talking about the en_US.utf8 locale in GNU/Linux; things can differ elsewhere.)
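
For readers who want to see the weighted ordering for themselves, here is a minimal sketch using Python's standard locale module, which calls the C library's strcoll. The helper name sorted_in_locale is made up for this illustration, and the locale name "en_US.UTF-8" is an assumption about what is installed; the helper returns None when the locale is missing. It shows only the sort/strcoll side of the distinction, not the collation sequence that range expressions use.

```python
import locale
from functools import cmp_to_key


def sorted_in_locale(words, name):
    """Sort words with strcoll under the named locale.

    Hypothetical helper for illustration. Returns None if the C library
    does not provide the requested locale.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


letters = ["c", "B", "a"]
# In the C locale the comparison is by code point, so 'B' comes first.
print("C:", sorted_in_locale(letters, "C"))
# On a glibc system with en_US.UTF-8 installed, weighting should place 'B'
# between 'a' and 'c'; other C libraries may order things differently.
print("en_US.UTF-8:", sorted_in_locale(letters, "en_US.UTF-8"))
```

Comparing the two printed lines on a given system shows whether its strcoll applies weights in that locale.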


> how can I
> characterize the full set of characters which are matched by '[a-d]'?

Assuming your grep is using glibc regex code, you can look at glibc's source code. See libc/localedata/locales/en_US and the files it includes directly or indirectly, notably libc/localedata/locales/iso14651_t1_common where you should look for LATIN SMALL LETTER A to see the collating sequence for the following letters. If I've calculated things correctly, [a-d] is equivalent to [aᴀⱥᶏᴁᴂꬱɐɑꬰᶐɒꭤbʙƀᴯᴃᵬꞗᶀɓƃꞵcᴄȼꞓꞔƈɕↄꜿd] in the GNU/Linux en_US.utf8 locale when using glibc regex matching.
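
An empirical cross-check is also possible: feed candidate characters to grep one per line and see which ones a bracket expression accepts. The following is a rough sketch, not a substitute for reading the locale source. The helper name chars_matching and the candidate list are made up for this illustration, and "en_US.UTF-8" is an assumed locale name. If that locale is not installed, grep typically falls back to the C locale, and the result also depends on which regex implementation your grep build uses, so no output is promised here.

```python
import os
import shutil
import subprocess


def chars_matching(bracket, candidates, locale_name):
    """Return the candidates that grep accepts for `bracket` under locale_name.

    Hypothetical helper for illustration. Returns None when no grep
    executable is found on PATH.
    """
    grep = shutil.which("grep")
    if grep is None:
        return None
    # LC_ALL overrides every other locale variable for the child process.
    env = dict(os.environ, LC_ALL=locale_name)
    data = "".join(c + "\n" for c in candidates).encode("utf-8")
    # -x: the pattern must match the whole line, so each one-character
    # line is tested against the bracket expression on its own.
    proc = subprocess.run(
        [grep, "-x", "-e", bracket],
        input=data,
        stdout=subprocess.PIPE,
        env=env,
    )
    return proc.stdout.decode("utf-8", errors="replace").split()


# Candidates chosen by hand for this sketch: plain ASCII letters plus two
# characters taken from the expanded set given above.
candidates = ["a", "B", "b", "d", "e", "ᴀ", "ɓ"]
print("C:", chars_matching("[a-d]", candidates, "C"))
print("en_US.UTF-8:", chars_matching("[a-d]", candidates, "en_US.UTF-8"))
```

To probe a wider set, extend the candidate list, for example with every character in a Unicode block of interest.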

Looking over this thread, part of the problem is that the grep man page (which is not as carefully maintained as the grep manual) is out of date with respect to the grep manual, and part of the problem is that the grep manual itself uses unclear terminology. I installed the attached patch to try to make the documentation clearer and to match current behavior better. (In this patch I resisted the temptation of putting [aᴀⱥᶏᴁᴂꬱɐɑꬰᶐɒꭤbʙƀᴯᴃᵬꞗᶀɓƃꞵcᴄȼꞓꞔƈɕↄꜿd] in the manual, as too many output devices would have trouble with it.)

Attachment: 0001-Improve-doc-for-range-expressions.patch
Description: Text Data

