Re: library for unicode collation in C for texi2any?
From: Eli Zaretskii
Subject: Re: library for unicode collation in C for texi2any?
Date: Thu, 12 Oct 2023 18:13:34 +0300
> Date: Thu, 12 Oct 2023 15:00:57 +0200
> From: Patrice Dumas <pertusus@free.fr>
> Cc: bug-texinfo@gnu.org
>
> On Thu, Oct 12, 2023 at 01:29:27PM +0300, Eli Zaretskii wrote:
> > What is "smart sorting"? where is it described/documented?
>
> It is, in general, any way of sorting Unicode text that takes into
> account the word order of natural languages. In practice, what is
> used in Unicode::Collate is the Unicode Collation Algorithm (UCA)
> from 'Unicode Technical Standard #10', described in
> http://www.unicode.org/reports/tr10. In texi2any, we set a collation
> option,
> ( 'variable' => 'Non-Ignorable' )
> such that spaces and punctuation marks sort before letters. This
> specific option is described in
> http://www.unicode.org/reports/tr10/#Variable_Weighting
>
> It would be perfect if the same sorting could be obtained, but if
> C code does not follow exactly the same standard, I do not think
> that it is so problematic, as long as the sorting is sensible. It could
> actually be problematic for tests, but if the output of texi2any is ok
> even if not fully reproducible, it would still be better than sorting
> according to the Unicode codepoint in a full C implementation.
What you say is not detailed enough, but using my crystal ball I think
you can have this on glibc-based systems, and also on Windows (but
that requires using a special API for comparing strings). I'm not sure
about the equivalent features on other systems, like *BSD and macOS.
You can see that in action in how GNU 'ls' sorts file names.
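
For illustration only, here is a minimal sketch of what the glibc route
could look like, assuming it means locale-aware comparison through the
standard C function strcoll() after selecting a collation locale with
setlocale(). The names sort_index_entries and compare_collated are
hypothetical and not part of texi2any; the Windows API is not covered.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two strings according to the collation
   rules of the currently selected LC_COLLATE locale.  */
static int
compare_collated (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Hypothetical helper: sort ENTRIES in place using the collation rules
   of LOCALE_NAME.  Returns 0 on success, -1 if the locale cannot be
   selected (for example because it is not installed).
   Note that setlocale changes process-wide state.  */
int
sort_index_entries (const char **entries, size_t count,
                    const char *locale_name)
{
  assert (entries != NULL || count == 0);

  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1;

  if (count > 1)
    qsort (entries, count, sizeof entries[0], compare_collated);
  return 0;
}
```

Passing "C" as the locale name gives plain byte-order sorting, which
for UTF-8 text is the same as codepoint order; a UTF-8 locale such as
"en_US.UTF-8", where installed, gives that locale's collation order
instead. The result therefore depends on which locale is selected and
on the locale data shipped with the system.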
> > In general, Unicode collation rules are locale- and
> > language-dependent. My recommendation for Texinfo is not to use
> > locale-specific collation rules, so that the indices would come out
> > sorted identically no matter in which locale the user runs texi2any.
>
> That's the plan. The plan is to use the @documentlanguage information
> with Unicode::Collate::Locale in the future, but never use the locale.
I don't recommend tailoring index sorting to the language indicated
by @documentlanguage, either.
> This is still a TODO item, though, as Unicode::Collate::Locale is a Perl
> core module since Perl 5.14 only, released in 2011, so my plan was to
> wait for 2031 to use it and be able to assume that it is indeed present
> the same way we assume that Unicode::Collate is present.
We can have this in C today.