bug-texinfo

Re: library for unicode collation in C for texi2any?


From: Eli Zaretskii
Subject: Re: library for unicode collation in C for texi2any?
Date: Sat, 14 Oct 2023 22:20:40 +0300

> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Sat, 14 Oct 2023 19:57:22 +0100
> 
> It's all in the future, but what I am slightly concerned about is duplicating
> in Texinfo existing system facilities.  For example, for avoiding use of
> wcwidth, our use of which depends on setting a UTF-8 locale, and using
> the wchar_t type.  Is every program that uses wcwidth supposed to supply
> their own implementation instead, and isn't this wasteful?

What other locale-specific functions do we need in addition to
wcwidth?

If the list of those functions is short enough, we could replace them
all by the corresponding Gnulib/libunistring functions, and then we
could stop setting specific locales and relying on locale-specific
libc functions.  That will give us locale-independent code which will
work on all systems.

> I don't know if libunistring aspires to become a standard system library
> for handling UTF-8 data but if we use it for other UTF-8 processing it
> would make sense to use it for collation.
> 
> I suggest writing to Bruno Haible to ask if he has plans to include
> collation functionality in libunistring in the future.  I am currently
> reading through "Unicode Technical Standard #10" and although I don't
> understand a lot of it yet, it seems feasible that we could implement it
> in C.

It is feasible, but implementing it from scratch is a lot of work, and
needs a large database (which we could take from the CLDR).  But note
that CLDR is AFAIK locale-dependent; the only part of it that doesn't
depend on the locale is collation by Unicode codepoints.
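
To illustrate what "collation by Unicode codepoints" means in practice,
here is a minimal sketch in C.  It is not taken from Texinfo, Gnulib or
libunistring; the names decode_utf8 and codepoint_cmp are hypothetical,
and the code assumes well-formed, NUL-terminated UTF-8 input with no
validation.  It needs no locale and no database, but the resulting order
is not linguistic (for example, "Z" sorts before "a").

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one code point from well-formed UTF-8 at S
   and store the number of bytes consumed in *LEN.  No validation is done.  */
static uint32_t
decode_utf8 (const unsigned char *s, size_t *len)
{
  if (s[0] < 0x80)
    {
      *len = 1;
      return s[0];
    }
  if (s[0] < 0xE0)
    {
      *len = 2;
      return ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
  if (s[0] < 0xF0)
    {
      *len = 3;
      return ((uint32_t) (s[0] & 0x0F) << 12)
             | ((uint32_t) (s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
    }
  *len = 4;
  return ((uint32_t) (s[0] & 0x07) << 18)
         | ((uint32_t) (s[1] & 0x3F) << 12)
         | ((uint32_t) (s[2] & 0x3F) << 6)
         | (s[3] & 0x3F);
}

/* Hypothetical comparison function: order two UTF-8 strings by code point.
   Returns a negative value, zero, or a positive value, like strcmp.  */
int
codepoint_cmp (const char *a, const char *b)
{
  const unsigned char *p = (const unsigned char *) a;
  const unsigned char *q = (const unsigned char *) b;

  while (*p && *q)
    {
      size_t lp, lq;
      uint32_t cp = decode_utf8 (p, &lp);
      uint32_t cq = decode_utf8 (q, &lq);

      if (cp != cq)
        return cp < cq ? -1 : 1;
      p += lp;
      q += lq;
    }
  /* One string is a prefix of the other: the shorter one sorts first.  */
  if (*p)
    return 1;
  if (*q)
    return -1;
  return 0;
}
```

For well-formed UTF-8, plain strcmp gives results with the same sign,
because UTF-8 byte order matches code point order and strcmp compares
bytes as unsigned char.  The explicit decoding above only makes the
ordering rule visible.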


