Re: new modules for Unicode normalization
From: Pádraig Brady
Subject: Re: new modules for Unicode normalization
Date: Sun, 22 Feb 2009 00:44:42 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)

Jim Meyering wrote:
> Bruno Haible wrote:
> ...
>> With this, you can easily create a program that reads UTF-8 from stdin and
>> outputs it as canonicalized UTF-8 on stdout:
>> - create a "stream" that takes a Unicode character and outputs it to
>> stdout. (Gnulib module 'unistr/u8-uctomb'.)
>> - Wrap a Unicode normalizing filter around it. (Gnulib module
>> 'uninorm/filter'.)
>> - Feed it with Unicode characters from standard input. (Gnulib module
>> 'unistr/u8-mbtouc'.)
>>
>> I would love to see such a program in coreutils. But I am not a coreutils
>> maintainer.
>
> Hi Bruno,
>
> That sounds like it'd make a fine addition, and you're welcome to
> contribute it. Anyone can contribute, assuming they assign copyright.
> And you did that for coreutils back before it was called that ;-)
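
For readers who want to see the shape of the three steps Bruno lists, here is a rough sketch in Python rather than C. It is only an illustration of the structure (a sink, a normalizing filter wrapped around it, and a feeder that decodes the input); it does not use or mirror the gnulib API, and the names NormalizingFilter and normalize_stream are made up for this example. It also assumes that cutting the text at newlines is acceptable, which is safe because a newline never combines with its neighbours.

```python
import codecs
import unicodedata


class NormalizingFilter:
    """Hypothetical filter: buffers text and hands normalized lines to a sink."""

    def __init__(self, sink, form="NFC"):
        self.sink = sink          # callable that receives normalized str pieces
        self.form = form
        self.pending = ""         # text after the last newline seen so far

    def write(self, text):
        self.pending += text
        # Everything before the last newline is complete and can be emitted.
        *complete, self.pending = self.pending.split("\n")
        for line in complete:
            self.sink(unicodedata.normalize(self.form, line) + "\n")

    def flush(self):
        # Emit whatever is left when the input ends without a newline.
        if self.pending:
            self.sink(unicodedata.normalize(self.form, self.pending))
            self.pending = ""


def normalize_stream(infile, outfile, form="NFC", chunk_size=4096):
    """Read UTF-8 bytes from infile and write normalized UTF-8 to outfile."""
    # The incremental decoder copes with multibyte sequences split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    filt = NormalizingFilter(lambda s: outfile.write(s.encode("utf-8")), form)
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            break
        filt.write(decoder.decode(chunk))
    filt.write(decoder.decode(b"", final=True))
    filt.flush()
```

As a stdin-to-stdout filter it would be called as normalize_stream(sys.stdin.buffer, sys.stdout.buffer).
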
It might be an idea for me to do it, since I know the details
of adding new programs to coreutils, and also I need to get
to know the Unicode APIs in gnulib for further i18n work in coreutils.
I've not had much time for anything lately, but I would hope
to do that next week if possible.
So I'm wondering now why normalization functionality isn't in iconv?
Seems like a big omission to me. There is a mention of it here:
http://www.archivum.info/address@hidden/2006-08/msg00004.html
Then I also noticed `uconv`, which is in the "icu" package on Fedora at least.
To normalize text the following worked for me:
uconv -x NFC < test.utf8
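
To make concrete what NFC does to the bytes, here is a small Python check using the standard unicodedata module. It is not related to how uconv works internally; the sample string is just an illustration I picked.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))  # b'e\xcc\x81'  (3 bytes)
print(composed.encode("utf-8"))    # b'\xc3\xa9'   (2 bytes)
```

Both strings render the same way, but only the second is in NFC.
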
So iconv may get this in the future, and uconv already has it.
Do we really need another util in coreutils for this?
cheers,
Pádraig.