Re: new modules for Unicode normalization
From: Pádraig Brady
Subject: Re: new modules for Unicode normalization
Date: Mon, 23 Feb 2009 00:37:53 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20071008)
Bruno Haible wrote:
> Pádraig Brady wrote:
>>> I cannot estimate how much of these 10 MB get actually loaded into a
>>> process' working set. ...
>> $ uconv -x NFC&
>> $ sudo bin/ps_mem.py | grep uconv
>> Private + Shared = RAM used Program
>> 1.9 MiB + 788.0 KiB = 2.7 MiB uconv
>
> A great tool!
Cheers :)
It gets a surprising number of downloads.
I must look at getting it into procps or somewhere.
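For anyone wondering where a "Private + Shared" figure like the one quoted above can come from, here is a rough sketch. It assumes a Linux kernel that exposes /proc/<pid>/smaps, and it is not the actual ps_mem.py code: the function names are made up for illustration, and a real tool may apportion shared pages between processes rather than counting them in full as this does.

```python
import re

# Matches lines such as "Private_Dirty:        12 kB" in /proc/<pid>/smaps.
# Other fields (Size, Rss, Pss, *_Hugetlb, ...) are deliberately ignored.
_FIELD = re.compile(
    r"^(Private|Shared)_(?:Clean|Dirty):\s+(\d+) kB$", re.MULTILINE
)


def private_and_shared_kib(smaps_text):
    """Return (private, shared) totals in KiB, summed over all mappings."""
    totals = {"Private": 0, "Shared": 0}
    for kind, kib in _FIELD.findall(smaps_text):
        totals[kind] += int(kib)
    return totals["Private"], totals["Shared"]


def read_smaps(pid):
    """Read the smaps file of a process (Linux only; may need root)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()
```

Calling `private_and_shared_kib(read_smaps(pid))` for a running process gives two totals that can be compared between programs, which is all the comparison in this thread needs.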
> Let's see how it compares to a normalization program built
> against gnulib:
>
> $ /arch/x86-linux/inst-icu/3.6/bin/uconv -x NFC &
>
> $ ./unorm NFC &
>
> # ./bin/ps_mem.py |grep uconv
> 2.3 MiB + 126.5 KiB = 2.4 MiB uconv
> # ./bin/ps_mem.py |grep unorm
> 100.0 KiB + 11.5 KiB = 111.5 KiB unorm
>
> So, it uses about 20 times less memory.
Impressive, though uconv provides more functionality
(equivalent to iconv + unorm).
> Now let me try to compare the speed (averaged over 100 consecutive runs, to
> eliminate the effects of load dependent CPU frequency scaling):
I usually do `/etc/init.d/cpuspeed stop` before any benchmarking.
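Averaging over many consecutive runs, as described in the quoted text, could be scripted along these lines. This is only a sketch of the idea, not the method actually used for the timings in this thread: `mean_wall_time` and `PYTHON_NOOP` are hypothetical names, and the no-op command is just a stand-in for the programs being compared.

```python
import subprocess
import sys
import time

# Stand-in command for illustration only: a Python process that does nothing.
PYTHON_NOOP = [sys.executable, "-c", "pass"]


def mean_wall_time(argv, runs=100, stdin_path=None):
    """Run argv `runs` times; return the mean wall-clock seconds per run.

    If stdin_path is given, that file is fed to the command on stdin,
    as with `cmd < file` in the shell. Output is discarded.
    """
    total = 0.0
    for _ in range(runs):
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(
                argv, stdin=stdin, stdout=subprocess.DEVNULL, check=True
            )
            total += time.perf_counter() - start
        finally:
            if stdin_path:
                stdin.close()
    return total / runs
```

The mean includes process start-up cost, so it is only meaningful for comparing commands run the same way on the same machine.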
> $ time uconv -x NFD < $UCD51/NormalizationTest.txt > /tmp/1
> real 0m0.612s
> $ time ./unorm NFD < $UCD51/NormalizationTest.txt > /tmp/2
> real 0m0.261s
>
> $ time uconv -x NFKC < $UCD51/NormalizationTest.txt > /tmp/1
> real 0m0.598s
> $ time ./unorm NFKC < $UCD51/NormalizationTest.txt > /tmp/2
> real 0m0.309s
>
> So, it's also twice as fast.
>
> The gnulib modules used are:
> uninorm/filter
> uninorm/nfc
> uninorm/nfd
> uninorm/nfkc
> uninorm/nfkd
> unistr/u8-uctomb
> unistr/u8-mbtoucr
That is also impressive. Can you share the code for unorm
so I can do my own testing?
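In the meantime, a rough stand-in for such a filter can be sketched with Python's standard `unicodedata` module. To be clear, this is not Bruno's unorm and does not use the gnulib modules listed above: the function names are made up here, and the results follow whatever Unicode version the Python build ships with, which may differ from UCD 5.1.

```python
import unicodedata

# The four normalization forms accepted by unicodedata.normalize().
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_text(form, text):
    """Return `text` converted to the given Unicode normalization form."""
    if form not in FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    return unicodedata.normalize(form, text)


def normalize_stream(form, src, dst):
    """Read UTF-8 bytes from `src`, normalize, write UTF-8 bytes to `dst`.

    The whole input is read at once, so this is only suitable for
    inputs that fit comfortably in memory.
    """
    text = src.read().decode("utf-8")
    dst.write(normalize_text(form, text).encode("utf-8"))
```

Called as `normalize_stream("NFD", sys.stdin.buffer, sys.stdout.buffer)` it behaves as a stdin-to-stdout filter, which is enough to cross-check output against uconv, though obviously not to compare speed or memory use.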
I'm now swayed a bit more in favour of another `unorm` util.
However, I'm still 60/40 against. What do others think?
cheers,
Pádraig.