Re: new modules for Unicode normalization
From: Bruno Haible
Subject: Re: new modules for Unicode normalization
Date: Sun, 22 Feb 2009 18:21:36 +0100
User-agent: KMail/1.9.9
Pádraig Brady wrote:
> > I cannot estimate how much of these 10 MB get actually loaded into a
> > process' working set. ...
>
> $ uconv -x NFC&
> $ sudo bin/ps_mem.py | grep uconv
> Private + Shared = RAM used Program
> 1.9 MiB + 788.0 KiB = 2.7 MiB uconv
A great tool! Let's see how it compares to a normalization program built
against gnulib:
$ /arch/x86-linux/inst-icu/3.6/bin/uconv -x NFC &
$ ./unorm NFC &
# ./bin/ps_mem.py |grep uconv
2.3 MiB + 126.5 KiB = 2.4 MiB uconv
# ./bin/ps_mem.py |grep unorm
100.0 KiB + 11.5 KiB = 111.5 KiB unorm
So unorm uses about 22 times less memory than uconv (111.5 KiB versus 2.4 MiB).
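For anyone who wants to cross-check such numbers without ps_mem.py, here is a rough sketch of the same kind of accounting, done directly on a Linux smaps file. It is an approximation under my own assumptions, not ps_mem.py's actual code: the function name smaps_summary is made up for illustration, "private" is taken as Private_Clean + Private_Dirty, and "shared" as the proportional set size (Pss) minus that.

```shell
# smaps_summary FILE
# Summarize a Linux smaps file (e.g. /proc/PID/smaps) in KiB.
# Hypothetical helper, for illustration only.
#   private = sum of Private_Clean + Private_Dirty
#   total   = sum of Pss (proportional set size)
#   shared  = total - private (this process' share of shared pages)
smaps_summary() {
    awk '
        /^Private_(Clean|Dirty):/ { private += $2 }
        /^Pss:/                   { pss += $2 }
        END {
            printf "private=%d KiB shared=%d KiB total=%d KiB\n",
                   private, pss - private, pss
        }
    ' "$1"
}
```

Usage would be something like `smaps_summary /proc/$(pidof unorm)/smaps`. Reading another user's smaps file normally requires root, and the numbers need not match ps_mem.py exactly.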
Now let me compare the speed (averaged over 100 consecutive runs, to
eliminate the effects of load-dependent CPU frequency scaling):
$ time uconv -x NFD < $UCD51/NormalizationTest.txt > /tmp/1
real 0m0.612s
user 0m0.597s
sys 0m0.014s
$ time ./unorm NFD < $UCD51/NormalizationTest.txt > /tmp/2
real 0m0.261s
user 0m0.247s
sys 0m0.014s
$ time uconv -x NFKC < $UCD51/NormalizationTest.txt > /tmp/1
real 0m0.598s
user 0m0.583s
sys 0m0.014s
$ time ./unorm NFKC < $UCD51/NormalizationTest.txt > /tmp/2
real 0m0.309s
user 0m0.297s
sys 0m0.011s
So unorm is also about twice as fast: roughly 2.3 times for NFD and 1.9 times
for NFKC, going by the real times.
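The averaging over consecutive runs can be sketched as a small shell function. This is only one way to do it and not necessarily how the figures above were obtained. The name avg_time is hypothetical, `date +%s.%N` assumes GNU date, and the result is mean wall-clock time including a little loop overhead, so it corresponds to the "real" line rather than "user".

```shell
# avg_time N INPUT CMD [ARGS...]
# Run CMD N times with stdin redirected from INPUT and stdout discarded,
# then print the mean wall-clock time per run in seconds.
# Hypothetical helper; assumes GNU date for the %N (nanoseconds) format.
avg_time() {
    local n=$1 input=$2 i=0 start end
    shift 2
    start=$(date +%s.%N)
    while [ "$i" -lt "$n" ]; do
        "$@" < "$input" > /dev/null
        i=$((i + 1))
    done
    end=$(date +%s.%N)
    # Let awk do the floating-point arithmetic.
    awk -v s="$start" -v e="$end" -v n="$n" \
        'BEGIN { printf "%.3f\n", (e - s) / n }'
}
```

For example: `avg_time 100 $UCD51/NormalizationTest.txt ./unorm NFD`, and likewise with `uconv -x NFD` in place of `./unorm NFD`.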
The gnulib modules used are:
uninorm/filter
uninorm/nfc
uninorm/nfd
uninorm/nfkc
uninorm/nfkd
unistr/u8-uctomb
unistr/u8-mbtoucr
Bruno