Re: uniq i18n implementation
From: Pádraig Brady
Subject: Re: uniq i18n implementation
Date: Mon, 14 Aug 2006 10:18:37 +0100
User-agent: Mozilla Thunderbird 1.0.8 (X11/20060502)

Pádraig Brady wrote:
> Paul Eggert wrote:
>
>>>>>Using strcoll is inefficient anyway
>>>>
>>>>Don't we know it! If we can avoid it, we'd like to.
>>>
>>>Well, the mbstowcs+wcscoll solution I presented
>>>should be equivalent to strcoll on any platform,
>>>and it's much faster in my tests.
>>
>>
>>That's good to know, though I'm puzzled as to why it's true. For a
>>single comparison, can't strcoll typically return an answer without
>>examining all the input, and wouldn't that be faster than
>>mbstowc+wcscoll?
>>
>>But if it is true, perhaps we should rewrite memcoll to use the
>>mbstowc+wcscoll combination as well.
>
>
> I missed out a test case in my performance runs
> for same length lines with random data
> (where strcoll can break out early).
> I'll run that and comment more.

1 = my test uniq prog
2 = coreutils 5.97 uniq
a = ASCII long lines, all the same length (85 chars), with 26 identical lines for every 27
b = ASCII long lines, all the same length (85 chars), with all adjacent lines different

LANG=en_IE.UTF8

 \ |     1       2
---+----------------
 a |  0.466   5.300
 b |  0.447   0.438

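There seems to be serious overhead with strcoll on glibc-2.3.5-10 at least.
It shows up in case a, where most adjacent lines are identical and strcoll
cannot break out early; in case b the two programs take about the same time.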
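
For reference, a minimal sketch of the mbstowcs+wcscoll comparison discussed
in this thread. This is not the test program measured above; the helper name
wcs_compare and the fallback to strcoll on conversion or allocation failure
are assumptions made for illustration only.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from coreutils): compare two NUL-terminated
   multibyte strings by converting each to wide characters and calling
   wcscoll.  Returns <0, 0 or >0, like strcoll.  The caller is expected
   to have called setlocale (LC_ALL, "") beforehand.  */
static int
wcs_compare (const char *a, const char *b)
{
  /* With a NULL destination, mbstowcs returns the number of wide
     characters needed, or (size_t) -1 on an invalid sequence.  */
  size_t na = mbstowcs (NULL, a, 0);
  size_t nb = mbstowcs (NULL, b, 0);
  if (na == (size_t) -1 || nb == (size_t) -1)
    return strcoll (a, b);      /* assumed fallback for invalid input */

  wchar_t *wa = malloc ((na + 1) * sizeof *wa);
  wchar_t *wb = malloc ((nb + 1) * sizeof *wb);
  int result;

  if (wa == NULL || wb == NULL)
    result = strcoll (a, b);    /* assumed fallback when out of memory */
  else
    {
      mbstowcs (wa, a, na + 1);
      mbstowcs (wb, b, nb + 1);
      result = wcscoll (wa, wb);
    }

  free (wa);
  free (wb);
  return result;
}
```

A uniq-like loop would call wcs_compare (prev_line, cur_line) and treat a
result of 0 as a duplicate. A real implementation would probably keep the
converted form of the previous line around rather than converting it again
for every comparison.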
Pádraig.