Re: uniq i18n implementation

bug-coreutils

From:	Pádraig Brady
Subject:	Re: uniq i18n implementation
Date:	Mon, 14 Aug 2006 10:18:37 +0100
User-agent:	Mozilla Thunderbird 1.0.8 (X11/20060502)

Pádraig Brady wrote:
> Paul Eggert wrote:
> 
>>>>>Using strcoll is inefficient anyway
>>>>
>>>>Don't we know it!  If we can avoid it, we'd like to.
>>>
>>>Well, the mbstowcs+wcscoll solution I presented
>>>should be equivalent to strcoll on any platform,
>>>and it's much faster in my tests.
>>
>>
>>That's good to know, though I'm puzzled as to why it's true.  For a
>>single comparison, can't strcoll typically return an answer without
>>examining all the input, and wouldn't that be faster than
>>mbstowcs+wcscoll?
>>
>>But if it is true, perhaps we should rewrite memcoll to use the
>>mbstowcs+wcscoll combination as well.
> 
> 
> I missed out a test case in my performance runs
> for same length lines with random data
> (where strcoll can break out early).
> I'll run that and comment more.

1 = my test uniq program
2 = coreutils 5.97 uniq

a = long ASCII lines, all the same length (85 chars), with 26 identical lines in every 27
b = long ASCII lines, all the same length (85 chars), with all adjacent lines different

LANG=en_IE.UTF8

\  1       2
 ---------------
a| 0.466   5.300
b| 0.447   0.438

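In case b, where adjacent lines differ and strcoll can return early, both
programs take about the same time. In case a, where most adjacent lines are
identical and so must be compared in full, coreutils uniq is over 10 times
slower. So there seems to be serious overhead in strcoll, at least on
glibc-2.3.5-10.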
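For reference, here is a minimal sketch of the mbstowcs+wcscoll comparison discussed above. It is not the test program behind the timings. The name `mbs_wcscoll` is hypothetical, and the sketch assumes NUL-terminated lines (real uniq input can contain embedded NULs, which memcoll handles). Falling back to strcmp on invalid multibyte input is also an assumption of this sketch. The caller must first select the locale with setlocale (LC_ALL, "").

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: compare two NUL-terminated multibyte strings in
   the current LC_COLLATE order by converting both to wide strings with
   mbstowcs() and comparing them with wcscoll().  Returns <0, 0 or >0,
   like strcoll().  Falling back to strcmp() for invalid multibyte input
   is an assumption of this sketch, not something from the thread.  */
int
mbs_wcscoll (const char *a, const char *b)
{
  /* With a NULL destination (a POSIX feature), mbstowcs() returns the
     number of wide chars needed, excluding the terminator, or
     (size_t) -1 if the input is not a valid multibyte sequence.  */
  size_t na = mbstowcs (NULL, a, 0);
  size_t nb = mbstowcs (NULL, b, 0);
  if (na == (size_t) -1 || nb == (size_t) -1)
    return strcmp (a, b);

  wchar_t *wa = malloc ((na + 1) * sizeof *wa);
  wchar_t *wb = malloc ((nb + 1) * sizeof *wb);
  if (!wa || !wb)
    {
      /* Out of memory: use the plain strcoll() path instead.  */
      free (wa);
      free (wb);
      return strcoll (a, b);
    }

  /* Convert, including the terminating L'\0'.  */
  mbstowcs (wa, a, na + 1);
  mbstowcs (wb, b, nb + 1);

  int r = wcscoll (wa, wb);
  free (wa);
  free (wb);
  return r;
}
```

A uniq-style loop could keep the converted copy of the previous line, so that each line is converted only once rather than once per comparison. The message does not say whether the test program does this.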

Pádraig.
