bug-coreutils
bug#6600: [PATCH] sort: add --threads option to parallelize internal sor


From: Pádraig Brady
Subject: bug#6600: [PATCH] sort: add --threads option to parallelize internal sort.
Date: Sat, 10 Jul 2010 10:29:03 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 10/07/10 05:23, Chen Guo wrote:
> 2010/7/9 Paul Eggert <address@hidden>:
>> On 07/09/10 18:07, Pádraig Brady wrote:
>>> Chen Guo wrote:
>>>> That happened when more than one instance of memcoll is called on the same
>>>> line at once, since memcoll replaces the eolchar with '\0'. Under our approach,
>>>> the same line shouldn't ever be compared at the same time, so we're fine.
>>
>> Ah, sorry, I wasn't aware of that.
>>
>>> I'm thinking of dropping
>>> the whole xmemcoll0() thing altogether assuming your
>>> statement above is correct, that a particular line will
>>> not be used at the same time by multiple threads.
>>
>> Yes, that makes sense.  We can revert that change from gnulib, since it
>> makes gnulib bigger unnecessarily.
>>
> 
> Actually, the '\0' saves about 5% off runtime last I checked. This is because
> EACH TIME sort compares two lines memcoll would replace the last byte. If we
> set them all to NUL anyway at the start, memcoll_nul wouldn't need to do that
> replacement for each compare. When we output, we'd simply put the \n back.
> 
> I could be wrong though, this is going off memory from 4-5 months ago. But 5%
> is about what I remember, when sorting 1M lines on 8 cores.

Well, for the whole-line comparison, where it works, it's 2.9% faster
(or 2.3% once the NUL checks are added to xmemcoll0).
Also, xmemcoll0() is probably generally useful since it's a read-only function.
So I'll leave it and fix up the keycompare() calls to it
(while documenting that keycompare() needs to work on a particular
line in only one thread at a time).
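
To make the difference concrete, here is a rough sketch of the two
comparison styles being discussed. It is not the gnulib code: the names
compare_mutating() and compare_readonly() are hypothetical, plain
strcoll() stands in for the real collation logic, and embedded NUL bytes
within a line are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memcoll-style compare: temporarily overwrite the byte
   just past each buffer with NUL so strcoll() can be used, then put
   the original bytes back.  Because it writes to both buffers, two
   threads must not run it on the same line at the same time.  */
static int
compare_mutating (char *a, size_t alen, char *b, size_t blen)
{
  char a_saved = a[alen];
  char b_saved = b[blen];
  a[alen] = '\0';
  b[blen] = '\0';
  int diff = strcoll (a, b);
  a[alen] = a_saved;
  b[blen] = b_saved;
  return diff;
}

/* Hypothetical memcoll0-style compare: the caller has already replaced
   each line's terminator with NUL, so nothing is written here and the
   save/replace/restore work is skipped on every comparison.  */
static int
compare_readonly (const char *a, size_t alen, const char *b, size_t blen)
{
  /* The kind of NUL check mentioned above; it costs a little.  */
  assert (a[alen] == '\0' && b[blen] == '\0');
  return strcoll (a, b);
}
```

With the second form, the input lines would be NUL-terminated once when
read, and the '\n' put back when each line is written out.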

thanks,
Pádraig.
