coreutils

Re: performance bug of `wc -m`


From: Pádraig Brady
Subject: Re: performance bug of `wc -m`
Date: Fri, 18 May 2018 18:45:46 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 18/05/18 14:06, Eric Fischer wrote:
> For whatever it's worth, the system wcwidth seems to be much faster on my 
> MacOS X system (10.11.6) than the replacement wcwidth. Using the same 
> benchmark as above, it takes about 0.9 seconds with the replacement wcwidth:
> 
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
> 
> $ time src/wc -Lm < mbc.txt
> 2100000      20
> real    0m1.004s
> 
> $ time src/wc -m < mbc.txt
> 2100000
> real    0m0.909s
> 
> $ time src/wc -Lm < num.txt
> 2100000      20
> real    0m0.903s
> 
> $ time src/wc -m < num.txt
> 2100000
> real    0m0.887s
> 
> and about 0.03 or 0.09 seconds with the system wcwidth (tested by adding 
> return wcwidth (wc); to the top of the lib/wcwidth.c replacement):
> 
> $ time src/wc -Lm < mbc.txt
> 2100000      20
> real    0m0.098s
> 
> $ time src/wc -m < mbc.txt
> 2100000
> real    0m0.088s
> 
> $ time src/wc -Lm < num.txt
> 2100000      20
> real    0m0.038s
> 
> $ time src/wc -m < num.txt
> 2100000
> real    0m0.032s
> 
> Unfortunately the replacement wcwidth is probably necessary for correct text 
> measuring. The original MacOS X 10.3 bug where COMBINING ACUTE ACCENT 
> reported a width of 1 instead of 0 appears to be fixed, but two other bugs 
> that the m4/wcwidth.m4 test looks for (HEBREW POINT SHEVA and ZERO WIDTH 
> SPACE reporting widths of 1 instead of 0) appear to still be current.
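
For anyone wanting to reproduce the remaining bugs, here is a minimal C sketch of checks along the lines described above. It asks the system wcwidth() for the three code points that should have width 0 and counts how many report a positive width. It is not the actual m4/wcwidth.m4 test; the function name count_zero_width_bugs() and the locale names it tries are assumptions, and it assumes wchar_t holds Unicode code points in a UTF-8 locale (true on glibc and macOS, not guaranteed by ISO C).

```c
#define _XOPEN_SOURCE 700   /* needed on glibc for the wcwidth() declaration */
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Code points that should have display width 0. */
static const wchar_t zero_width_chars[] = {
    0x0301, /* COMBINING ACUTE ACCENT */
    0x05B0, /* HEBREW POINT SHEVA */
    0x200B, /* ZERO WIDTH SPACE */
};

/* Returns how many of the code points above the system wcwidth() reports
   as wider than 0, or -1 if none of the (assumed) UTF-8 locale names
   can be selected.  Note: this changes the process locale. */
int count_zero_width_bugs(void)
{
    static const char *const locales[] = { "C.UTF-8", "en_US.UTF-8" };
    int have_utf8 = 0;

    for (size_t i = 0; i < sizeof locales / sizeof locales[0]; i++) {
        if (setlocale(LC_ALL, locales[i]) != NULL) {
            have_utf8 = 1;
            break;
        }
    }
    if (!have_utf8)
        return -1;

    int bugs = 0;
    for (size_t i = 0; i < sizeof zero_width_chars / sizeof zero_width_chars[0]; i++)
        if (wcwidth(zero_width_chars[i]) > 0)
            bugs++;
    return bugs;
}
```

On a system with the bugs described above one would expect a nonzero count; where no UTF-8 locale is installed the function returns -1 rather than guessing.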

Interesting.
On the off chance that clang was the cause, I checked with:

  gl_cv_func_wcwidth_works=no CC=clang ./configure --quiet

and still got fast results with uc_width() on glibc.

Now, the gnulib replacement is only a table lookup and some bit manipulation.
Ah, but it also calls locale_charset()!
That must be slow on OS X. Indeed :(

https://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00040.html
https://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00000.html

I see some recent improvement (which the latest coreutils git should reference):

https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00057.html

It still would be nice to get appropriate caching here.
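
A minimal sketch of the kind of caching meant here, assuming the program sets its locale once at startup (coreutils programs call setlocale (LC_ALL, "") early in main) and runs single-threaded: the charset is looked up on the first call and the answer is reused afterwards. The names cached_locale_is_utf8(), lookup_charset() and charset_lookups are hypothetical, and POSIX nl_langinfo(CODESET) stands in for gnulib's locale_charset(); this is not the gnulib fix.

```c
#define _XOPEN_SOURCE 700
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical counter, only here to show how often the lookup runs. */
unsigned long charset_lookups = 0;

/* Hypothetical stand-in for an expensive charset query such as gnulib's
   locale_charset(); this sketch just wraps POSIX nl_langinfo(CODESET). */
static const char *lookup_charset(void)
{
    charset_lookups++;
    return nl_langinfo(CODESET);
}

/* Reports whether the locale's charset is UTF-8, doing the lookup only
   on the first call and reusing the answer afterwards.
   Assumes setlocale() is not called again after the first use (the cache
   would go stale) and that there is only one thread. */
bool cached_locale_is_utf8(void)
{
    static int cached = -1;  /* -1: not computed yet; 0 or 1 afterwards */

    if (cached < 0) {
        const char *cs = lookup_charset();
        cached = cs != NULL && strcmp(cs, "UTF-8") == 0;
    }
    return cached != 0;
}
```

With this shape, per-character code pays for one check of a static variable instead of a charset query on every call. Invalidating the cache when the locale changes would need extra bookkeeping that this sketch leaves out.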

cheers,
Pádraig


