bug-gnu-utils

grep


From: Petr Pajas
Subject: grep
Date: Mon, 12 Jul 2004 12:27:50 +0200
User-agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (gnu/linux)

Hi folks,

I'm using grep to extract lines that start with '15' from a file of
approx. 15MB. On a 3GHz Linux box it ran for 1m30s. I found that this
is due to the UTF-8 locale: if I switch to an 8-bit locale, it takes
only a fraction of a second. Strangely, it also takes only about 2s
under the UTF-8 locale when searching for lines that *contain* 15
rather than only begin with it.

$ grep --version
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ LC_CTYPE=en_US.UTF-8 time grep '^15' u0057.lst >/dev/null
73.46user 0.19system 1:18.93elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (173major+61minor)pagefaults 0swaps

$ LC_CTYPE=en_US time grep '^15' u0057.lst >/dev/null
0.05user 0.02system 0:00.13elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (163major+37minor)pagefaults 0swaps

$ LC_CTYPE=en_US.UTF-8 time grep '15' u0057.lst >/dev/null
1.84user 0.01system 0:01.91elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (170major+53minor)pagefaults 0swaps

$ LC_CTYPE=en_US time grep '15' u0057.lst >/dev/null
0.07user 0.00system 0:00.13elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (160major+36minor)pagefaults 0swaps

These results make me believe there is something odd in the
implementation of either locale support or of '^'.
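
A minimal sketch for reproducing the comparison on other machines,
assuming a generated stand-in file rather than the original u0057.lst.
The helper name count_anchored and the use of LC_ALL (which overrides
LC_CTYPE) are choices made for this illustration only. It checks that
both locales select the same lines, so that any timing difference can
be attributed to matching cost and not to different results:

```shell
#!/bin/sh
# Stand-in input (an assumption): numeric lines instead of u0057.lst.
sample=$(mktemp)
seq 1 200000 > "$sample"

# count_anchored LOCALE: count lines beginning with "15" under LOCALE.
# If LOCALE is not installed, grep falls back to the C locale.
count_anchored() {
    env LC_ALL="$1" grep -c '^15' "$sample"
}

c_count=$(count_anchored C)
utf8_count=$(count_anchored en_US.UTF-8)
echo "C: $c_count  UTF-8: $utf8_count"

rm -f "$sample"
```

To compare run times, each grep call can be prefixed with time, as in
the transcript above.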

Thanks,

-- Petr

Attachment: pgpQniVXEXEP4.pgp
Description: PGP signature

