From: Paolo Bonzini
Subject: Re: removing blank lines: "grep ." is really slow
Date: Fri, 16 Apr 2010 09:37:09 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3
On 04/16/2010 02:04 AM, Ivan wrote:
> I used to use "grep ." for removing blank lines, until I realized how slow it is for large numbers of lines. So I switched to "grep -v '^$'", which is as fast as one would expect (well, not with the grep that comes with Mac OS X 10.5.8 (GNU grep 2.5.1), but this seems to have been fixed sometime between 2.5.1 and 2.6.3).
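(A quick, illustrative way to reproduce the difference Ivan describes; the file name and line count here are made up:

  yes '' | head -n 5000000 > blanks.txt        # 5 million blank lines
  time grep . blanks.txt > /dev/null            # slow under a UTF-8 locale with older greps
  time grep -v '^$' blanks.txt > /dev/null      # fast
)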
True. You'd need to expand the UTF-8 "." pattern into the appropriate character sets; then the faster single-byte character-set matcher can be used. It's on my todo list.
It wouldn't be exactly as fast as your grep -v solution (which is optimal and preferred), however, because it would still check that a character in the line is a valid UTF-8 character. In particular, it would be slow and have false negatives if your document is not UTF-8.
You can also use "LC_ALL=C grep .", which is fast and exactly equivalent to "grep -v '^$'".
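For the record, the variants mentioned in this thread, side by side (the file name is only an example):

  grep . file.txt              # slow in a multibyte locale with older greps; may miss lines that are not valid UTF-8
  grep -v '^$' file.txt        # fast; prints every line that is not completely empty
  LC_ALL=C grep . file.txt     # fast; single-byte matcher, same output as grep -v '^$'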
Paolo