From: Julian Foad
Subject: Re: say if grep can find non-ascii
Date: Wed, 08 Mar 2006 12:22:48 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511
Paul Eggert wrote:
For this particular task "grep for non-ASCII characters", I had just two days before tried to solve the same problem, and discovered that 'grep', somewhat to my surprise, can't do it. This is worth either mentioning or fixing, in my opinion.
According to the Open Group spec for Regular Expressions, which is a standard I assume we should generally be following,
<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05>
The following character class expressions shall be supported in all locales:

[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]

In addition, character class expressions of the form:

[:name:]

are recognized in those locales where the name keyword has been given a charclass definition in the LC_CTYPE category.
Therefore "grep '[^[:ascii:]]'" ought to work as expected iff the current locale defines that class.
Whether it DOES work is something I haven't tried to determine.

Whether Grep should support that class unconditionally, as Perl does, is another matter. I'd say probably not; there's probably a reason why it's not in the list of standard classes.
The Grep manual should be more explicit about the use of character classes other than those that it says are supported.
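For the task itself, a workaround that uses only the standard classes might look like the sketch below. It is an illustration, not something taken from the Grep manual: the function name find_non_ascii is made up, and it assumes that in the C locale grep treats every byte as a character, so that [:print:] and [:space:] cover only ASCII. Note that it also reports ASCII control characters other than whitespace, so it is slightly broader than "non-ASCII".

```shell
# Hypothetical helper (name invented for this example): print, with line
# numbers, every line that contains a byte outside printable ASCII and
# ASCII whitespace.
find_non_ascii() {
    # LC_ALL=C is assumed to restrict [:print:] and [:space:] to ASCII,
    # so any byte in the range 0x80-0xFF falls outside both classes.
    # Non-whitespace control characters (e.g. ESC) are matched as well.
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

Usage would be something like "find_non_ascii file.txt"; the exit status is grep's own, so it is 0 if any such line was found and 1 if none was.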
- Julian