bug-grep

bug#18266: handling bytes not part of the charset, and other garbage


From: Vincent Lefevre
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Sat, 13 Sep 2014 03:17:41 +0200
User-agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25)

On 2014-09-12 17:57:39 -0700, Paul Eggert wrote:
> Currently, for example, the tz package <http://www.iana.org/time-zones> has
> a Make rule 'check_character_set' that verifies that the source files are
> all properly encoded.  It executes this shell command:
> 
> ! grep -nv '^.*$' file names
> 
> This relies on GNU grep's behavior that "." does not match an encoding
> error.  But it's a command that is not obvious.  It'd be simpler and clearer
> to write this:
> 
> ! grep -n '[[:error:]]' file names
> 
> if such a feature were available.

But both of these solutions have the drawback of working only in
UTF-8 locales, since grep decodes its input according to the current
locale. One may wonder whether grep is the right tool at all, as
"iconv -f UTF-8 -t UTF-8" can do such a check in any locale.
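
As a rough sketch of how the iconv-based check could be wrapped for
use in a Make rule: the function name check_utf8 is made up for
illustration and is not part of the tz package, and the sketch assumes
an iconv that exits with a non-zero status on an invalid input
sequence (as the glibc and GNU libiconv implementations do).

```shell
# Hypothetical helper (not from the tz package): succeed only if every
# named file is valid UTF-8, independently of the current locale.
check_utf8() {
    status=0
    for f in "$@"; do
        # Convert UTF-8 to UTF-8 and discard the output; only the exit
        # status matters. Assumption: iconv fails on invalid sequences.
        if ! iconv -f UTF-8 -t UTF-8 < "$f" > /dev/null 2>&1; then
            printf '%s: invalid UTF-8\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Unlike the grep commands, this does not report the offending line
number, only which files fail the check.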

-- 
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




