bug-grep

bug#18266: handling bytes not part of the charset, and other garbage


From: Vincent Lefevre
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 03:41:24 +0200
User-agent: Mutt/1.5.23-6361-vl-r59709 (2014-07-25)

On 2014-09-11 18:16:29 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >the C locale corresponds to ANSI_X3.4-1968,
> 
> No it doesn't, at least not on any current platform I'm aware of.

It does on Debian:

ypig% LC_ALL=C locale charmap
ANSI_X3.4-1968

> >I would say that this should be the same for invalid
> >byte sequences in a UTF-8 locale.
> 
> One *could* design an encoding with that property, but it wouldn't be UTF-8;
> it would be something else.  I don't know of any C library that does that to
> UTF-8.  There are good arguments against doing it, e.g., one loses the
> property that one can concatenate character strings by concatenating their
> byte representations.

I'm talking only about grep here.

BTW, the current behavior breaks the sometimes-used "grep ." idiom for
matching non-empty lines: a line consisting only of invalid bytes is
non-empty, yet "." does not match it.
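
As an illustration only, here is a minimal sketch of one way to select
non-empty lines without depending on what "." matches. It inverts a match
on the empty line instead. The helper name count_nonempty and the sample
input are hypothetical, not taken from this thread, and the -a option
assumes GNU grep.

```shell
# Hypothetical helper (not from the thread): count non-empty lines on stdin
# without relying on "." matching every byte.
count_nonempty() {
    # -v '^$' selects every line that is not empty,
    # -a stops grep from treating stray bytes as binary data (GNU grep),
    # -c prints the number of selected lines.
    grep -a -v -c '^$'
}

# Sample input: an ASCII line, an empty line, and a line holding only the
# byte 0x80, which is not a valid UTF-8 sequence on its own.
printf 'abc\n\n\200\n' | count_nonempty
```

Since '^$' can only match a line with no bytes at all, the line holding
0x80 is still counted here, whereas whether "grep ." counts it depends on
the locale and the grep version.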

-- 
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




