bug#18266: handling bytes not part of the charset, and other garbage

From:	Paul Eggert
Subject:	bug#18266: handling bytes not part of the charset, and other garbage
Date:	Thu, 11 Sep 2014 09:22:49 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

Vincent Lefevre wrote:

> There's no reason that '.' matches something that doesn't belong to
> the charset in C locale, but doesn't match in a UTF-8 locale.

In the C locale on GNU/Linux, all byte values are members of the charset. That is why it's OK for '.' to accept that byte in the C locale but reject it in a UTF-8 locale.
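
The distinction can be illustrated with a rough analogy in Python. This is only a sketch, not how grep is implemented: the helper name `dot_would_match` is hypothetical, and Python's "latin-1" codec is used as a stand-in for a locale in which every byte value is a character, as in the C locale on GNU/Linux.

```python
def dot_would_match(byte_seq: bytes, encoding: str) -> bool:
    """Return True if byte_seq is exactly one valid character in `encoding`.

    Hypothetical helper: it models '.' as "any single character of the
    charset" and ignores the special handling of newline.
    """
    try:
        return len(byte_seq.decode(encoding)) == 1
    except UnicodeDecodeError:
        # The bytes are not a valid encoding of any character.
        return False


E_ACUTE_LATIN1 = b"\xe9"             # ISO-8859-1 encoding of e-acute
E_ACUTE_UTF8 = "\u00e9".encode("utf-8")  # the two bytes C3 A9

# Assumption: "latin-1" stands in for a charset where all bytes are members.
print(dot_would_match(E_ACUTE_LATIN1, "latin-1"))  # True
# The lone byte E9 is not a complete UTF-8 sequence, so it is rejected.
print(dot_would_match(E_ACUTE_LATIN1, "utf-8"))    # False
# The properly encoded character is accepted in UTF-8.
print(dot_would_match(E_ACUTE_UTF8, "utf-8"))      # True
```

The same byte is a character under one charset and an encoding error under the other, which is why the two locales can legitimately give different answers.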

> It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text

This has been true for quite some time in 'grep', at least with the standard matchers. It may not have been true for -P but that relied on undefined behavior that could crash grep, and we can't have that.

It would make sense to add a notation to mean "match any character or invalid byte", as an extension. That'd take some work, though.
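
To make the idea of "any character or invalid byte" concrete, here is a minimal Python sketch of the unit such a notation would have to match. It is purely illustrative: grep has no such notation, the function name `chars_or_invalid_bytes` is made up for this example, and UTF-8 is assumed as the encoding.

```python
def chars_or_invalid_bytes(data: bytes):
    """Yield (is_valid, chunk) pairs covering `data` from left to right.

    Each chunk is either one complete UTF-8 character or a single byte
    that does not start a valid character at that position.
    """
    pos = 0
    while pos < len(data):
        # A UTF-8 character is 1 to 4 bytes long; take the shortest
        # prefix that decodes cleanly.
        for width in range(1, 5):
            chunk = data[pos:pos + width]
            try:
                chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            yield True, chunk
            pos += len(chunk)
            break
        else:
            # No prefix decoded: treat the current byte as an invalid byte.
            yield False, data[pos:pos + 1]
            pos += 1


# Mixed input: ASCII, a stray ISO-8859-1 byte, ASCII, a valid UTF-8 character.
sample = b"a\xe9b" + "\u00e9".encode("utf-8")
for is_valid, chunk in chars_or_invalid_bytes(sample):
    print("char" if is_valid else "invalid byte", chunk)
```

A matcher with the proposed extension would advance by one such unit at a time, whereas '.' in a UTF-8 locale accepts only the units marked as valid characters.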
