bug#18266: handling bytes not part of the charset, and other garbage

bug-grep

From:	Paul Eggert
Subject:	bug#18266: handling bytes not part of the charset, and other garbage
Date:	Thu, 11 Sep 2014 18:16:29 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

Vincent Lefevre wrote:

> the C locale corresponds to ANSI_X3.4-1968,

No, it doesn't, at least not on any current platform I'm aware of. And POSIX does not require that. POSIX even allows the C locale to be multibyte, e.g., UTF-8.

> I would say that this should be the same for invalid
> byte sequences in a UTF-8 locale.

One *could* design an encoding with that property, but it wouldn't be UTF-8; it would be something else. I don't know of any C library that does that to UTF-8. There are good arguments against doing it, e.g., one loses the property that one can concatenate character strings by concatenating their byte representations.
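
A minimal sketch of the concatenation argument, assuming Python's "surrogateescape" error handler as a stand-in for an encoding that treats each invalid byte as a character of its own. The helper name decode_lenient is hypothetical and only serves the illustration; no C library is claimed to behave this way.

```python
def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte to a lone escape character.

    This models the hypothetical "every stray byte is a character" design,
    not real UTF-8 and not any C library's behavior.
    """
    return data.decode("utf-8", errors="surrogateescape")


# Plain UTF-8 on valid text: concatenating the bytes concatenates the strings.
left, right = "caf", "é"
valid_joined = (left.encode("utf-8") + right.encode("utf-8")).decode("utf-8")

# The two bytes of "é" taken separately are each invalid UTF-8 on their own.
first, second = b"\xc3", b"\xa9"

# Under the lenient scheme, each lone byte counts as one "character" ...
separately = decode_lenient(first) + decode_lenient(second)

# ... but the concatenated bytes form a single valid character.
together = decode_lenient(first + second)

print(valid_joined == left + right)   # valid text keeps the property
print(len(separately), len(together)) # the lenient scheme does not
```

Here the two "strings" decoded separately have two characters in total, while their concatenated bytes decode to one, so string concatenation and byte concatenation no longer agree.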

Anyway I'm afraid we may be going off the deep end here. After all, grep can't impose its coding system design onto the operating system; it's more the other way around.
