Re: Wildcard expansion can fail with nonprinting characters
From: Stephane Chazelas
Subject: Re: Wildcard expansion can fail with nonprinting characters
Date: Tue, 1 Oct 2019 07:44:20 +0100
User-agent: NeoMutt/20171215
2019-09-30 15:35:21 -0400, Chet Ramey:
[...]
> The $'\361' is a unicode combining
> character, which ends up making the entire sequence of characters an
> invalid wide character string in a bunch of different locales.
[...]
No, it's $'\u0361', the Unicode character U+0361 (hex), that is
"COMBINING DOUBLE INVERTED BREVE" (encoded as \315\241 in UTF-8).
But $'\361' is the byte value 0361 (octal). In UTF-8, on its own,
it's an invalid byte sequence. That's 2#11110001, which would be
the first byte of a 4-byte character (one of the characters U+40000
to U+7FFFF). In latin1, that's ñ (LATIN SMALL LETTER N WITH
TILDE).
So $'foo\361bar' is not text in UTF-8, but that's an encoding
issue, not a problem with combining characters.
$ locale charmap
UTF-8
$ printf '\u361' | od -An -to1
315 241
$ printf '\U40000' | od -An -vto1
361 200 200 200
$ printf 'foo\361bar' | iconv -f utf8
fooiconv: illegal input sequence at position 3
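The bit-pattern reasoning above can also be checked without relying on
the locale, using shell arithmetic alone. This is only a rough sketch:
utf8_seq_len and utf8_decode4 are hypothetical helper names made up for
this illustration, they take byte values in octal as od prints them, and
they look only at the bit layout (no checks for overlong forms,
surrogates, or code points beyond U+10FFFF).

```shell
# utf8_seq_len OCTAL_BYTE (hypothetical helper)
# Prints the length of the UTF-8 sequence that this byte would start,
# 0 if it is a continuation byte, or "invalid" for 11111xxx.
utf8_seq_len() {
  b=$((0$1))                              # leading 0 makes it octal
  if   [ "$b" -lt 128 ]; then echo 1      # 0xxxxxxx: ASCII
  elif [ "$b" -lt 192 ]; then echo 0      # 10xxxxxx: continuation byte
  elif [ "$b" -lt 224 ]; then echo 2      # 110xxxxx
  elif [ "$b" -lt 240 ]; then echo 3      # 1110xxxx
  elif [ "$b" -lt 248 ]; then echo 4      # 11110xxx
  else echo invalid                       # 11111xxx: never used in UTF-8
  fi
}

# utf8_decode4 B1 B2 B3 B4 (hypothetical helper, octal byte values)
# Combines the payload bits of a 4-byte sequence into a code point:
# 3 bits from the lead byte, 6 bits from each continuation byte.
utf8_decode4() {
  cp=$(( ((0$1 & 7) << 18) | ((0$2 & 63) << 12) | ((0$3 & 63) << 6) | (0$4 & 63) ))
  printf 'U+%X\n' "$cp"
}
```

With these, utf8_seq_len 361 prints 4, while utf8_seq_len 315 prints 2
and utf8_seq_len 241 prints 0, matching the \315\241 encoding of U+0361.
utf8_decode4 361 200 200 200 prints U+40000 and
utf8_decode4 361 277 277 277 prints U+7FFFF, the two ends of the range
that a lead byte of 0361 can introduce.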
--
Stephane