Re: bug in basename
From: Bruno Haible
Subject: Re: bug in basename
Date: Fri, 15 May 2009 09:11:28 +0200
User-agent: KMail/1.9.9
Ondrej Bilka wrote:
> For encodings like BIG5 if character contains / it could quit prematurely.
Eric Blake wrote:
> BIG5 is a lousy character encoding for the very
> reason that it confuses common ASCII bytes with encoded characters,
> depending on shift state.
BIG5 does not have shift state. BIG5 is a stateless multibyte encoding,
composed of two character sets:
  first byte     second byte
  0x00..0x7F                                (ASCII)
  0xA1..0xFE     0x40..0x7E, 0xA1..0xFE     (BIG5)
The byte value of '/' (0x2F) is not among the allowed values for the second
byte, so a 0x2F byte in a BIG5 string is always a standalone ASCII '/'.
Therefore strchr(s,'/') and strrchr(s,'/') also work fine on BIG5-encoded
strings.
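To make this concrete, here is a minimal sketch that checks the byte ranges from the table above and compares a plain strrchr() with an encoding-aware scan. The helper names (big5_is_lead, big5_is_trail, big5_last_slash) are made up for illustration and are not part of any library; the sketch assumes well-formed BIG5 input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: can b start a two-byte BIG5 character? */
bool big5_is_lead(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/* Hypothetical helper: can b be the second byte of a BIG5 character? */
bool big5_is_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Encoding-aware search: walk the string character by character and
   remember the last '/' that is a complete (single-byte) character.
   Returns NULL if there is none. */
const char *big5_last_slash(const char *s)
{
    const char *last = NULL;
    while (*s != '\0') {
        unsigned char b = (unsigned char) *s;
        if (big5_is_lead(b) && s[1] != '\0') {
            s += 2;             /* skip a whole two-byte character */
        } else {
            if (b == '/')
                last = s;
            s += 1;
        }
    }
    return last;
}
```

Since big5_is_trail('/') is false, the character-aware scan and strrchr(s, '/') find the same position. For example, with "dir/\xA4\x40\xA4\xA4/file" (two two-byte characters between the slashes) both return a pointer to offset 8.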
> character encodings dependent on your locale (except on Mac, and look at the
> problems that caused)
The current problem with file names on Mac OS X is that the underlying
filesystem, HFS+, stores file names in decomposed Unicode. That is, when the
user creates a file whose name contains accented characters (in precomposed
Unicode, as usual), the file that gets created has a different name: its
decomposed Unicode form. This is quite annoying because
  - the file name that one can retrieve with "ls" is different from the
    specified file name,
  - it goes against the Character Model of the W3C [1], which recommends
    NFC (not NFD) normalization.
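
The first point can be seen at the byte level. The sketch below assumes that file names reach the program as UTF-8; the sample name "café" and the identifiers name_nfc, name_nfd and same_bytes are made up for illustration.

```c
#include <string.h>

/* "café" with a precomposed U+00E9 (NFC): the accented letter is the
   UTF-8 byte pair C3 A9. */
const char name_nfc[] = "caf\xC3\xA9";

/* "café" in decomposed form (NFD): plain 'e' followed by U+0301
   COMBINING ACUTE ACCENT, UTF-8 bytes CC 81. */
const char name_nfd[] = "cafe\xCC\x81";

/* Byte-wise comparison, which is what strcmp()-based code does. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here same_bytes(name_nfc, name_nfd) returns 0 and the two strings even differ in length (5 versus 6 bytes), although both display as the same name. A program that creates a file under the first name and then looks for that exact byte sequence in a directory listing would not find it if the filesystem hands back the second.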
Bruno
[1] http://www.w3.org/TR/charmod-norm/