Re: UTF-8 in path / filename
From: Peter Dyballa
Subject: Re: UTF-8 in path / filename
Date: Sun, 27 Aug 2006 15:12:15 +0200
On 27.08.2006 at 00:13, James Cloos wrote:
Peter> Files with UTF-8 characters in them are shown in dired (has -u: in
Peter> mode-line, i.e. uses UTF-8) à la <vowel><empty box>. Some UTF-8
Peter> characters like ß or Û show up as themselves.
Doesn't Apple by default use NFD (Normalization Form Decomposed) for
filenames? That would explain the <vowel><box> sequences.
Yes, that's the correct term for the way file names are recorded in
HFS+.
The font file, LucidaTypewriterRegular.ttf, has no combining
diacritical marks defined (only some modifiers), so these empty boxes
are displayed instead.
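
To illustrate what NFD does to such a name, here is a minimal Python sketch (my own illustration, not something used in this thread; the helper name nfd_octets is made up). It decomposes a string with the standard unicodedata module and prints the resulting UTF-8 octets in hex. The vowel is followed by the combining diaeresis U+0308, which is encoded as cc 88, while letters without a canonical decomposition, such as æ, stay as they are.

```python
import unicodedata


def nfd_octets(name):
    """Return the UTF-8 octets of the NFD form of name as hex pairs.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return " ".join(f"{b:02x}" for b in decomposed.encode("utf-8"))


# U+00E4 (a with diaeresis) becomes 'a' plus combining U+0308.
print(nfd_octets("\u00e4"))
# U+00E6 (ae ligature) has no canonical decomposition and is unchanged.
print(nfd_octets("\u00e6"))
```

A font without a glyph for U+0308 would then show the bare vowel followed by an empty box, which fits the picture described above.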
Can you get at the actual octet-sequence of the filenames?
Do you know a tool that can do that? I can only think of a C
programme that reads the inode and then outputs the octets. Doing the
same as Harald did, I get different output in Terminal (because UTF-8
characters are substituted with question marks), for example:
pete 140 /\ l -1 | grep .txt | grep ' ' | grep -v Mac
RGB äöüæÆÜÖÄ.txt
pete 141 /\ l -1 | grep .txt | grep ' ' | grep -v Mac | od -t a
R G B sp a ? 88 o ? 88 u ? 88 ? ? ?
86 U ? 88 O ? 88 A ? 88 . t x t nl
In the Emacsen's shells I get:
R G B sp a \314 88 o \314 88 u \314 88
\303 \246 \303
86 U \314 88 O \314 88 A \314 88 . t x t nl
The file name áÛïǓà.txt is interpreted as:
a \314 81 U \314 82 i \314 88 U \314 8c a
\314 80 .
t x t nl
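
As one possible answer to the question of a tool for getting at the octets, a small script may be enough instead of a C programme. The following Python sketch is only an assumption of how one could do it (the function names escape_octets and list_raw_names are made up): when os.listdir is given a bytes path, it returns the file names as bytes, so nothing in between gets a chance to substitute question marks. Non-ASCII octets are printed as octal escapes.

```python
import os


def escape_octets(raw):
    """Show printable ASCII as-is and every other octet as \\ooo (octal)."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else f"\\{b:03o}" for b in raw
    )


def list_raw_names(directory=b"."):
    """Return the escaped names of all entries in directory.

    A bytes argument makes os.listdir return bytes names, i.e. the
    octets as the operating system reports them.
    """
    return [escape_octets(name) for name in os.listdir(directory)]


if __name__ == "__main__":
    for line in list_raw_names():
        print(line)
```

With this, a decomposed a-umlaut would appear as a\314\210, i.e. the same \314 88 pair as in the od output, only with the second octet written in octal instead of hex.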
--
Greetings
Pete
"Isn't vi that text editor with two modes... one that beeps and one
that corrupts your file?" -- Dan Jacobson, on comp.os.linux.advocacy