help-gnu-emacs

Re: UTF-8 in path / filename


From: Peter Dyballa
Subject: Re: UTF-8 in path / filename
Date: Sun, 27 Aug 2006 15:12:15 +0200


Am 27.08.2006 um 00:13 schrieb James Cloos:

Peter> Files with UTF-8 characters in them are shown in dired (has - u: in
Peter> mode-line, i.e. uses UTF-8) à la <vowel><empty box>. Some UTF-8
Peter> characters like ß or Û show up as themselves.

Doesn't Apple by default use NFD (Normalization Form Decomposed) for
filenames?  That would explain the <vowel><box> sequences.

Yes, that's the correct term for the way file names are recorded in HFS+.

The font file, LucidaTypewriterRegular.ttf, has no combining diacritical marks defined (only some modifiers), so these empty boxes are displayed instead.
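
To illustrate what NFD does to a name, here is a minimal Python sketch (Python 3.8 or later is assumed; the helper name `describe` is made up for this example). It decomposes a precomposed ä into a base letter plus a combining mark, which is the character a font without combining diacritics cannot draw:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name, is_combining) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch) != 0)
            for ch in text]

composed = "\u00e4"                                  # precomposed a-umlaut
decomposed = unicodedata.normalize("NFD", composed)  # base letter + combining mark

for row in describe(decomposed):
    print(row)

# UTF-8 octets of the decomposed form, as hex
print(decomposed.encode("utf-8").hex(" "))
```

Characters such as ß and æ have no canonical decomposition, so NFD leaves them as single code points.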


Can you get at the actual octet-sequence of the filenames?

Do you know a tool that can do that? I can only think of a C programme that reads the inode and then outputs the octets. Doing the same as Harald did, I get different output in Terminal (because UTF-8 characters are substituted with question marks), for example:

        pete 140 /\ l -1 | grep .txt | grep ' ' | grep -v Mac
        RGB äöüæÆÜÖÄ.txt
        pete 141 /\ l -1 | grep .txt | grep ' ' | grep -v Mac | od -t a
            R   G   B  sp   a   ?  88   o   ?  88   u   ?  88   ?   ?   ?
           86   U   ?  88   O   ?  88   A   ?  88   .   t   x   t  nl
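
One possible way to get at the octets without a C programme is to ask for the directory listing as bytes, so that no terminal or locale substitution takes place. The following is only a sketch, assuming Python 3.8 or later is available; the function name `filename_octets` is hypothetical:

```python
import os

def filename_octets(directory="."):
    """Map each raw (bytes) file name in `directory` to its octets in hex."""
    raw_dir = os.fsencode(directory)   # a bytes path makes listdir return bytes
    return {name: name.hex(" ") for name in os.listdir(raw_dir)}

if __name__ == "__main__":
    for name, octets in sorted(filename_octets().items()):
        # Decoding is only for display; the hex column is the real answer.
        print(name.decode("utf-8", "replace"), "->", octets)
```

Because the names never pass through a terminal before being converted to hex, question-mark substitution cannot hide the combining marks.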

In the Emacsen's shells I get:

            R    G    B   sp    a \314   88    o \314   88    u \314   88 \303 \246 \303
           86    U \314   88    O \314   88    A \314   88    .    t    x    t   nl

The file name áÛïǓà.txt is interpreted as:

            a \314   81    U \314   82    i \314   88    U \314   8c    a \314   80    .
            t    x    t   nl
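
As a cross-check of the dump above, the tokens can be turned back into bytes and recomposed. This is a sketch under the assumption that `\314` is an octal escape (0xCC), that bare two-character tokens such as `81` are hex, and that `sp` and `nl` stand for space and newline; `parse_dump` is a made-up helper:

```python
import unicodedata

NAMED = {"sp": 0x20, "nl": 0x0A}   # named characters for space and newline

def parse_dump(dump):
    """Turn the mixed tokens of the listing into raw bytes."""
    out = bytearray()
    for token in dump.split():
        if token.startswith("\\"):     # octal escape, e.g. \314 == 0xCC
            out.append(int(token[1:], 8))
        elif token in NAMED:
            out.append(NAMED[token])
        elif len(token) == 2:          # assumed hex pair, e.g. 88
            out.append(int(token, 16))
        else:                          # a literal ASCII character
            out.append(ord(token))
    return bytes(out)

dump = r"a \314 81 U \314 82 i \314 88 U \314 8c a \314 80 . t x t nl"
name = parse_dump(dump).decode("utf-8").rstrip("\n")

print([f"U+{ord(c):04X}" for c in name])    # base letters and combining marks
print(unicodedata.normalize("NFC", name))   # recomposed file name
```

Under these assumptions the octets are valid UTF-8 for a decomposed name, and NFC recomposes them into áÛïǓà.txt.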

--
Greetings

  Pete

"Isn't vi that text editor with two modes... one that beeps and one
that corrupts your file?" -- Dan Jacobson, on comp.os.linux.advocacy






