emacs-devel

Re: Multibyte and unibyte file names


From: Stephen J. Turnbull
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 12:04:50 +0900

Eli Zaretskii writes:

 > We are crippled.

Appendicitis feels that way while you have it.  Cut out the inflamed
appendix and in a couple days you are as functional as ever.

"Unibyte" as implemented in Emacs is a premature optimization, and a
disaster in search of places to happen.  Remove it, and you'll never
notice it's gone.  The consequence of that removal would be to fix
this problem, permanently.

As Stefan says, a more general problem would remain: with the
exception of the Windows Unicode APIs, there is no absolutely
reliable way of determining the user's intended encoding.  However,
the only important cases where this interferes with usual filename
parsing needs are Shift JIS and Big 5 on Windows, where you *do* have
that absolutely reliable alternative.  (Users who encode file names to
Shift JIS or ISO-2022-JP on POSIX file systems deserve what they get,
and Emacs is far from the only executioner.  POSIX specifies that
directory entry names are byte sequences, so all apps that use file
names are susceptible to these bugs.)
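
To make the Shift JIS case concrete, here is a minimal sketch in
Python (used purely for illustration; it is not Emacs code, and the
file name is hypothetical).  It shows a character whose second byte
has the same value as the Windows directory separator, so parsing the
raw bytes and parsing the decoded text disagree.

```python
# Illustration only: Python's codecs stand in for whatever decoding Emacs does.
SEP = "\\"  # Windows-style directory separator, byte 0x5C in ASCII

name = "表.txt"                  # hypothetical file name
raw = name.encode("shift_jis")  # the bytes a Shift JIS file system would hold

# The second byte of the first character has the same value as the separator.
trail_byte_is_separator = raw[1] == 0x5C

# Parsing at the byte level cuts the character in half...
split_as_bytes = raw.split(SEP.encode("ascii"))

# ...while parsing after decoding leaves the name intact.
split_as_text = raw.decode("shift_jis").split(SEP)
```

Big 5 allows the same byte value in its trail-byte range, which is
presumably why those two encodings are the ones singled out above.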

The right thing to do in some sense is to have an "external file name
type" which stores both the Emacs string name and (if the name was
received as bytes from outside) a representation of those bytes.
Rather than change the Lisp_String structure, I would recommend
putting a property (`text-as-received', `externally-coded-text', or
whatever) on the string.  The content of the property would be the
filename decoded as 'binary (or perhaps using Emacs's
undecodable-bytes representation).
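
As a loose analogy for those two choices, the sketch below uses
Python rather than Emacs Lisp.  Latin-1 decoding stands in for a
byte-for-byte "binary" decoding, and the `surrogateescape` error
handler stands in for a representation of undecodable bytes.  The
sample name is hypothetical, and neither call is claimed to match
what Emacs does internally; the point is only that both forms let the
received bytes be recovered exactly.

```python
# Hypothetical received name: "café.txt" in Latin-1, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# "Binary"-style decoding: each byte becomes the code point of the same value.
as_binary = raw.decode("latin-1")

# Escaping undecodable bytes: valid UTF-8 decodes normally, and each byte
# that cannot be decoded is carried along as a lone surrogate code point.
as_escaped = raw.decode("utf-8", errors="surrogateescape")

# Both representations can be turned back into the original bytes.
binary_round_trip = as_binary.encode("latin-1")
escaped_round_trip = as_escaped.encode("utf-8", errors="surrogateescape")
```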

Although Emacs doesn't seem to have string properties (i.e., on the
object), one can put a text property on the string (or use an overlay,
which might work for the degenerate case of a 0-length string).  This
would allow callers (and sufficiently Type A users) to retry decoding
with a different encoding.
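
The idea of keeping the received bytes next to the decoded name, so
that a caller can retry with another encoding, can be sketched as
follows.  This is Python, not Emacs Lisp, and the class name
`ExternalFileName`, its fields, and the `redecode` method are all
hypothetical names invented for this illustration; in Emacs the bytes
would live in a text property as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExternalFileName:
    """Hypothetical record: the decoded name plus the bytes it came from."""
    text: str
    raw: Optional[bytes] = None    # None if the name did not come from outside
    coding: Optional[str] = None   # the encoding used to produce `text`

    @classmethod
    def from_bytes(cls, raw: bytes, coding: str) -> "ExternalFileName":
        # errors="replace" keeps a wrong first guess from raising an exception.
        return cls(raw.decode(coding, errors="replace"), raw, coding)

    def redecode(self, coding: str) -> "ExternalFileName":
        # Retry from the stored bytes, never from the possibly mangled text.
        if self.raw is None:
            raise ValueError("no received bytes to decode again")
        return ExternalFileName.from_bytes(self.raw, coding)


received = "日本語.txt".encode("euc_jp")   # hypothetical bytes from a directory
first_guess = ExternalFileName.from_bytes(received, "utf-8")
second_try = first_guess.redecode("euc_jp")
```

Because the retry always starts from `raw`, a wrong first guess loses
nothing: `second_try.text` is the intended name even though
`first_guess.text` is garbage.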

Of course this requires rather smart callers if they slice-n-dice the
file name.


