bug-gnu-emacs


bug#15260: cannot build in a directory with non-ascii characters


From: Eli Zaretskii
Subject: bug#15260: cannot build in a directory with non-ascii characters
Date: Mon, 28 Oct 2013 18:47:32 +0200

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rgm@gnu.org,  handa@gnu.org,  15260@debbugs.gnu.org
> Date: Mon, 28 Oct 2013 00:05:32 -0400
> 
> More specifically, for the bug to appear, you need ENCODE (DECODE (s))
> to not be the identity function.  Why is not so in the "early" Emacs?

Because life's a mess that doesn't easily fit into simple and elegant
schemes ;-)

For starters, we don't really DECODE_FILE with these file- and
directory-names.  We just use build_string or make_string, as you can
easily see in the init_* functions I mentioned.  If you are lucky and
your file names are UTF-8 encoded, this produces the same result as
DECODE_FILE.  If you are less lucky, and your file names are encoded
in something else, like Latin-N, you get a unibyte string with the
same bytes as in the original.  Then we pass these strings to various
functions, like file_accessible_directory_p, that _do_ ENCODE_FILE...
(Luckily, during most of temacs's run, both file-name-coding-system
and its default value are nil, so ENCODE_FILE is a no-op -- except
when they aren't, see the next paragraph.)
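
To make the lucky/unlucky distinction concrete, here is a toy model in
Python, not Emacs code.  It assumes "unibyte" can be modeled as bytes
and "multibyte" as str; build_string_model is a hypothetical name, and
the real build_string/make_string logic is more involved.

```python
# Toy model, not Emacs code: "unibyte" is bytes, "multibyte" is str.
def build_string_model(raw: bytes):
    """Hypothetical stand-in for what build_string/make_string gives us."""
    try:
        # Bytes that happen to be valid UTF-8 come out the same as if
        # they had been properly decoded.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Anything else stays a "unibyte" string with the original bytes.
        return raw

utf8_name = "répertoire".encode("utf-8")      # lucky: UTF-8 file name
latin1_name = "répertoire".encode("latin-1")  # less lucky: Latin-1 file name

lucky = build_string_model(utf8_name)
unlucky = build_string_model(latin1_name)
```

In this model, lucky is the text "répertoire", while unlucky is still
the raw Latin-1 byte sequence, untouched.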

Next, it is quite possible that the file-name-coding-system changes
between the time we process and store the file name and the time we
encode and pass it to a low-level function.  This is especially true
during "loadup", when many packages are loaded and their top-level
forms are executed.  It turns out that 2 of them have side effects
that do just that: mule-cmds.el calls reset-language-environment, and
language/english.el calls set-language-info-alist; both have the
effect of resetting default-file-name-coding-system to latin-1 (!? an
interesting "default" for a Unicode-era Emacs; perhaps Handa-san could
comment on why we still do that).  When this happens, your symmetry is
broken, and ENCODE_FILE (DECODE_FILE (f)) is no longer the identity
function.
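
The broken symmetry can be sketched in a few lines of Python (a model
only; Python's codecs stand in for Emacs coding systems, and the file
name is made up).  The name is decoded while one coding system is in
effect and encoded after the default has changed to latin-1:

```python
# Model only: Python codecs stand in for Emacs coding systems.
raw = "répertoire".encode("utf-8")   # bytes as the file system reports them
name = raw.decode("utf-8")           # decoded while UTF-8 is in effect
reencoded = name.encode("latin-1")   # encoded after the default became latin-1

# The round trip is the identity only if both steps use the same coding system.
round_trip_ok = (reencoded == raw)
```

Here round_trip_ok is False: the re-encoded bytes no longer name the
directory that actually exists on disk.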

And then there are other players in this game.  For example,
default-directory, which is used every time we call expand-file-name,
IOW "a lot".  If you look in init_buffer, you will see that the
default-directory of *scratch* is first set to a multibyte
representation of the unibyte string we get from getcwd.  In a
"normal" Emacs session, we promptly fix that in startup.el, after the
call to set-locale-environment initializes all the coding-systems.
But "temacs -l loadup dump" doesn't run startup.el, so we are left
with what init_buffer did, which is a string no file-name API will be
able to grok.
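
A loose Python analogy of that init_buffer state (an assumption-laden
model, not what Emacs does internally): Python's "surrogateescape"
error handler plays the role of the multibyte representation of raw
bytes, and the path is hypothetical.

```python
# Loose analogy, not Emacs code.  The path below is hypothetical.
raw_cwd = "/home/user/répertoire/".encode("utf-8")   # what getcwd hands us

# Stand-in for "multibyte representation of the unibyte string": each
# byte >= 0x80 becomes a placeholder code point instead of being decoded.
placeholder = raw_cwd.decode("ascii", errors="surrogateescape")

# What the name looks like once the coding system is known and applied.
decoded = raw_cwd.decode("utf-8")

same_text = (placeholder == decoded)
bytes_recoverable = (
    placeholder.encode("ascii", errors="surrogateescape") == raw_cwd
)
```

The placeholder string still carries the original bytes, but it is not
the same text as the decoded name, so code expecting a properly decoded
file name will not recognize it.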

Another example is the use of 'equal' (and 'member', which calls
'equal') to compare file and directory names, and look them up in
lists: as you know, 'equal' will not compare a unibyte and a multibyte
string as equal.  So a mix of unibyte and multibyte strings among
file names breaks some of the code that relies on 'equal', tricking it
into doing wrong things, like deciding that Emacs is _not_ run from
the source tree.
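
The same failure mode is easy to reproduce in a Python model, where
bytes stands in for a unibyte string and str for a multibyte one (the
directory name and the load_path list are hypothetical):

```python
# Model only: bytes plays the unibyte string, str the multibyte one.
source_dir_unibyte = "/src/émacs/lisp".encode("utf-8")
source_dir_multibyte = "/src/émacs/lisp"

load_path = [source_dir_multibyte]   # hypothetical list of directories

# Membership is decided by equality, and the two representations of the
# same directory never compare equal.
found = source_dir_unibyte in load_path
```

Here found is False even though both values name the same directory,
which is the kind of mismatch that makes a lookup conclude the
directory is not in the list.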

I'm sure there's more to this saga; I'm just half-way through it...




