
Re: Why does dired go through extra efforts to avoid unibyte names

From: Stefan Monnier
Subject: Re: Why does dired go through extra efforts to avoid unibyte names
Date: Tue, 02 Jan 2018 23:14:20 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

>> I bumped into the following code in dired-get-filename:
>>        ;; The above `read' will return a unibyte string if FILE
>>        ;; contains eight-bit-control/graphic characters.
>>        (if (and enable-multibyte-characters
>>                 (not (multibyte-string-p file)))
>>            (setq file (string-to-multibyte file)))
>> and I'm wondering why we don't want a unibyte string here.
>> `vc-region-history` told me this comes from the commit appended below,
>> which seems to indicate that we're worried about a subsequent encoding,
>> but AFAIK unibyte file names are not (re)encoded, and passing them
>> through string-to-multibyte would actually make things worse in this
>> respect (since it might cause the kind of (re)encoding this is
>> supposedly trying to avoid).
>> What am I missing?
> Why does it matter whether eight-bit-* characters are encoded one more
> or one less time?

That's part of the question, indeed.

> As for the reason for using string-to-multibyte: maybe it's because we
> use concat further down in the function, which will determine whether
> the result will be unibyte or multibyte according to its own ideas of
> what's TRT?

But `concat` will do a string-to-multibyte for us, if needed, so
that doesn't seem like a good reason.
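[ For illustration (my own sketch, not code from dired): `concat`
  returns a multibyte result as soon as one argument is multibyte,
  converting the unibyte arguments along the way, so the explicit
  conversion in dired-get-filename is redundant on that account.

      ;; A unibyte name with a raw byte in it.
      (let ((u (unibyte-string ?f ?o ?o #xE9)))
        (multibyte-string-p u)                              ; => nil
        ;; Mixing it with a multibyte string: `concat' promotes
        ;; the unibyte argument itself.
        (multibyte-string-p
         (concat u (string-to-multibyte "/bar"))))          ; => t
]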

That said, when that code was written, maybe `concat` used
string-make-multibyte internally instead, so this call to
string-to-multibyte might have been added to avoid going through
string-make-multibyte inside `concat`?
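[ The difference matters for the (re)encoding worry above; a hedged
  sketch, assuming the usual behavior of these two primitives:
  string-to-multibyte keeps bytes 128..255 as eight-bit characters,
  so encoding the result writes the same bytes back out, whereas
  string-make-multibyte decodes them per the current language
  environment, which can change what a later encoding produces.

      (let ((u (unibyte-string #xC3 #xA9)))   ; raw bytes C3 A9
        ;; Bytes preserved as eight-bit chars; round-trips unchanged.
        (string-to-multibyte u)
        ;; Bytes decoded; may yield different characters (and hence
        ;; different bytes on re-encoding), depending on the locale.
        (string-make-multibyte u))
]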

It would be good to have a concrete case that needed the above code, to
see if the problem still exists.

