bug-gnu-emacs

From: Drew Adams
Subject: bug#7159: 24.0.50; (1) `file-name-(non)directory': bad return values, (2) `directory-sep-char'
Date: Mon, 4 Oct 2010 15:22:26 -0700

> > (file-name-directory    titi) ; gives "c:/foo/bar/b[^^@]*\\.el\\"
> > (file-name-nondirectory titi) ; gives "'"
> >  
> > These functions should know how to parse titi to produce 
> > "c:/foo/bar/" and "b[^^@]*\\.el\\'", respectively (where ^@
> > is the control char).
> 
> You are forgetting the backslashes that wildcard-to-regexp inserted.

It should be obvious that I am NOT forgetting such backslashes.

> On DOS and Windows, Emacs treats backslashes as directory separators,
> as you'd expect.  So "c:/foo/bar/b[^^@]*\\.el\\" looks like a leading
> directory of a file whose basename is "'".

No.

Well, let me put it another way: That is just what this bug report is about:
Backslashes are NOT directory separators for Emacs - or at least they should not
be.  Even on Windows.  This bug report says we should get rid of any such
vestigial treatment.

As the doc string for `directory-sep-char' indicates, ?/ is the only directory
separator for Emacs - or at least it should be.

 "Directory separator character for built-in functions that
  return file names.  The value is always ?/."

It says "return" rather than "accept or return" because such functions don't yet
DTRT wrt input.  (And it says "built-in" rather than "standard", which would be
better.)

> In other words, don't pass a regexp with backslashes to these
> functions, because you won't get what you think you will.

Correction: You won't get what you should get, which is just the directory or
non-directory portion of the name, respecting ?/ as the only separator.  And
it's not just about regexps - I used that as an example of a name that included
a backslash.

The point was that file-name decomposition functions should pay no attention to
backslashes.  There is no reason they should consider ?\ to be a directory
separator.
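To make the proposed behavior concrete, here is a minimal sketch in Python (used only as a neutral illustration language) of directory/non-directory decomposition that treats "/" as the sole separator. The functions `file_name_directory` and `file_name_nondirectory` are hypothetical stand-ins that mimic, but are not, the Emacs built-ins. Drive-letter special cases such as "c:foo" are deliberately ignored.

```python
# Hypothetical sketch: decomposition that treats "/" as the ONLY
# directory separator, as this report proposes. Not Emacs code.
# Drive-letter special cases (e.g. "c:foo") are deliberately ignored.
from typing import Optional

DIRECTORY_SEP = "/"  # a named constant rather than a bare literal


def file_name_directory(name: str) -> Optional[str]:
    """Everything up to and including the last "/", or None if there is none."""
    i = name.rfind(DIRECTORY_SEP)
    return None if i < 0 else name[: i + 1]


def file_name_nondirectory(name: str) -> str:
    """Everything after the last "/"; a backslash is just an ordinary char."""
    return name[name.rfind(DIRECTORY_SEP) + 1 :]


# The name from this report as a Python literal: "\x00" stands for the
# ^@ control character, and each "\\" is one literal backslash.
titi = "c:/foo/bar/b[^\x00]*\\.el\\'"
print(file_name_directory(titi))                                 # c:/foo/bar/
print(file_name_nondirectory(titi) == "b[^\x00]*\\.el\\'")       # True
```

Because the backslashes are never consulted, the regexp tail survives intact as the non-directory part.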

After fixing this we will also be able to remove this parenthetical phrase in
the Elisp manual: "(backslash is also allowed in input on MS-DOS or
MS-Windows)".  This is the _only_ (whispered, parenthetical) mention of such a
vestigial crutch.

> > So I suspect that the `file-name-nondirectory' part of this bug
> > is at least in part a Windows problem.  The code seems to be
> > interpreting the backslash (?\) near the end as a directory
> > separator.
> 
> It does, by design.

Bad design, if so.  More likely it is a vestige.  Perhaps it seemed like the
best or the only possible thing to do at the time, but it is not TRT.

> > If so, that is definitely wrong.  Even on Windows, the
> > code should use the value of `directory-sep-char', which is ?/,
> > not ?\.
> 
> On Windows, we support both, and we always will.  Anything else means
> a terrible breakage, believe me.  For example, it would be very hard
> to parse output of programs that emit file name with backslashes.

Parsing output of programs is something altogether different.  You should not
throw that in here.  Emacs standard functions for decomposing file names should
not be tainted with an eye to parsing arbitrary Windows program output.

That is a completely different requirement and should be handled, naturally, by
special-purpose code (i.e. at a different level) - code that knows just what to
expect from those particular programs.

We can have code in Emacs that parses many different kinds of output, including
Windows file names.  But the need for such special-purpose parsing code is
unrelated to general, standard functions that expect a file name.  In Emacs,
such functions should not treat backslashes as directory separators.

There is no need for that.  Why?  Because ?/ as dir separator works fine for
Emacs code even in Windows.  And because ?/ works always, we should use ONLY ?/.

What is the real requirement to support also ?\?  Please don't say that it is
handling the output from some Windows programs - that is a red herring.

Note that this is very different from the path-separator (";" for Windows, ":"
for UNIX).  In that case, ":" does NOT work for Emacs on Windows - there is no
canonical separator.  But for directories, ?/ _always works_, and it should
therefore be the only char recognized as a dir separator.

For general file-name functions, that is.  Nothing prevents some specialized
Windows parsing code from processing Windows file names that use ?\ (e.g.
creating a file name that uses the standard separator, which can then be handled
in the standard way).

> With the current setup, this is seamless, 

Well, it's apparently been hard-coded here and there to such an extent that you
are screaming that there would be a lot to change to clean it up.  That in
itself is a hefty price for such "seamlessness".

But the real price is the loss of simple standard functions for manipulating
file names correctly.  By pushing special-purpose parsing into the code
everywhere you might think things have been made "seamless", but in fact a muddy
mess has been created.

Emacs's handling of ?\ in a file name output by an external program should
proceed in two stages: (1) translation to an Emacs file name (if needed), which
means using ?/ as separator, then (2) handling of the Emacs file name using the
standard file-name functions (e.g. `file-name-directory').  That's the clean way
to handle such special-casing.  (And any such use of special-case parsing should
be the exception, not the rule.)
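The two stages can be sketched as follows, again in Python purely for illustration. `to_emacs_file_name` and `split_file_name` are hypothetical helpers, not Emacs functions, and stage 1 assumes the input is known to be a native Windows file name, in which a backslash can only be a separator.

```python
# Hypothetical two-stage sketch. Stage 1 is the only place that knows
# about backslashes; stage 2 is ordinary decomposition on "/" alone.
from typing import Optional, Tuple


def to_emacs_file_name(native: str) -> str:
    """Stage 1: assumes `native` is a Windows file name, where a backslash
    can only be a directory separator, so each one becomes "/"."""
    return native.replace("\\", "/")


def split_file_name(name: str) -> Tuple[Optional[str], str]:
    """Stage 2: (directory, nondirectory), splitting at the last "/" only."""
    i = name.rfind("/")
    return (None if i < 0 else name[: i + 1], name[i + 1 :])


native = r"C:\Program Files\Emacs\bin\emacs.exe"
print(split_file_name(to_emacs_file_name(native)))
# ('C:/Program Files/Emacs/bin/', 'emacs.exe')
```

The design point is the separation: the special-purpose knowledge lives entirely in stage 1, and stage 2 never sees a backslash.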

> even if the file names use mixed forward- and back-slashes (yes, it
> happens with GCC and GDB, for example, or even with Make sometimes).

Again, there is nothing wrong with having specialized code that handles such
cases on an individual basis, if they require it.  But the general file-name
handling code of Emacs should handle _Emacs_ file names, which use only ?/ as
the dir separator.

You are muddying the waters by throwing in lots of other stuff here.  Of
_course_ it can happen that some program might need to parse special syntax -
any special syntax.  But this is about the normal Emacs syntax for file names.
And for that syntax the Emacs directory separator is ?/.

If some particular Emacs code is forced by some other code (e.g. GDB) to digest
a name that uses both ?/ and ?\ as directory separators (quelle horreur), then
appropriate Emacs code can be used to fix such names before Emacs tries to deal
with them using the standard file-name functions (e.g. `file-name-directory').

IOW, tack a translation mapping onto the output of GDB or Make or whatever to
standardize such bastard file names (w/ mixed separators).  That can be done by
Emacs, but we should not foul the standard Emacs file-name handling with such
considerations.
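A translation mapping of that kind might look like the following Python sketch. It assumes GCC-style diagnostics of the form "FILE:LINE: MESSAGE" (no column number), where FILE may start with a drive letter and mix "/" and "\"; the pattern and the helper `normalize_diagnostic` are hypothetical, not part of any Emacs or GCC interface.

```python
# Hypothetical sketch: normalize file names in tool output BEFORE any
# general file-name function sees them. Assumes "FILE:LINE: MESSAGE".
import re
from typing import Optional, Tuple

# Optional drive letter, then the file name up to the ":LINE:" after it.
DIAGNOSTIC = re.compile(r"^((?:[A-Za-z]:)?[^:]+):(\d+):\s*(.*)$")


def normalize_diagnostic(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (file, line, message) with "/" as the only separator in
    file, or None if the line does not look like a diagnostic."""
    m = DIAGNOSTIC.match(line)
    if m is None:
        return None
    file_name = m.group(1).replace("\\", "/")  # mixed separators fixed here
    return file_name, int(m.group(2)), m.group(3)


out = r"c:\work/src\util/parse.c:42: warning: unused variable 'n'"
print(normalize_diagnostic(out))
# ('c:/work/src/util/parse.c', 42, "warning: unused variable 'n'")
```

Once a name has passed through such a mapping, it can be handed to the standard decomposition functions unchanged.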

"Seamless", indeed.  Putting special-case handling throughout the code doesn't
make things seamless; it makes them quite seamy.

> > However, I see from the doc string that `directory-sep-char' has
> > been made obsolete:
> 
> In fact, just yesterday it was removed altogether, because it has no
> effect on what Emacs does.  That's been like that for years, and we
> saw no complaints.

The complaint/suggestion wrt `directory-sep-char' is only that it should be a
constant.  We should not be advising people to hard-code ?/, but rather to use a
constant with a name that proclaims what it is and with a value of ?/.  But this
is only a minor, stylistic concern.  It is not directly related to this bug.

> I'm closing this bug.

I'm reopening it.  To me, this is broken, and this dysfunction is not an
inevitable price to be paid because GDB or whatever outputs Windows file names
using backslashes.  That argument is a copout.

The simple functions `file-name-directory' and `file-name-nondirectory' should
be robust enough to just remove the non-directory and directory portion -
always.  That should be so regardless of the presence of backslashes.

Those functions are broken on Windows when backslashes are present.  If you
don't want to fix this bug, fine; maybe someone else will someday.

Maybe you don't want to make the effort required to remove such ad-hoc backslash
handling here and there from the Windows Emacs code, but maybe someone else will
someday.  I believe you that the effort might be great, and I accept that
therefore this cannot be a high priority now (there are _many_ outstanding
bugs).  But that does not mean that we currently handle Windows file names
correctly.  That we choose not to fix something now does not imply that it
doesn't need fixing.

Your reason, "Anything else means a terrible breakage, believe me," suggests that
the fix is non-trivial because there is (apparently) lots of code here and there
that still special-cases backslashes, on Windows.  Your example of such
breakage, "to parse output of programs that emit file name with backslashes",
suggests that you do not distinguish parsing Windows program output from Emacs's
general-purpose file-name handling functions.

It is not right to mess up general-purpose file-name-handling functions just for
the benefit of some special-purpose Windows-output parsing here and there.
Write Windows-output specific code to do that according to the particular case
(need), and make the general-purpose file-name handling functions do as they
logically should: recognize ?/ as the only directory separator.





