[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bugs in dirname module
From: |
Eric Blake |
Subject: |
Re: bugs in dirname module |
Date: |
Mon, 07 Nov 2005 06:42:20 -0700 |
User-agent: |
Mozilla Thunderbird 1.0.2 (Windows/20050317) |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Paul Eggert on 11/6/2005 3:59 AM:
>
>>My approach was to add a dependency on the c-ctype module to use c_isalpha
>
> That seems a bit overkill here, since you can assume ASCII and DOS.
> Why not use this instead?
>
> #define c_isalpha(c) (((unsigned int) (c) | ('a' - 'A')) - 'a' <= 'z' - 'a')
Agreed - all platforms with drive letter prefixes are based on ASCII
encodings, so I can resubmit without a dependency on c-ctype. Should I
add a dependency on the verify module to ensure that drive letter prefixes
are only used on ASCII platforms? Should I still move
FILE_SYSTEM_PREFIX_LEN out of config.h and into dirname.h?
>
>>Second, cygwin currently normalizes all drive letters as absolute paths,
>>which is different from djgpp
>
>
> Sigh. This stuff will probably break a lot of code, not just dirname.
> Do we really want to bother with it? Perhaps we should just tell
> users not to mess with names like "c:foo". I assume they're pretty
> rare. Will that simplify the code here and elsewhere?
That was the intent of tests/test-dirname.c - a good listing of all sorts
of problematic filenames, and their division into base_name and dir_name
on the three styles of platforms (unix with no drive letters, cygwin with
drive letters always starting an absolute path, and DOS with drive letters
potentially starting a relative path). I made the code as simple as I
could while making it correct for all three cases.
>
>
>>+ {
>>+ if (prefix_len)
>>+ base = p;
>>+ else if (! DOUBLE_SLASH_IS_DISTINCT_ROOT || 2 < p - base)
>>+ base = p - 1;
>>+ }
>
>
> This code doesn't seem to match the new comments for basename.
Agreed that the comment does not quite match the code. But it is doing
exactly what I intended - this block is only invoked when the filename
consists of just a prefix and a sequence of slashes. If there was a
prefix, it results in the empty string; otherwise, it results in the final
one or two slashes (turning c:/, c://, c:///, etc into ''; turning ///,
////, etc into the final /, leaving / alone, and leaving // alone only if
DOUBLE_SLASH_IS_DISTINCT_ROOT).
> Furthermore, I don't see why the basename of "A:/" should be "/". or
> the basename of "//" should be "//". The point of basename and
> dirname is that if you concatenate them, with a slash between them,
> then you get the same file. But I don't see how this rule generalizes
> to DOS under the proposed code.
Your formulation is not quite correct. POSIX poses the rule as: if
"$string" is a valid filename, then 'cd "$(dirname "$string")"; ls
"$(basename "$string")"' lists the same file (although POSIX fails to
mention that this will fail if the filename contains a trailing newline).
Note that POSIX requires basename("/") to return "/", and basename("//")
to return either "/" or "//". ALL other names subjected to basename()
will never contain a slash. Likewise, POSIX requires dirname("/") to
return "/" and dirname("//") to return "//", and all other names subjected
to dirname() will never contain a trailing slash. On platforms where //
is special, your formulation of dirname("//")/basename("//") results in
"/////", which is certainly not the same as "//". But "cd //; ls //" does
indeed list the same file as the original "ls //".
Now, on applying that to systems with drive letters. If basename("c:/")
returns "/", we have violated the POSIX rule of thumb: "cd c:/; ls /" is
much different than "ls c:/". Yet without my patch, this is what
base_name was doing. Part of the issue here is that POSIX lets
basename("c:/") modify its argument, but that our definition of base_name
is that we can only return a pointer to somewhere inside the passed
argument without modification. My patch makes base_name("c:/") return "".
The other alternative is to treat c:/ like a root, and just like
base_name("/") returns "/", have base_name("c:/") return "c:/". But by
your earlier rule of appending a slash between the dir_name and base_name,
my patch would produce "c://" on DOS, which is still a valid spelling of
c:/, whereas having base_name return c:/ would create the invalid name
"c://c:/".
For that matter, would it help to redefine our base_name to ALWAYS return
the empty string if the argument is a root directory? Then the rule of
recreating a valid filename becomes simpler - concatenate dir_name (less a
trailing slash), a slash, and the base_name. Then for "//" on platforms
where it is special, you get "//" less one slash, /, and "", resulting in
"//". Yes, this is different than POSIX basename(), but we already know
that base_name is not an exact match to POSIX. My only worry about
changing semantics like that is that it may break existing uses of base_name.
Another thing to think about. Currently, strip_trailing_slashes("///")
currently calls base_name("///"), gets a single "/", skips past that
slash, then returns false (leaving the user with "///" still). But it
might be more intuitive if it changed the input string to "/\0/" and
returned true.
>
> Many of the rest of the changes look pretty complicated, and I didn't
> take the time to review them, since I want to know what dirname and
> basename are supposed to do before worrying about the details.
I guess the first thing to review, then, would be my testsuite (which did
pass in all three configurations - unix, cygwin, and DOS). If you can
agree that every one of those proposed filename divisions into a directory
and basename makes sense, then your review becomes a matter of validating
the codepaths to ensure that the testsuite is always passed, and that the
testsuite is comprehensive enough.
I am also working on a followup patch to coreutils that fixes basename(1)
(POSIX requires that "basename // /" return "//" on platforms where // is
special), as well as improving the documentation and testsuite of basename
and dirname.
- --
Life is short - so eat dessert first!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFDb1m884KuGfSFAYARApbUAKCnBiXIpLOF4eLc4aknfRSbjJabDwCgz+zu
Fe3X89Qb2GvODfEWZIaPAmI=
=348K
-----END PGP SIGNATURE-----
- bugs in dirname module, Eric Blake, 2005/11/06
- Re: bugs in dirname module, Paul Eggert, 2005/11/06
- Re: bugs in dirname module,
Eric Blake <=
- Re: bugs in dirname module, Eric Blake, 2005/11/08
- Re: bugs in dirname module, Bruno Haible, 2005/11/09
- Re: bugs in dirname module, Paul Eggert, 2005/11/09
- Re: bugs in dirname module, Eric Blake, 2005/11/09
- Re: bugs in dirname module, Paul Eggert, 2005/11/10
- Re: bugs in dirname module, Eric Blake, 2005/11/10
- Re: bugs in dirname module, Paul Eggert, 2005/11/11
- Re: bugs in dirname module, Eric Blake, 2005/11/11