bug-coreutils

bug#35939: version sort is incorrect with hyphen-minus


From: Assaf Gordon
Subject: bug#35939: version sort is incorrect with hyphen-minus
Date: Wed, 26 Jun 2019 12:25:26 -0600
User-agent: Mutt/1.11.4 (2019-03-13)

(Adding Ian Jackson for dpkg/debian-version details)

Hello,

On Tue, May 28, 2019 at 02:53:39AM +0200, Vincent Lefevre wrote:
> With GNU coreutils 8.30 under Debian/unstable, I get:
> 
> $ LC_ALL=C ls
> ab-cd  abb  abe
> $ LC_ALL=C ls -v
> abb  abe  ab-cd
> 
> The hyphen-minus character should still be regarded as being less
> than the letters (there are no digits, so both are expected to be
> equivalent). The GNU coreutils manual says:
> 
[...]

Thanks for the report and the clear details.

To summarize,
"ls -v" and "sort -V" (coreutils' version sort) behave differently from
other implementations with regard to the minus character:

    $ printf "%s\n" abb ab-cd | sort -V
    abb
    ab-cd

    $ v1="abb"
    $ v2="ab-cd"
    $ dpkg --compare-versions "$v1" lt "$v2" && printf '%s\n' "$v1" "$v2" || printf '%s\n' "$v2" "$v1"
    ab-cd
    abb

If I understand correctly, the reason is that in Debian's version
comparison algorithm [1], the minus character has a special meaning: it
separates the "upstream version" part from the "debian revision" part.

In Debian's implementation [2], a version string is first split into three
parts (epoch, upstream version, debian revision) using ":" as the epoch
delimiter and "-" as the revision delimiter. Only then are the three parts
compared, each separately [3].

[1] https://www.debian.org/doc/debian-policy/ch-controlfields.html#version
[2] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/parsehelp.c#n191
[3] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/version.c#n140
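The split described above can be sketched in plain shell. This is only an
illustration, not dpkg's code: split_version is a hypothetical helper, and
it assumes the epoch ends at the first ":" and the revision starts after
the last "-".

```shell
#!/bin/sh
# Hypothetical helper (NOT dpkg's code): split a version string into
# epoch, upstream version and debian revision.
# Assumption: epoch ends at the first ":", revision starts after the last "-".
split_version() {
    v=$1
    epoch=0
    revision=
    # Epoch: everything before the first ":" (if there is one).
    case $v in
        *:*) epoch=${v%%:*}; v=${v#*:} ;;
    esac
    # Revision: everything after the last "-" (if there is one).
    case $v in
        *-*) revision=${v##*-}; v=${v%-*} ;;
    esac
    upstream=$v
}

split_version "ab-cd"
printf 'epoch=%s upstream=%s revision=%s\n' "$epoch" "$upstream" "$revision"
# prints: epoch=0 upstream=ab revision=cd
```

With this split, the upstream part of "ab-cd" is just "ab", which is what
then gets compared against "abb".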

On the other hand, coreutils' implementation (from gnulib [4]) does not
break the version string into three parts - it treats the entire string as
a single "upstream version" part.
The rules for sorting the "upstream version" string say:

  "... The lexical comparison is a comparison of ASCII values modified so
  that all the letters sort earlier than all the non-letters and so that a
  tilde sorts before anything" (from [1])

[4] https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c

Therefore, dpkg first separates "ab" from "cd", then compares "ab" to
"abb" - and "ab" comes first.
Coreutils compares "ab-cd" to "abb" (or technically, just "ab-" to
"abb"), and because "letters sort earlier than all the non-letters", "abb"
comes first.
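
To make the ordering concrete, here is a rough shell sketch of the quoted
rule for a single non-digit character. The weight function and the offset
of 256 are my own illustration (an assumption, not taken from gnulib or
dpkg sources); digit runs are compared numerically and are not handled
here.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Hypothetical helper: sort weight of one non-digit character.
# Assumed mapping: tilde sorts before anything (even end of string),
# letters keep their ASCII value, all other characters sort after letters.
weight() {
    case $1 in
        '')       echo 0 ;;                                 # end of string
        '~')      echo -1 ;;                                # tilde sorts first
        [A-Za-z]) printf '%d\n' "'$1" ;;                    # letter: ASCII value
        *)        echo $(( $(printf '%d' "'$1") + 256 )) ;; # non-letter: after letters
    esac
}

# "ab-cd" vs "abb" as a single string: third characters are "-" and "b".
printf 'b=%s minus=%s\n' "$(weight b)" "$(weight -)"
# prints: b=98 minus=301

# "ab" vs "abb" after the dpkg-style split: end of string vs "b".
printf 'end=%s b=%s\n' "$(weight '')" "$(weight b)"
# prints: end=0 b=98
```

In the first case "b" has the lower weight, so "abb" sorts first; in the
second case the end of "ab" has the lower weight, so "ab" (and with it
"ab-cd") sorts first.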

I hope this helps explain the differences (I also hope this explanation is
correct, and I invite others to chime in).


regards,
 - assaf





