bug#17189: Sort bug #2

bug-coreutils

From:	Nikos Balkanas
Subject:	bug#17189: Sort bug #2
Date:	Sun, 6 Apr 2014 00:09:13 +0300

On Sat, Apr 5, 2014 at 11:37 PM, Bob Proulx <address@hidden> wrote:

> Nikos Balkanas wrote:
> > Thank you all. As I explained in my previous mail, an update of the man
> > pages is essential. A change in the UI would also be desirable,
> > if the standards allow it. Sorry about my attitude, but I was getting
> > pretty desperate. Thanks for not flaming.
> >
> > To make it up I will look into updating the man pages ;-)
>
> Hopefully you will then see the WARNING section in the man page.
>
>    ***  WARNING  ***  The locale specified by the environment affects
>    sort order.  Set LC_ALL=C to get the traditional sort order that
>    uses native byte values.
>
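As a rough illustration of "native byte values", here is a minimal Python sketch that orders a few strings by their raw UTF-8 bytes. It assumes that this approximates what `LC_ALL=C sort` does with such input. It is not sort's code and does not model any real locale's collation rules.

```python
# Sort strings by their raw UTF-8 byte values, approximating
# `LC_ALL=C sort` (an assumption for illustration, not sort's implementation).
lines = ["b", "B", "a", "A", "é", "z"]

byte_order = sorted(lines, key=lambda s: s.encode("utf-8"))

# ASCII uppercase (0x41-0x5A) sorts before lowercase (0x61-0x7A), and the
# two-byte UTF-8 encoding of "é" (0xC3 0xA9) sorts after "z" (0x7A).
print(byte_order)  # ['A', 'B', 'a', 'b', 'z', 'é']
```

In a typical UTF-8 locale such as en_US.UTF-8, sort would usually group upper- and lowercase letters together and place "é" near "e". The exact order depends on that locale's collation tables.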

Or maybe move it to the top. Where it is now, after a nearly incomprehensible
KEYDEF section, one assumes it has something to do with KEYDEF :-(
Some examples for KEYDEF would also be nice, since the sites I found when
searching for sort usage all gave the wrong -k1.
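To illustrate the kind of KEYDEF example being asked for, here is a minimal Python sketch of the difference between -k1 and -k1,1. With -k1, the key runs from field 1 to the end of the line. With -k1,1, the key is field 1 only. The helper `key_spec` is hypothetical, and the sketch rests on three assumptions:

- fields are split on a single space, as with `-t ' '`;
- keys are compared byte-wise, as with `LC_ALL=C`;
- ties keep their input order, as with `-s`.

```python
def key_spec(start, end=None, sep=" "):
    """Hypothetical key function mimicking `-k start[,end]` with `-t sep`."""
    def key(line):
        fields = line.split(sep)
        # No end field means the key runs to the end of the line.
        chosen = fields[start - 1:] if end is None else fields[start - 1:end]
        # Compare raw UTF-8 bytes, as LC_ALL=C would.
        return sep.join(chosen).encode("utf-8")
    return key

lines = ["b 1", "a 2", "a 1"]

# Python's sorted() is stable, like `sort -s`: equal keys keep input order.
k1_to_eol = sorted(lines, key=key_spec(1))    # like LC_ALL=C sort -s -t ' ' -k1
k1_only = sorted(lines, key=key_spec(1, 1))   # like LC_ALL=C sort -s -t ' ' -k1,1

print(k1_to_eol)  # ['a 1', 'a 2', 'b 1']
print(k1_only)    # ['a 2', 'a 1', 'b 1']
```

Without -s, GNU sort breaks ties by comparing whole lines, so both orders would coincide for this input. In that case the difference mostly shows up in locales whose collation ignores blanks at the first level. There, later fields can influence the order under -k1.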

[...snip...]

> US-ASCII is a subset of UTF-8.  Every ASCII file is also a valid UTF-8
> file.  That is by design.  But it also makes it impossible to make an
> assumption like this.
>
> For example one would start out with:
>
>   Lorem ipsum dolor sit amet
>   Now is the time.
>   Don't look Ethel!
>
> That file would sort one way.  Then someone would change the
> apostrophe to the unicode one.
>
>   Lorem ipsum dolor sit amet
>   Now is the time.
>   Don’t look Ethel!
>
> If sort tried to automatically detect behavior based upon the file
> content then now the file will sort with dictionary sort ordering?  I
> think this would cause a large number of complaints.  It would be data
> dependent behavior and would break a lot of things.  Plus this would
> require sort to add another pass to read the file first to determine
> this before sorting it.  Please no.
>
> Besides...  One person's file of human language is another person's
> file of raw bytes.  Can't make assumptions like this.
>
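Bob's point can be made concrete with a short Python sketch. The detector `looks_non_ascii` is hypothetical and not anything sort implements. An ASCII file is byte-for-byte a valid UTF-8 file, so a content-based rule can only react to the presence of non-ASCII bytes. Swapping one apostrophe for U+2019 flips its answer.

```python
# Bob's example text, before and after replacing the ASCII apostrophe
# (U+0027) with the Unicode right single quotation mark (U+2019).
before = "Lorem ipsum dolor sit amet\nNow is the time.\nDon't look Ethel!\n"
after = before.replace("'", "\u2019")

raw_before = before.encode("utf-8")
raw_after = after.encode("utf-8")

# Every ASCII file is also valid UTF-8: both encodings give identical bytes.
assert raw_before == before.encode("ascii")

def looks_non_ascii(data: bytes) -> bool:
    """Hypothetical detector: True if any byte lies outside 7-bit ASCII."""
    return any(b > 0x7F for b in data)

print(looks_non_ascii(raw_before))  # False
print(looks_non_ascii(raw_after))   # True: U+2019 encodes as E2 80 99
```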

Not assumptions. Facts. If you come across a UTF-8 character, you know.
This simple logic is inescapable:
"Sorting input should be based on the input's locale, not the system's."

The rest are implementation details. Usually the answer would be in the first
few lines. In the worst case you would have to scan the whole input.
You have obviously considered this a lot in the past, and I don't expect to
solve it in this thread.
Maybe I can think of something if you tell me your sorting algorithm. Are you
using qsort?
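The scan proposed above can be sketched in Python, assuming "detect" simply means finding any non-ASCII byte. The helper `first_non_ascii_line` and its `limit` parameter are hypothetical. The sketch shows the trade-off between checking only the first few lines and scanning everything. A prefix scan can miss a non-ASCII byte that appears late in the input. A reliable answer therefore needs the whole input read before sorting decides how to compare.

```python
from typing import Iterable, Optional

def first_non_ascii_line(lines: Iterable[bytes],
                         limit: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: return the 0-based index of the first line that
    contains a non-ASCII byte, scanning at most `limit` lines
    (None means scan the whole input). Returns None if none is found."""
    for i, line in enumerate(lines):
        if limit is not None and i >= limit:
            return None  # gave up before seeing the rest of the input
        if any(b > 0x7F for b in line):
            return i
    return None

# 1000 pure-ASCII lines followed by one line containing "é".
data = [b"line %d" % i for i in range(1000)] + ["café".encode("utf-8")]

print(first_non_ascii_line(data, limit=10))  # None: the prefix scan misses it
print(first_non_ascii_line(data))            # 1000: only a full scan finds it
```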

>
> Bob
>

Current Thread

bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/05
- bug#17189: Sort bug #2, Eric Blake, 2014/04/05
  - bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/05
    - bug#17189: Sort bug #2, Bob Proulx, 2014/04/05
    - bug#17189: Sort bug #2, Nikos Balkanas <=
    - bug#17189: Sort bug #2, Eric Blake, 2014/04/07
    - bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/07
    - bug#17189: Sort bug #2, Eric Blake, 2014/04/07
    - bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/07
    - bug#17189: Sort bug #2, Eric Blake, 2014/04/07
    - bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/09
    - bug#17189: Sort bug #2, Eric Blake, 2014/04/09
    - bug#17189: Sort bug #2, Leslie S Satenstein, 2014/04/07
    - bug#17189: Sort bug #2, Nikos Balkanas, 2014/04/08
