bug#6327: sort fails on some UTF-8 input
From: Eric Blake
Subject: bug#6327: sort fails on some UTF-8 input
Date: Wed, 02 Jun 2010 08:40:19 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Mnenhy/0.8.2 Thunderbird/3.0.4
[adding gnulib]
On 06/01/2010 10:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
>
> willow% /opt/ts/gnu/bin/sort sort_test.txt
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
Thanks for the report. What locale are you using (that is, the entire
output of 'locale')? I could not reproduce failure using:
$ export LC_ALL; for f in $(locale -a); do LC_ALL=$f || continue;
sort sort_test.txt >/dev/null || { echo $f; break; }; done
on a GNU/Linux system with 732 installed locales. However, you may well
be in a non-UTF-8 locale, or the Solaris multibyte functions may be less
robust than glibc's at recognizing valid UTF-8 sequences. If it is
indeed a bug in Solaris strcoll(), then gnulib can probably be taught to
work around it.
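
In the meantime, a rough sketch of how to check which encoding the active locale uses and how to apply the LC_ALL='C' workaround that sort itself suggests. The helper names report_charmap and sort_bytes are made up for illustration, and the two sample lines are the byte sequences from the report above; I have not run this on Solaris.

```shell
#!/bin/sh
# Hypothetical helper: print the character encoding of the active locale.
# A UTF-8 locale should report "UTF-8"; anything else may explain the
# "Illegal byte sequence" failure on UTF-8 input.
report_charmap() {
    if command -v locale >/dev/null 2>&1; then
        locale charmap
    else
        echo "unknown (no locale utility found)"
    fi
}

# Hypothetical helper: sort in the C locale, which compares raw bytes
# instead of using locale collation rules.
sort_bytes() {
    LC_ALL=C sort "$@"
}

echo "charmap: $(report_charmap)"

# The two strings from the report, written as octal escapes.
printf '%b\n' \
    '\0360\0222\0203\0276\0360\0222\0205\0226' \
    '\0360\0222\0200\0255\0360\0222\0213\0253\0360\0222\0213\0253\0360\0222\0200\0255' \
    | sort_bytes
```

Note that byte order is not the same as the collation order a UTF-8 locale would give, so this only avoids the failure; it does not reproduce what Solaris 'sort' prints.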
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org