uniq prints invalid unique lines multiple times
From: Reuben Thomas
Subject: uniq prints invalid unique lines multiple times
Date: Mon, 23 Feb 2004 21:46:27 +0100 (CET)
When I run uniq from coreutils 5.0 with LANG=en_GB.UTF-8 on a glibc 2.3.2
system on a file which (I think) is not valid UTF-8, I get a confusing
result: the two lines in the file are identical, but uniq prints them
both, and returns an exit code of 0. If I run
LANG=C uniq <file>
I get the expected single line of output. What I expect when I run with
LANG=en_GB.UTF-8 is either for uniq to return an error (because the file
is not valid text), or to print one single line (if it's being lenient).
The only way I might be wrong is if the file can be interpreted as a UTF-8
file with two non-identical lines, but I don't think it can.
I attach the relevant file, and display it below (the \200 is a literal
top-bit-set byte, value octal 0200). The file ends with a linefeed, in
case you're wondering!
ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
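
For anyone wanting to reproduce this, the following is a minimal sketch that
rebuilds the two-line file with printf and runs the C-locale case described
above. The temporary file and the variable names are illustrative assumptions,
not part of the original attachment, and LC_ALL=C is used instead of LANG=C
only so that an LC_ALL already set in the environment cannot override it.

```shell
# Rebuild the two identical lines; \200 in the printf format is the raw byte 0x80.
fmt="ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506\n"
tmp=$(mktemp)          # hypothetical scratch file, not the original attachment
printf "$fmt" >  "$tmp"
printf "$fmt" >> "$tmp"

# Number of lines in the input file.
total=$(wc -l < "$tmp" | tr -d ' ')

# In the C locale every byte is a character, so the duplicate lines collapse.
count=$(LC_ALL=C uniq "$tmp" | wc -l | tr -d ' ')

echo "input lines: $total, lines after uniq in C locale: $count"
rm -f "$tmp"
```

To compare against the behaviour reported here, the same pipeline can be run
with LC_ALL=en_GB.UTF-8 in place of LC_ALL=C, assuming that locale is
installed; the result is likely to depend on the coreutils and glibc versions.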
--
http://www.mupsych.org/~rrt/ | The only person worth beating is yourself
Attachment: minitests.output.i686-pc-linuxlibc6
Description: Text document