--- Begin Message ---
Subject: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 15:50:15 +0200
Lines with CJK letters are deemed equal by length only, since the characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison (i.e. equal = all bytes are equal).
--- End Message ---
--- Begin Message ---
Subject: Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
Date: Mon, 16 Dec 2013 17:33:23 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
tag 16168 notabug
close 16168
stop
On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
> Lines with CJK letters are deemed equal by length only, since the
> characters seem to be ignored.
> I understand this is due to locale.
> But, it would be nice if a simple flag would do a locale-free comparison
> (i.e. equal = all bytes are equal).
If you want to compare byte by byte:
LC_ALL=C uniq ....
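For illustration, a minimal sketch of that invocation. The sample lines and the helper name sample_lines are made up for this example and are not from the original report: two different CJK lines of equal length, followed by an exact repeat of the second.

```shell
# Hypothetical input: two different CJK lines of equal length,
# then an exact repeat of the second line.
sample_lines() {
    printf '%s\n' '日本語' '中国語' '中国語'
}

# LC_ALL=C selects the C locale for this one command, so uniq
# compares lines byte by byte instead of by locale collation.
deduped=$(sample_lines | LC_ALL=C uniq)
printf '%s\n' "$deduped"
```

With byte-wise comparison, the first two lines should stay distinct and only the exact repeat should be collapsed. Setting LC_ALL on the command line affects that single command only, so the rest of the shell session keeps its usual locale.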
thanks,
Pádraig.
--- End Message ---