bug#6707: cut is not multi byte (wide char) aware
From: Eric Blake
Subject: bug#6707: cut is not multi byte (wide char) aware
Date: Thu, 22 Jul 2010 17:15:39 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Mnenhy/0.8.3 Thunderbird/3.0.5
On 07/22/2010 12:49 PM, Mihai Moldovan wrote:
> Hi,
>
> I have come to notice that cut is not yet multi byte/wide char aware.
Yes, and the same is true of many of the coreutils. This is a well-known
issue, and it is mentioned in the TODO file. Several distros carry add-on
patches that add wide char support, but to date, no one has submitted a
patch upstream that is both easy to maintain (one that doesn't needlessly
duplicate large blocks of code for char vs. wchar_t) and that doesn't
penalize speed in single-byte locales. We've got some ideas on what is
needed, and gnulib is certainly getting closer to what we need (Bruno's
work on libunistring will be a key player in an acceptable patch), but it
takes time to pull it all together.
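To make the two constraints above concrete (no duplicated char/wchar_t code paths, no slowdown in single-byte locales), here is a minimal sketch in C of `cut -c`-style character selection. The function name `cut_chars` and its interface are hypothetical, not the coreutils API and not any patch discussed on this list. It uses only the standard `MB_CUR_MAX` and `mbrlen`, assumes the caller has already called `setlocale`, and assumes an invalid or truncated byte sequence should count as one single-byte character (real tools may choose differently).

```c
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* strlen, memcpy, memset */
#include <wchar.h>    /* mbrlen, mbstate_t */

/* Hypothetical helper, not the coreutils API: copy characters FROM..TO
   (1-based, inclusive) of the NUL-terminated LINE into OUT, a buffer of
   OUTSZ bytes, and NUL-terminate it.  Returns the number of bytes copied.
   The caller is assumed to have called setlocale() already.  */
size_t
cut_chars (const char *line, size_t from, size_t to,
           char *out, size_t outsz)
{
  size_t n = strlen (line);
  size_t i = 0;     /* byte offset into LINE */
  size_t pos = 0;   /* 1-based character position */
  size_t w = 0;     /* bytes written to OUT */
  /* Decide once per call whether the locale can have multibyte chars.  */
  int single_byte = (MB_CUR_MAX == 1);
  mbstate_t st;

  if (outsz == 0)
    return 0;
  memset (&st, 0, sizeof st);

  while (i < n)
    {
      size_t len = 1;   /* fast path: one byte is one character */
      if (!single_byte)
        {
          len = mbrlen (line + i, n - i, &st);
          /* Assumption: an invalid or truncated sequence counts as one
             single-byte character, and the conversion state is reset.  */
          if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
            {
              len = 1;
              memset (&st, 0, sizeof st);
            }
        }
      pos++;
      if (pos > to || (pos >= from && w + len >= outsz))
        break;          /* past the range, or OUT is full */
      if (pos >= from)
        {
          memcpy (out + w, line + i, len);
          w += len;
        }
      i += len;
    }
  out[w] = '\0';
  return w;
}
```

In a single-byte locale the loop never calls `mbrlen`, and both kinds of locale share one loop instead of separate char and wchar_t copies. A real patch would still need to cover fields, multibyte delimiters, and the libunistring-based approach mentioned above.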
> (Can this even be considered a bug, or is it just a "feature", in that
> only single-byte delimiters are allowed by default?)
Yes, it can be considered a bug, and any extra help would be welcome.
Unfortunately, to date no one has been willing to step forward and make
scratching this itch their highest priority.
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org