Re: tr '[:upper:]' '[:lower:]' -- misaligned construct
From: Jim Meyering
Subject: Re: tr '[:upper:]' '[:lower:]' -- misaligned construct
Date: Mon, 07 Jan 2008 20:15:44 +0100
Micah Cowan <address@hidden> wrote:
> Jim Meyering wrote:
>> Here's a tentative patch that also avoids repeated
>> (and wasteful) initialization of the xlate array.
>
> I note that POSIX requires that, in the case that the arguments are
> exactly '[:lower:]' and '[:upper:]' (or the reverse of the same), tr is
> actually supposed to ignore the 'lower' and 'upper' character classes,
> and instead initialize the mapping from the locale's "tolower"/"toupper"
> definition. This would have avoided the length mismatch in the first
> place, and while that issue appears to be addressed, tr still does not
> conform to POSIX, as, if tr were to encounter a locale definition file
> with an LC_CTYPE category definition such as the following:
Thanks for the feedback.
However, you seem to be misinterpreting something.
GNU tr has always initialized its internal translation
array using the tolower and toupper functions.
The problem I mentioned above is that it was performing
the correct initialization repeatedly.
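
To make the distinction concrete, here is a minimal sketch of a translation table that is filled from islower/toupper exactly once. It is not the coreutils source: the names xlate_ready, init_count, init_lower_to_upper and translate are hypothetical, and it assumes a single-byte locale (the default "C" locale unless the caller has called setlocale).

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum { N_CHARS = UCHAR_MAX + 1 };

/* Hypothetical names for illustration; not taken from tr.c.  */
static unsigned char xlate[N_CHARS];
static bool xlate_ready = false;
static int init_count = 0;   /* counts how often the table was built */

/* Fill the byte-to-byte table from the locale's islower/toupper.
   The guard makes repeated calls cheap no-ops.  */
static void
init_lower_to_upper (void)
{
  if (xlate_ready)
    return;
  for (int c = 0; c < N_CHARS; c++)
    xlate[c] = islower (c) ? (unsigned char) toupper (c) : (unsigned char) c;
  xlate_ready = true;
  init_count++;
}

/* Translate LEN bytes of BUF in place through the table.  */
static void
translate (unsigned char *buf, size_t len)
{
  assert (buf != NULL);
  init_lower_to_upper ();
  for (size_t i = 0; i < len; i++)
    buf[i] = xlate[buf[i]];
}
```

Calling translate twice still builds the table only once. The mapping itself comes from toupper rather than from pairing up the members of the 'lower' and 'upper' classes, which is the behavior POSIX asks for in this special case.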
> upper A;...;Z
> lower a;...;z
> tolower (A,Z)
> ...
> This would require
> $ echo AAAA | tr '[:upper:]' '[:lower:]'
> to output "ZZZZ" (though it isn't even lowercased), rather than 'aaaa'.
GNU tr should work properly, even with such an odd locale -- as long
as it's a single-byte one. See below.
> While the example above is, of course, contrived, there may well be
> locales where the tolower/toupper mappings differ from the longest
> possible mapping between the 'upper' and 'lower' classes.
>
> In fact, as it currently stands, I expect tr mishandles a case such as:
> $ echo σιγμας | tr '[:lower:]' '[:upper:]'
> (Note the two variants of "sigma" in there, which both have a single
> corresponding capital letter; I'm afraid I can't actually verify this is
> broken, as my work desktop is not set up to compile coreutils, and I
> lack the time to correct this for now; the stock (old) tr on the system,
> running Fedora Core 6, silently passes it through without conversion.)
Your example uses multi-byte characters, and that is a separate issue.
Upstream GNU tr does not yet work with multi-byte characters.
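
A rough sketch of why a byte-at-a-time translation cannot handle that input: in UTF-8 each of those Greek letters is encoded as two bytes, and in the "C" locale none of those bytes is classified as lowercase, so every byte passes through unchanged. The helper name bytewise_upcase is hypothetical, and the code assumes the program has not called setlocale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case BUF one byte at a time, the way a
   table-driven single-byte translation would.  Returns the number
   of bytes that were changed.  */
static size_t
bytewise_upcase (char *buf, size_t len)
{
  size_t changed = 0;
  assert (buf != NULL);
  for (size_t i = 0; i < len; i++)
    {
      /* The ctype functions require a value representable as unsigned char.  */
      unsigned char c = (unsigned char) buf[i];
      if (islower (c))
        {
          buf[i] = (char) toupper (c);
          changed++;
        }
    }
  return changed;
}
```

Fed the 12 UTF-8 bytes of the Greek word from the example, this changes nothing, while the five ASCII bytes of "sigma" all change. Handling such input would mean decoding to wide characters (for instance with mbrtowc and towupper) rather than indexing a 256-entry table.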
If you can make tr misbehave, it'd be great to hear about it soon,
since I'm pretty close to being able to release a stable coreutils-6.10.