coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [coreutils] tr: case mapping anomaly


From: Jim Meyering
Subject: Re: [coreutils] tr: case mapping anomaly
Date: Wed, 29 Sep 2010 08:45:14 +0200

Pádraig Brady wrote:
> On 25/09/10 07:53, Jim Meyering wrote:
>> Eric Blake wrote:
>>> On 09/24/2010 04:47 PM, Pádraig Brady wrote:
>>>> I was just looking at a bug reported to fedora there where this abort()s
>>>>
>>>>   $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]'
>>
>> Ouch!  Thanks for reporting it here.
>> How many more bugs lurk in tr...
>> Consolation: this one is a failure to diagnose invalid inputs.
>
> I found a few more issues:

Nice catches.  Thanks for all the work.

Please mention the above abort-inducing case in the log along with the
other three.  While you're at it, please use LC_ALL= there, not LANG=,
since the latter is a glibc-ism.

> This valid translation spec aborted:
>   LANG=en_US tr '[:upper:]- ' '[:lower:]_'

The above does not abort w/glibc when LC_ALL=C happens to be set.

> This misaligned conversion spec was allowed:
>   LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]'
> This misaligned spec was allowed by extending the class:
>   LANG=C tr '[:upper:] ' '[:lower:]'
>
...
> +  tr: fix various issues with case conversion classes.  In some locales
> +  valid conversion specifications caused tr to abort, while some invalid
> +  specifications were undiagnosed.
> +  [bugs introduced in coreutils 6.9.90 and 6.9.92]

Two separate bugs!  Thanks for tracking them down.
I'm surprised they are so recent.
I suppose that means I introduced both.  Let's see...
These two seem like the only candidates:

  6efd10462d8103208f4575f0b5edddf841c7d87c
  af5d0c363a52e787a4311a11f035209eecdc4115

Whichever they are, please list them in the commit log.

>  ** New features
>
>    cp now accepts the --attributes-only option to not copy file data,
> diff --git a/src/tr.c b/src/tr.c
> index a5b6810..3adc85f 100644
> --- a/src/tr.c
> +++ b/src/tr.c
> @@ -1177,6 +1177,77 @@ card_of_complement (struct Spec_list *s)
>    return cardinality;
>  }
>
> +/* Discard the lengths associated with a case conversion,
> +   as using the actual number of upper or lower case characters
> +   is problematic when they don't match in some locales.
> +   Also ensure the case conversion classes in string2 are
> +   aligned correctly with those in string1.
> +   Note POSIX says the behavior of `tr "[:upper:]" "[:upper:]"'
> +   is undefined.  Therefore we allow it (unlike Solaris)
> +   and treat it as a no-op.  */
> +
> +static void
> +validate_case_classes (struct Spec_list *s1, struct Spec_list *s2)
> +{
> +  size_t upper_chars = 0;
> +  size_t lower_chars = 0;

Please name these n_upper and n_lower, so each is obviously a counter.
Otherwise, seeing the name out of context might evoke an array.

> +  int i;

Please make this "unsigned int", since it never needs negative values.

> +  int c1=0, c2=0;

These days, I prefer one per line, and with spaces:

    int c1 = 0;
    int c2 = 0;

Hmm..  I see you're following existing style that did "int c1, c2;",
though in existing code, those didn't have initializers.
And the "int i;" was essentially moved.  And there are others.
I blame the author ;-) (me, 18 years ago)

> +  count old_s1_len = s1->length;
> +  count old_s2_len = s2->length;
> +  struct List_element *s1_tail = s1->tail;
> +  struct List_element *s2_tail = s2->tail;
> +  bool s1_new_element = true;
> +  bool s2_new_element = true;
...
> +
> +# Before coreutils 8.6 the disparat number of upper and lower

s/disparat/disparate/

> +# characters in some locales, triggered abort()s and invalid behavior
> +if test "$(LC_ALL=en_US.ISO-8859-1 locale charmap 2>/dev/null)" = ISO-8859-1;
> +then
> +  (

No big deal, but why run this in a subshell?
It doesn't seem necessary here.

> +    export LC_ALL=en_US.ISO-8859-1
> +    tr '[:upper:] ' '[:lower:]' < /dev/null 2>out && _fail=1
> +    echo 'tr: when translating with string1 longer than string2,
> +the latter string must not end with a character class' > exp
> +    compare out exp || _fail=1
> +
> +    # Ensure when there are a different number of elements
> +    # in each string, we validate the case mapping correctly
> +    tr 'ab[:lower:]' '0-1[:upper:]' < /dev/null || _fail=1

It might be nice to ensure that e.g., abc.xyz maps to ABC.XYZ.

> +    # Ensure we extend string2 appropriately
> +    tr '[:upper:]- ' '[:lower:]_' < /dev/null || _fail=1

Similarly, that ABC-XYZ maps to abc_xyz

Thanks again!

> +    # Ensure the size of the case classes are accounted
> +    # for as a unit.
> +    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
> +    tr '[:upper:]A-B' '[:lower:]0' >out ||  _fail=1
> +    echo '00cdefghijklmnopqrstuvwxyz' > exp
> +    compare out exp || _fail=1
...



reply via email to

[Prev in Thread] Current Thread [Next in Thread]