bug-gnulib

Re: strcasecmp in regex


From: Bruno Haible
Subject: Re: strcasecmp in regex
Date: Tue, 26 Jun 2012 23:59:41 +0200
User-agent: KMail/4.7.4 (Linux/3.1.10-1.9-desktop; KDE/4.7.4; x86_64; ; )

Hi Paul,

> Shouldn't regex be avoiding strcasecmp entirely?
> That is, couldn't there be a weird locale that considers
> the lower-case equivalent of "U" to be "uu", or something
> weird like that?

In such a locale, strcasecmp would not treat "U" and "uu" as
equivalent; only mbscasecmp would do this.

But you're right: for comparing results of nl_langinfo (CODESET),
one should not use a locale-dependent comparison. You wouldn't
want "ISO-8859-9" and "iso-8859-9" to be treated as different
just because the locale is Turkish, where the lowercase of 'I'
is the dotless 'ı' rather than 'i'.
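
To illustrate what a locale-independent comparison could look like, here is
a minimal sketch. The names ascii_tolower and ascii_strcasecmp are
hypothetical and made up for this example (they are not the c-strcase API),
and the code assumes an ASCII-compatible execution character set.

```c
#include <stddef.h>

/* Hypothetical helper: fold only 'A'..'Z' to lower case, ignoring the
   current locale.  Assumes 'A'..'Z' are contiguous, as in ASCII.  */
static int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Hypothetical helper: compare S1 and S2 byte by byte, ignoring case
   for ASCII letters only.  Returns 0 if they match, otherwise the
   difference of the first pair of folded bytes that differ.  */
static int
ascii_strcasecmp (const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;
  int c1, c2;

  do
    {
      c1 = ascii_tolower (*p1++);
      c2 = ascii_tolower (*p2++);
    }
  while (c1 != '\0' && c1 == c2);

  return c1 - c2;
}
```

Because the folding never consults the locale, "ISO-8859-9" and
"iso-8859-9" compare equal here whatever LC_CTYPE is set to.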

> For this particular case c-strcase seems overkill, so how
> about the following further patch?
> 
> diff --git a/lib/regcomp.c b/lib/regcomp.c
> index 7eb003b..6d5525a 100644
> --- a/lib/regcomp.c
> +++ b/lib/regcomp.c
> @@ -899,8 +899,10 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
>                      != 0);
>  #else
>    codeset_name = nl_langinfo (CODESET);
> -  if (strcasecmp (codeset_name, "UTF-8") == 0
> -      || strcasecmp (codeset_name, "UTF8") == 0)
> +  if ((codeset_name[0] == 'U' || codeset_name[0] == 'u')
> +      && (codeset_name[1] == 'T' || codeset_name[1] == 't')
> +      && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
> +      && strcmp (codeset_name + 3 + (codeset_name[3] == '-'), "8") == 0)
>      dfa->is_utf8 = 1;
>  
>    /* We check exhaustively in the loop below if this charset is a
> diff --git a/modules/regex b/modules/regex
> index 5371bab..cfc5d07 100644
> --- a/modules/regex
> +++ b/modules/regex
> @@ -26,7 +26,6 @@ mbsinit         [test $ac_use_included_regex = yes]
>  nl_langinfo     [test $ac_use_included_regex = yes]
>  stdbool         [test $ac_use_included_regex = yes]
>  stdint          [test $ac_use_included_regex = yes]
> -strcase         [test $ac_use_included_regex = yes]
>  wchar           [test $ac_use_included_regex = yes]
>  wcrtomb         [test $ac_use_included_regex = yes]
>  wctype-h        [test $ac_use_included_regex = yes]

Looks right to me. Please also remove the #include <strings.h> from
regex_internal.h, which I had already committed.
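
For reference, the test in the patch can be read as a small standalone
predicate. The following is only a sketch; the function name
is_utf8_codeset is hypothetical and does not exist in regcomp.c.

```c
#include <string.h>

/* Hypothetical wrapper around the condition from the patch: return 1
   if CODESET_NAME is "UTF-8" or "UTF8" with the letters in any ASCII
   case, else 0.  The && chain short-circuits at the first mismatch,
   so it never reads past the terminating NUL of a short name.  */
static int
is_utf8_codeset (const char *codeset_name)
{
  return ((codeset_name[0] == 'U' || codeset_name[0] == 'u')
          && (codeset_name[1] == 'T' || codeset_name[1] == 't')
          && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
          /* Skip one optional '-' and require exactly "8" after it.  */
          && strcmp (codeset_name + 3 + (codeset_name[3] == '-'), "8") == 0);
}
```

This accepts "UTF-8", "utf8" and "Utf-8", and rejects names such as
"UTF-16", "UTF-80" or "UT".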

Bruno



