Re: strcasecmp in regex
From: Bruno Haible
Subject: Re: strcasecmp in regex
Date: Tue, 26 Jun 2012 23:59:41 +0200
User-agent: KMail/4.7.4 (Linux/3.1.10-1.9-desktop; KDE/4.7.4; x86_64; ; )
Hi Paul,
> Shouldn't regex be avoiding strcasecmp entirely?
> That is, couldn't there be a weird locale that considers
> the lower-case equivalent of "U" to be "uu", or something
> weird like that?
In such a locale, strcasecmp would not consider "U" and "uu" as
being equivalent; only mbscasecmp would do this.
But you're right: for comparing results of nl_langinfo (CODESET),
one should not use a locale-dependent comparison. You wouldn't
want "ISO-8859-9" and "iso-8859-9" to be considered different
just because the locale is Turkish, where 'I' and 'i' are not
case variants of each other (I pairs with dotless ı, i with dotted İ).
> For this particular case c-strcase seems overkill, so how
> about the following further patch?
>
> diff --git a/lib/regcomp.c b/lib/regcomp.c
> index 7eb003b..6d5525a 100644
> --- a/lib/regcomp.c
> +++ b/lib/regcomp.c
> @@ -899,8 +899,10 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
> != 0);
> #else
> codeset_name = nl_langinfo (CODESET);
> - if (strcasecmp (codeset_name, "UTF-8") == 0
> - || strcasecmp (codeset_name, "UTF8") == 0)
> + if ((codeset_name[0] == 'U' || codeset_name[0] == 'u')
> + && (codeset_name[1] == 'T' || codeset_name[1] == 't')
> + && (codeset_name[2] == 'F' || codeset_name[2] == 'f')
> + && strcmp (codeset_name + 3 + (codeset_name[3] == '-'), "8") == 0)
> dfa->is_utf8 = 1;
>
> /* We check exhaustively in the loop below if this charset is a
> diff --git a/modules/regex b/modules/regex
> index 5371bab..cfc5d07 100644
> --- a/modules/regex
> +++ b/modules/regex
> @@ -26,7 +26,6 @@ mbsinit [test $ac_use_included_regex = yes]
> nl_langinfo [test $ac_use_included_regex = yes]
> stdbool [test $ac_use_included_regex = yes]
> stdint [test $ac_use_included_regex = yes]
> -strcase [test $ac_use_included_regex = yes]
> wchar [test $ac_use_included_regex = yes]
> wcrtomb [test $ac_use_included_regex = yes]
> wctype-h [test $ac_use_included_regex = yes]
Looks right to me. Please also remove the #include <strings.h> from
regex_internal.h in this patch, since I had already committed it.
Bruno