Re: regcomp gnulib - glibc sync bears fruit

bug-gnulib

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: regcomp gnulib - glibc sync bears fruit

From:	Jim Meyering
Subject:	Re: regcomp gnulib - glibc sync bears fruit
Date:	Tue, 05 Jan 2010 22:37:27 +0100

Bruno Haible wrote:
> Hi Paolo,
>
>> Before proceeding, however, I'm curious whether using nl_langinfo
>> (CODESET) is less precise than locale_charset on some platform.  Bruno?
>
> Here's my reply to Jim from yesterday. For some reason it was apparently
> not distributed to the mailing list.
>
> Hi Jim,
>
>> @@ -893,7 +896,9 @@ init_dfa (re_dfa_t *dfa, size_t pat_len)
>> dfa->map_notascii = (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MAP_TO_NONASCII) 
>> != 0);
>> #else
>> - if (strcmp (locale_charset (), "UTF-8") == 0)
>> + codeset_name = nl_langinfo (CODESET);
>> + if (strcasecmp (codeset_name, "UTF-8") == 0
>> + || strcasecmp (codeset_name, "UTF8") == 0)
>> dfa->is_utf8 = 1;
>>
>> /* We check exhaustively in the loop below if this charset is a
>
> This patch is not wrong: It takes care of the fact that the result
> of nl_langinfo(CODESET) can be in upper case or in lower case,
> depending on the system, and that on HP-UX, "utf8" is returned
> (see lib/config.charset).
>
> But I would nevertheless not apply it nor recommend it, because

By "it" I presume you mean only the remainder:
the use nl_langinfo-unconditionally part.

> the nl_langinfo module may include a lot more stuff in the future:
>   - It may include real localizations of the values, instead of returning
>     English dummy values. I have already written the converter from
>     glibc locale data to PO files that can be read by the nl_langinfo
>     replacement.
>   - It may include an emulation of the NL_LOCALE_NAME(category)
>     macro that works since glibc 2.11.1. This emulation would rely on
>     the 'localename' module.
> When all you need is nl_langinfo(CODESET), the full-blown 'nl_langinfo'
> module is too heavyweight. 'localcharset' is not a POSIX API, but fits
> better due to the gnulib module structure.

At first, I was thinking of reverting most of that patch,
and even wrote the log and committed (locally) the result.[1]
But then I thought "What systems would suffer if I didn't?"
Only, shall we say, "challenged" systems would suffer the added
overhead.  Overhead that is as yet only theoretical/planned and
of course unmeasured.

In general, gnulib strives to use the "right" APIs, and if
that imposes some overhead on older or less-functional systems,
that's not a problem.  Making the code maintainable for everyone
is more important than sacrificing even a small amount of readability
or maintainability for some unspecified gains.  Of course, if using
nl_langinfo imposes some inordinate overhead on important systems,
then that should be addressed.

Bruno would you object to waiting until there are measurements
that show there is enough overhead to warrant reverting the change?
The exposure of using nl_langinfo here has already shaken out one bug...

[1]  Here's the patch I wrote (not to be applied now).
Posting in case it will come in handy later.

>From da3f9dc64db294506c53c244e36800b0f4d958d3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 5 Jan 2010 22:17:43 +0100
Subject: [PATCH] regcomp: revert most of 
0cfc3b87f0c3be63db1075ed465443c4b3c4cec2

... on advice from Bruno Haible that gnulib's nl_langinfo may
end up being relatively heavy-weight compared to the existing,
localcharset-based work-around.
* lib/regcomp.c (init_dfa) [!LIBC]: Do *not* always use
nl_langinfo (CODESET).
* modules/regex (Depends-on): Remove nl_langinfo.
---
 ChangeLog     |   10 ++++++++++
 lib/regcomp.c |    3 ---
 modules/regex |    1 -
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 80ba5cd..0d5084d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2010-01-05  Jim Meyering  <address@hidden>
+
+       regcomp: revert most of 0cfc3b87f0c3be63db1075ed465443c4b3c4cec2
+       ... on advice from Bruno Haible that gnulib's nl_langinfo may
+       end up being relatively heavy-weight compared to the existing,
+       localcharset-based work-around.
+       * lib/regcomp.c (init_dfa) [!LIBC]: Do *not* always use
+       nl_langinfo (CODESET).
+       * modules/regex (Depends-on): Remove nl_langinfo.
+
 2010-01-05  Aurelien Jarno  <address@hidden>

        utimens (fdutimens): ignore a negative FD, per contract
diff --git a/lib/regcomp.c b/lib/regcomp.c
index ebb696a..802f4a8 100644
--- a/lib/regcomp.c
+++ b/lib/regcomp.c
@@ -850,9 +850,6 @@ static reg_errcode_t
 init_dfa (re_dfa_t *dfa, size_t pat_len)
 {
   __re_size_t table_size;
-#ifndef _LIBC
-  char *codeset_name;
-#endif
 #ifdef RE_ENABLE_I18N
   size_t max_i18n_object_size = MAX (sizeof (wchar_t), sizeof (wctype_t));
 #else
diff --git a/modules/regex b/modules/regex
index f516406..c6a1235 100644
--- a/modules/regex
+++ b/modules/regex
@@ -22,7 +22,6 @@ memcmp
 memmove
 mbrtowc
 mbsinit
-nl_langinfo
 stdbool
 stdint
 ssize_t
--
1.6.6.387.g2649b1

[Prev in Thread]

Current Thread

[Next in Thread]

Re: regcomp gnulib - glibc sync bears fruit, Bruno Haible, 2010/01/05
- Re: regcomp gnulib - glibc sync bears fruit, Jim Meyering <=
  - Re: regcomp gnulib - glibc sync bears fruit, Paolo Bonzini, 2010/01/06
  - Re: regcomp gnulib - glibc sync bears fruit, Bruno Haible, 2010/01/07

Prev by Date: strcasestr
Next by Date: Re: [Bug 552320] Handling AT_FDCWD as EBADF in glibc's utimens() causes tar failures
Previous by thread: Re: regcomp gnulib - glibc sync bears fruit
Next by thread: Re: regcomp gnulib - glibc sync bears fruit
Index(es):
- Date
- Thread