bug-grep

Re: [PATCH 1/5] maint: ensure that MB_CUR_MAX is defined even when !MBS_


From: Paolo Bonzini
Subject: Re: [PATCH 1/5] maint: ensure that MB_CUR_MAX is defined even when !MBS_SUPPORT
Date: Fri, 16 Sep 2011 15:12:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0.2) Gecko/20110906 Thunderbird/6.0.2

On 09/16/2011 03:03 PM, address@hidden wrote:
> Please remember that dfa.[ch] are shared code with gawk and I think
> also gettext (although I don't know how up to date gettext's version is).
>
> I'd really prefer not to have too many GREP_xxx kinds of things in those
> files.  (It's ok in the rest of grep, of course. :-)

We could separate the variables for dfa and the rest of grep. Grep just needs "#define DFA_MB_CUR_MAX GREP_MB_CUR_MAX" then (and you can similarly "#define DFA_MB_CUR_MAX gawk_mb_cur_max" in gawk).
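
A minimal sketch of that indirection, assuming the shared code refers only to DFA_MB_CUR_MAX. The fallback definition of GREP_MB_CUR_MAX and the helper dfa_multibyte_locale are hypothetical and exist only to keep the fragment compilable on its own:

```c
#include <stdlib.h>   /* MB_CUR_MAX */

/* Assumption: the client (grep here) already defines its own macro.
   This fallback only keeps the sketch compilable by itself. */
#ifndef GREP_MB_CUR_MAX
# define GREP_MB_CUR_MAX MB_CUR_MAX
#endif

/* Client-side glue.  gawk would instead write:
   #define DFA_MB_CUR_MAX gawk_mb_cur_max  */
#define DFA_MB_CUR_MAX GREP_MB_CUR_MAX

/* Hypothetical stand-in for shared dfa code: it never mentions
   GREP_xxx names, only the DFA_ macro. */
static int dfa_multibyte_locale(void)
{
  return DFA_MB_CUR_MAX > 1;
}
```

With this layout the only client-specific line is the single #define, so dfa.[ch] stay free of GREP_xxx names.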

> For what it's worth, MB_CUR_MAX is a function call in GLIBC. There were
> some cases in gawk where I was losing a noticeable amount of time calling
> it a lot.  So I set up a global variable gawk_mb_cur_max and initialize
> it in main(), since the result should never change during a single run of
> the program.  It made a difference.
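
The caching described here might look roughly like the following. This is a sketch under assumptions, not gawk's actual code: the names cached_mb_cur_max, init_mb_cur_max and in_multibyte_locale are hypothetical, and the locale is assumed to be set exactly once at startup.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical global, modelled on the gawk_mb_cur_max mentioned above. */
static size_t cached_mb_cur_max = 1;

/* Call once from main(), so MB_CUR_MAX is expanded a single time
   after the locale has been selected. */
static void init_mb_cur_max(void)
{
  setlocale(LC_ALL, "");
  cached_mb_cur_max = MB_CUR_MAX;
}

/* Hot paths read the plain variable instead of expanding MB_CUR_MAX,
   which may be a function call on some C libraries. */
static int in_multibyte_locale(void)
{
  return cached_mb_cur_max > 1;
}
```

The cached value is only valid as long as the program does not call setlocale() again later.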

Interesting. We do have a field for mb_cur_max in dfaexec, but it is there because some UTF-8 regexes can be run as if the locale were single-byte. I suspect, however, that awk programs (especially badly written ones!) do more regex compilation than grep, up to one compilation per match. For grep it shouldn't really matter.

Having variables grep_mb_cur_max and dfa_mb_cur_max (separate for the reasons Arnold explained) would work, but it would make it impossible for the compiler to throw away the multibyte code when MBS_SUPPORT is zero.
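
To illustrate the point about dead code, here is a rough sketch assuming MBS_SUPPORT is a 0/1 preprocessor constant. SKETCH_MB_CUR_MAX, count_chars and count_chars_multibyte are hypothetical names, not code from grep or dfa.c:

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef MBS_SUPPORT
# define MBS_SUPPORT 0   /* assumption: multibyte support compiled out */
#endif

/* Hypothetical macro: a literal 1 when multibyte support is off. */
#if MBS_SUPPORT
# define SKETCH_MB_CUR_MAX MB_CUR_MAX
#else
# define SKETCH_MB_CUR_MAX 1
#endif

/* Hypothetical multibyte path: counts characters with mblen(). */
static size_t count_chars_multibyte(const char *s, size_t n)
{
  size_t count = 0, i = 0;
  (void) mblen(NULL, 0);             /* reset the conversion state */
  while (i < n) {
    int len = mblen(s + i, n - i);
    i += len > 0 ? (size_t) len : 1; /* step over invalid or NUL bytes */
    count++;
  }
  return count;
}

static size_t count_chars(const char *s, size_t n)
{
  if (SKETCH_MB_CUR_MAX > 1)  /* constant-false when MBS_SUPPORT is 0 */
    return count_chars_multibyte(s, n);
  return n;                   /* single-byte: one character per byte */
}
```

When the macro expands to the literal 1, the condition is a compile-time constant and an optimizing compiler can typically drop the multibyte branch. If it expanded to a variable such as grep_mb_cur_max, the compiler generally could not prove the value and would have to keep that code.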

Paolo


