Re: [PATCH 4/9] dfa: speed up handling of brackets
From: Jim Meyering
Subject: Re: [PATCH 4/9] dfa: speed up handling of brackets
Date: Wed, 17 Mar 2010 16:21:46 +0100
Paolo Bonzini wrote:
...
>> Please use "unsigned int" not just "unsigned".
>
> Fine. I fixed it throughout dfa.[ch], see posted patch. I also
> prefer "unsigned int" but I used the latter since it prevailed in
> those two files.
Thanks.
>>> + if (MB_CUR_MAX > 1)
>>> + {
>>> + REALLOC_IF_NECESSARY(dfa->mbcsets, struct mb_char_classes,
>>> + dfa->mbcsets_alloc, dfa->nmbcsets + 1);
>>> +
>>> + /* dfa->multibyte_prop[] hold the index of dfa->mbcsets.
>>> + We will update dfa->multibyte_prop[] in addtok(), because we can't
>>> + decide the index in dfa->tokens[]. */
>>> +
>>> + /* Initialize work area. */
>>> + work_mbc = &(dfa->mbcsets[dfa->nmbcsets++]);
>>> + work_mbc->nchars = work_mbc->nranges = work_mbc->nch_classes = 0;
>>> + work_mbc->nequivs = work_mbc->ncoll_elems = 0;
>>
>> I was going to write "Please don't put multiple initializations on
>> the same line." but then saw this was merely an indentation change,
>> so never mind.
>
> Indeed; the logical step is to change it to a single memset,
> actually. I did it as a follow-up.
>
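The "single memset" follow-up mentioned above could look roughly like the sketch below. It is only an illustration: `struct mb_counts` and `init_work_area` are hypothetical names, and the struct models only the five counters visible in the quoted hunk, not the real `struct mb_char_classes` from dfa.c.

```c
#include <string.h>

/* Hypothetical stand-in for struct mb_char_classes; only the counters
   named in the quoted patch are modeled here.  */
struct mb_counts
{
  int nchars;
  int nranges;
  int nch_classes;
  int nequivs;
  int ncoll_elems;
};

/* Zero every field in one call instead of chaining assignments
   such as "a = b = c = 0".  */
static void
init_work_area (struct mb_counts *work)
{
  memset (work, 0, sizeof *work);
}
```

Using `sizeof *work` rather than naming the type keeps the call correct if the pointer's type changes later. Note that memset writes all-bits-zero, which is a null pointer on common platforms but is not guaranteed to be one by the C standard, should the real struct carry pointer members.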
>> ACK!
>> Finally. Thanks!
>
> You didn't ack one patch,
Hmm... I certainly reviewed that one, and proposed a replacement
function definition. I guess you mean after you split the use
into a separate commit.
> but it's quite trivial so I pushed it too.
Good.
> The only missing optimization WRT glibc would be to expand UTF-8 "."
> in terms of character sets. I don't plan to do it anytime soon, but
> it should be easy, just special-case ANYCHAR in atom().
>
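As a rough illustration of what expanding UTF-8 "." in terms of character sets means: any well-formed UTF-8 character can be written as an alternation of short sequences of byte sets. The sketch below encodes that alternation as a table and matches one character against it. It is not code from dfa.c or glibc; `byte_range`, `utf8_alt`, `utf8_any`, and `match_any_char` are hypothetical names, and it assumes "." accepts every well-formed character (a real matcher would typically exclude newline).

```c
#include <stddef.h>

/* Inclusive byte range, i.e. a one-byte character set like [\x80-\xBF].  */
struct byte_range { unsigned char lo, hi; };

/* One alternative: a sequence of LEN byte sets.  */
struct utf8_alt { int len; struct byte_range set[4]; };

/* Well-formed UTF-8 as an alternation of byte-set sequences.
   Overlong forms, surrogates, and values above U+10FFFF are excluded.  */
static const struct utf8_alt utf8_any[] = {
  { 1, { {0x00, 0x7F} } },
  { 2, { {0xC2, 0xDF}, {0x80, 0xBF} } },
  { 3, { {0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF} } },
  { 3, { {0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF} } },
  { 3, { {0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF} } },
  { 3, { {0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF} } },
  { 4, { {0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
  { 4, { {0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
  { 4, { {0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF} } },
};

/* Return the byte length of the character at S (N bytes available)
   if it matches one alternative, or 0 if none matches.  */
static int
match_any_char (const unsigned char *s, size_t n)
{
  size_t i;
  for (i = 0; i < sizeof utf8_any / sizeof utf8_any[0]; i++)
    {
      const struct utf8_alt *alt = &utf8_any[i];
      int k;
      if ((size_t) alt->len > n)
        continue;
      for (k = 0; k < alt->len; k++)
        if (s[k] < alt->set[k].lo || s[k] > alt->set[k].hi)
          break;
      if (k == alt->len)
        return alt->len;
    }
  return 0;
}
```

Because every alternative is built only from single-byte sets, a pattern expanded this way needs no multibyte-specific token, which is presumably why special-casing ANYCHAR in atom() would be enough.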
> Anyway, I think we are now on comparable performance, and considerably
> less bugginess, than most GNU/Linux distros around.
>
> Thanks for the reviews!
I'll let things settle for a little, and make a snapshot tomorrow.