Re: [PATCH 3/3] dfa: optimize UTF-8 period

bug-grep



From:	Paolo Bonzini
Subject:	Re: [PATCH 3/3] dfa: optimize UTF-8 period
Date:	Mon, 19 Apr 2010 14:06:00 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 04/19/2010 10:07 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> On 04/17/2010 09:27 AM, Jim Meyering wrote:
>>> Paolo Bonzini wrote:
>>>> * NEWS: Document improvement.
>>>> * src/dfa.c (struct dfa): Add utf8_anychar_classes.
>>>> (add_utf8_anychar): New.
>>>> (atom): Simplify if/else nesting.  Call add_utf8_anychar for ANYCHAR
>>>> in UTF-8 locales.
>>>> (dfaoptimize): Abort on ANYCHAR.
>>>> ---
>>>>    NEWS      |    6 ++++++
>>>>    src/dfa.c |   46 +++++++++++++++++++++++++++++++++++++++++++---
>>>>    2 files changed, 49 insertions(+), 3 deletions(-)
>>>
>>> Only quick superficial feedback for now:
>>
>> I pushed all patches but this.
>
> Thanks!
>
> Would you please add comments describing this one in more detail?
> I ran out of time trying to understand how it works.


Something like this?

 /* For UTF-8, expand the period to a series of CSETs that define a valid
    UTF-8 character.  This avoids using the slow multibyte path.  I'm
    pretty sure it would be both profitable and correct to do it for
    any encoding; however, the optimization must be done manually, as
    it is done above in add_utf8_anychar.  So, let's start with
    UTF-8: it is the most widely used, and the structure of the encoding
    makes the correctness more obvious.  */
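
To make the idea in that comment more concrete, here is a rough standalone sketch of what "a series of byte classes that define a UTF-8 character" can look like. It is not the code from the patch: the names in_range, is_cont and utf8_anychar_len are made up for illustration, and the byte ranges are a loose approximation that I am assuming is good enough to show the shape (it does not reject overlong 3- and 4-byte forms or surrogates, and it ignores how the period treats newline). The point is only that each alternative is a lead-byte class followed by a fixed number of continuation-byte classes, so a byte-oriented matcher can handle it without decoding.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helpers; these names do not come from src/dfa.c.  */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Continuation bytes are 0x80..0xBF.  */
static int is_cont(unsigned char b)
{
  return in_range(b, 0x80, 0xBF);
}

/* Return the number of bytes (1 to 4) matched at the start of S, which
   holds N bytes, or 0 if none of the alternatives matches.  Each branch
   is one alternative: a lead-byte class followed by continuation
   classes.  */
static size_t utf8_anychar_len(const unsigned char *s, size_t n)
{
  assert(s != NULL);

  /* 00-7F: one-byte sequence.  */
  if (n >= 1 && in_range(s[0], 0x00, 0x7F))
    return 1;

  /* C2-DF, then one continuation byte.  */
  if (n >= 2 && in_range(s[0], 0xC2, 0xDF) && is_cont(s[1]))
    return 2;

  /* E0-EF, then two continuation bytes (loose: overlongs pass).  */
  if (n >= 3 && in_range(s[0], 0xE0, 0xEF)
      && is_cont(s[1]) && is_cont(s[2]))
    return 3;

  /* F0-F7, then three continuation bytes (loose upper bound).  */
  if (n >= 4 && in_range(s[0], 0xF0, 0xF7)
      && is_cont(s[1]) && is_cont(s[2]) && is_cont(s[3]))
    return 4;

  return 0;
}
```

Written as a regular expression over bytes, the four branches amount to an alternation of fixed-length sequences of byte sets, which is the kind of thing the DFA already handles on its fast path.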


