From: Paolo Bonzini
Subject: Re: [PATCH 3/3] dfa: optimize UTF-8 period
Date: Mon, 19 Apr 2010 14:06:00 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 04/19/2010 10:07 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> On 04/17/2010 09:27 AM, Jim Meyering wrote:
>>> Paolo Bonzini wrote:
>>>> * NEWS: Document improvement.
>>>> * src/dfa.c (struct dfa): Add utf8_anychar_classes.
>>>> (add_utf8_anychar): New.
>>>> (atom): Simplify if/else nesting. Call add_utf8_anychar for
>>>> ANYCHAR in UTF-8 locales.
>>>> (dfaoptimize): Abort on ANYCHAR.
>>>> ---
>>>>  NEWS      |  6 ++++++
>>>>  src/dfa.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
>>>>  2 files changed, 49 insertions(+), 3 deletions(-)
>>>
>>> Only quick superficial feedback for now:
>>
>> I pushed all patches but this.
>
> Thanks! Would you please add comments describing this one in more
> detail? I ran out of time trying to understand how it works.

Something like this?

  /* For UTF-8 expand the period to a series of CSETs that define a valid
     UTF-8 character.  This avoids using the slow multibyte path.  I'm
     pretty sure it would be both profitable and correct to do it for any
     encoding; however, the optimization must be done manually as it is
     done above in add_utf8_anychar.  So, let's start with UTF-8: it is
     the most used, and the structure of the encoding makes the
     correctness more obvious.  */
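
To make the idea in that comment concrete, here is a minimal standalone sketch of what "a series of CSETs that define a valid UTF-8 character" can look like. It is not the code from the patch: the names utf8_alts and utf8_anychar_len are hypothetical, the byte ranges are the strict well-formed ones from RFC 3629 (the classes actually built by add_utf8_anychar may differ), and it ignores the usual exclusions for the period such as newline. Each table row is one alternative, a fixed-length sequence of byte classes, so matching needs only byte comparisons and no multibyte decoding.

```c
#include <stddef.h>

/* An inclusive range of byte values; stands in for a CSET here. */
struct byte_range { unsigned char lo, hi; };

/* One alternative of the expansion: LEN byte classes in sequence. */
struct utf8_alt { size_t len; struct byte_range r[4]; };

/* Hypothetical table: the period expressed as an alternation of
   byte-class sequences (well-formed UTF-8 per RFC 3629).  */
static const struct utf8_alt utf8_alts[] = {
  { 1, { {0x00, 0x7f} } },
  { 2, { {0xc2, 0xdf}, {0x80, 0xbf} } },
  { 3, { {0xe0, 0xe0}, {0xa0, 0xbf}, {0x80, 0xbf} } },
  { 3, { {0xe1, 0xec}, {0x80, 0xbf}, {0x80, 0xbf} } },
  { 3, { {0xed, 0xed}, {0x80, 0x9f}, {0x80, 0xbf} } },
  { 3, { {0xee, 0xef}, {0x80, 0xbf}, {0x80, 0xbf} } },
  { 4, { {0xf0, 0xf0}, {0x90, 0xbf}, {0x80, 0xbf}, {0x80, 0xbf} } },
  { 4, { {0xf1, 0xf3}, {0x80, 0xbf}, {0x80, 0xbf}, {0x80, 0xbf} } },
  { 4, { {0xf4, 0xf4}, {0x80, 0x8f}, {0x80, 0xbf}, {0x80, 0xbf} } },
};

/* Return the length in bytes of the valid UTF-8 character at the start
   of S (which holds N bytes), or 0 if no alternative matches.  */
size_t
utf8_anychar_len (const unsigned char *s, size_t n)
{
  size_t i, j;

  for (i = 0; i < sizeof utf8_alts / sizeof utf8_alts[0]; i++)
    {
      const struct utf8_alt *a = &utf8_alts[i];

      if (a->len > n)
        continue;               /* not enough input for this alternative */
      for (j = 0; j < a->len; j++)
        if (s[j] < a->r[j].lo || s[j] > a->r[j].hi)
          break;                /* byte J falls outside its class */
      if (j == a->len)
        return a->len;
    }
  return 0;
}
```

A DFA built from such an alternation consumes one byte per transition, which is why the expansion can sidestep the multibyte matcher. The lead bytes of the rows are disjoint, so at most one alternative can match a given input.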