bug-grep

bug#16544: Optimazation for is_mb_middle


From: Jim Meyering
Subject: bug#16544: Optimazation for is_mb_middle
Date: Sun, 26 Jan 2014 15:52:22 -0800

On Fri, Jan 24, 2014 at 6:39 PM, Norihiro Tanaka <address@hidden> wrote:
> Package: grep
> Tags: patch
>
> When kwsexec or dfaexec finds text matching a regular expression, we need
> to check whether the match starts in the middle of a multibyte character.
>
> `is_mb_middle' in searchutils.c is used for that. However, it is expensive,
> even when most of the input consists of single-byte characters.
> For example, source code written in a language with multibyte characters
> still contains a lot of single-byte characters.
>
> Here is a patch that optimizes `is_mb_middle'. Before execution, it checks
> whether each single byte forms a complete character by itself and caches
> the results. In addition, a further optimization is performed for UTF-8.
>
> The length of a multibyte character is determined with `mbrlen' during
> execution only when it cannot be determined from the caches.

Thank you for the patch.
If you performed profiling that led you to optimize that function,
would you please share that data? In any case, can you give
examples showing how much of a difference this change makes?

Also, when I applied that patch and attempted to compile, gcc printed
an error and two new warnings (see attached).  Would you please
address those and send a new version?

Attachment: k.txt
Description: Text document

