bug-grep

bug#16544: Optimazation for is_mb_middle


From: Norihiro Tanaka
Subject: bug#16544: Optimazation for is_mb_middle
Date: Sat, 25 Jan 2014 11:39:11 +0900

Package: grep
Tags: patch

When kwsexec or dfaexec finds text that matches a regular expression, we
need to check whether the match lies in the middle of a multibyte character.

`is_mb_middle' in searchutils.c is used for this. However, it is expensive,
even when most of the input consists of single-byte characters.
For example, source code written in a language with multibyte characters
still contains a lot of single-byte characters.

I am posting a patch which optimizes `is_mb_middle'. Before the search is
executed, it checks for each single-byte value whether that byte is a
complete character by itself, and caches the results. In addition, a
further optimization is performed for UTF-8.

Only when the length of a multibyte character cannot be determined from
the cache is it determined with `mbrlen' during execution.
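
To illustrate the idea described above, here is a minimal sketch. It is
not the attached patch. The names `byte_len_cache', `init_byte_len_cache',
`in_mb_middle' and `utf8_in_mb_middle' are hypothetical, and the UTF-8
shortcut (testing for a continuation byte) is only one possible
optimization, which assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical cache: for each byte value, the result of calling mbrlen
   on that byte alone in the current locale.  1 means the byte is a
   complete character; (size_t) -2 means it starts a longer character.  */
static size_t byte_len_cache[256];

/* Fill the cache once, before searching (after any setlocale call).  */
static void
init_byte_len_cache (void)
{
  for (int i = 0; i < 256; i++)
    {
      char c = (char) i;
      mbstate_t st;
      memset (&st, 0, sizeof st);
      byte_len_cache[i] = mbrlen (&c, 1, &st);
    }
}

/* Return true if P points into the middle of a multibyte character in
   the buffer [BUF, END).  Walks forward from BUF, using the cache for
   single-byte characters and mbrlen only for the remaining bytes.  */
static bool
in_mb_middle (const char *buf, const char *p, const char *end)
{
  assert (buf <= p && p <= end);
  mbstate_t st;
  memset (&st, 0, sizeof st);
  const char *cur = buf;
  while (cur < p)
    {
      size_t len = byte_len_cache[(unsigned char) *cur];
      if (len == (size_t) -2)
        {
          /* The cache cannot tell the length: ask mbrlen.  */
          len = mbrlen (cur, (size_t) (end - cur), &st);
          if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
            {
              /* Invalid or truncated sequence: skip one byte.  */
              len = 1;
              memset (&st, 0, sizeof st);
            }
        }
      else if (len != 1)
        len = 1;                /* NUL or invalid byte: one byte wide.  */
      cur += len;
    }
  return cur > p;
}

/* Possible UTF-8 shortcut (an assumption, valid only for well-formed
   UTF-8): a byte of the form 10xxxxxx is a continuation byte, so a
   position is in the middle of a character exactly when it points at
   such a byte.  No scan from the start of the buffer is needed.  */
static bool
utf8_in_mb_middle (const char *p, const char *end)
{
  return p < end && ((unsigned char) *p & 0xC0) == 0x80;
}
```

With this layout, a buffer that consists mostly of single-byte characters
is scanned almost entirely through table lookups, and `mbrlen' is called
only at bytes that begin a longer character.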

Attachment: is_mb_middle.txt
Description: Binary data

