bug-grep
bug#16927: [PATCH] grep: avoid to add same character to a bracket expres


From: Norihiro Tanaka
Subject: bug#16927: [PATCH] grep: avoid to add same character to a bracket expression
Date: Mon, 03 Mar 2014 22:13:00 +0900

Package: grep
Tags: patch

This patch avoids adding the same character more than once to a bracket
expression in trivial_case_ignore.  That may produce a shorter token
sequence in multibyte locales.
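
The idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code in the patch: case_variants and bracket_for are hypothetical names, and Python's str.lower/str.upper stand in for the locale-aware case conversion that grep itself would use.

```python
def case_variants(ch: str) -> list[str]:
    """Return ch plus its single-character case counterparts, without duplicates."""
    variants = [ch]
    for folded in (ch.lower(), ch.upper()):
        # Skip multi-character mappings (e.g. 'ß'.upper() == 'SS') and
        # anything already collected; skipping duplicates is the point here.
        if len(folded) == 1 and folded not in variants:
            variants.append(folded)
    return variants


def bracket_for(ch: str) -> str:
    """Build a bracket expression that matches ch case-insensitively."""
    return "[" + "".join(case_variants(ch)) + "]"
```

Without the "not in variants" check, a lowercase input would be listed twice, once as itself and once as its own lowercase form.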

For example, FULLWIDTH LATIN SMALL LETTER A (ef bd 81) is transformed
as shown below, because multibyte characters in a CSET are expanded into
OR expressions in the DFA.

Before the patch:

[aaA] (where each character is fullwidth)
EF BD CAT 81 CAT EF BD CAT 81 CAT OR EF BC CAT A1 CAT OR

After the patch:

[aA] (where each character is fullwidth)
EF BD CAT 81 CAT EF BC CAT A1 CAT OR
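
To make the two token sequences above easier to check, here is a rough Python sketch that reproduces them. It assumes UTF-8 and a non-empty member list, and it only models the postfix layout shown in this message (bytes of one character joined by CAT, characters joined by OR); char_tokens and bracket_tokens are hypothetical helpers, not functions from dfa.c.

```python
def char_tokens(ch: str) -> list[str]:
    """Postfix tokens for one character: its UTF-8 bytes joined by CAT."""
    data = ch.encode("utf-8")
    tokens = [f"{data[0]:02X}"]
    for b in data[1:]:
        tokens += [f"{b:02X}", "CAT"]
    return tokens


def bracket_tokens(members: list[str]) -> list[str]:
    """Postfix tokens for a bracket expression: member sequences joined by OR."""
    tokens = char_tokens(members[0])
    for ch in members[1:]:
        tokens += char_tokens(ch) + ["OR"]
    return tokens


small_a = "\uff41"    # FULLWIDTH LATIN SMALL LETTER A, UTF-8 ef bd 81
capital_a = "\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A, UTF-8 ef bc a1

before = bracket_tokens([small_a, small_a, capital_a])
after = bracket_tokens([small_a, capital_a])
print(" ".join(before))
print(" ".join(after))
```

In this model every duplicate three-byte member costs six extra tokens (three bytes, two CATs, one OR), which is the saving visible between the two sequences.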

Attachment: patch.txt
Description: Binary data

