bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

bug-grep

From:	Norihiro Tanaka
Subject:	bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date:	Sat, 20 Dec 2014 10:31:46 +0900

On Fri, 19 Dec 2014 23:00:38 +0900
Norihiro Tanaka <address@hidden> wrote:
> I also see it is a bug as you say.  mbrlen() in glibc returns (size_t) -1
> for the sequence.

$ printf "\xED\xA0\xBF\n" | LC_ALL=en_US.utf8 src/grep -G .
Binary file (standard input) matches
$ printf "\xED\xA0\xBF\n" | LC_ALL=en_US.utf8 src/grep -P .
$

The regex library behaves the same way as grep -G; for example, sed, which
uses only regex, prints the line.  Therefore, I think that a character in
the surrogate area matching a period with grep -G is not a bug, although
the behavior might not conform to a standard.

$ printf "\xED\xA0\xBF\n" | LANG=en_US.utf8 sed -ne '/./p'

By the way, mbrlen() returns (size_t) -1 for the character.

OTOH, if a character in the surrogate area does not match a period in
PCRE, I think that it should not match a period in grep -P either.
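
As an illustration only, here is a minimal Python sketch (it is not code
from grep, glibc or PCRE, and the function names are made up for this
example) showing why the bytes used above fall in the surrogate area: the
three-byte pattern decodes mechanically to U+D83F, which a strict UTF-8
decoder rejects.

```python
# Bytes from the printf examples above, without the trailing newline.
SEQ = b"\xed\xa0\xbf"


def decode_three_byte(seq: bytes) -> int:
    """Decode a 3-byte UTF-8 pattern mechanically, with no validity checks."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(cp: int) -> bool:
    """True if cp lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def strict_utf8_accepts(seq: bytes) -> bool:
    """True if Python's strict UTF-8 codec accepts the byte sequence."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    cp = decode_three_byte(SEQ)
    # Prints: U+D83F True False
    print(f"U+{cp:04X}", is_surrogate(cp), strict_utf8_accepts(SEQ))
```

Python's strict decoder is used here only as a stand-in for a validating
UTF-8 decoder; it says nothing about how regex or PCRE treat the bytes
internally.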
