bug#17376: [PATCH] grep: fix the different behaviour for a invalid seque

bug-grep

From:	Paul Eggert
Subject:	bug#17376: [PATCH] grep: fix the different behaviour for a invalid sequence between KWset and DFA
Date:	Wed, 30 Apr 2014 12:04:50 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 04/30/2014 08:02 AM, Norihiro Tanaka wrote:

There is different behaviour for an invalid sequence between KWset and DFA.

   encode() { echo "$1" | tr ABC '\357\274\241'; }
   encode ABC | env LC_ALL=en_US.utf8 src/grep "$(encode A)\|q"
   encode ABC | env LC_ALL=en_US.utf8 src/grep -F "$(encode A)"
   encode sABC | env LC_ALL=en_US.utf8 src/grep "a$(encode A)\|q"
   encode sABC | env LC_ALL=en_US.utf8 src/grep -F "a$(encode A)"

We expect all of them to give the same result, but only the 4th returns 1 row.

Sorry, but I am not observing this behavior. With grep 2.18, none of the commands output anything. The same is true for the git master.

If the pattern or data have encoding errors, POSIX says grep can do whatever it likes. As I understand it, in grep 2.18 and the git master, an encoding-error byte in a pattern matches only the same encoding-error byte in the data. Does this bug report's patch change behavior, so that an encoding-error byte in a pattern can match part of a valid multibyte character in the data? If so, it's not clear to me why the proposed behavior change is helpful -- as a user, I'm not sure I'd want such a match to work. If not, then could you please explain a bit more what's going on?
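
To make the byte-level picture concrete, the sketch below reuses the encode function from the report and dumps the bytes it produces. The hexbytes helper is a hypothetical name introduced here for illustration only. In UTF-8 the three bytes EF BC A1 encode U+FF21, so the pattern built from "encode A" is the lead byte 0xEF on its own, which is an encoding error, while the data from "encode ABC" is one valid multibyte character.

```shell
# encode is copied from the report: tr maps A, B, C to the bytes 0xEF, 0xBC, 0xA1.
encode() { echo "$1" | tr ABC '\357\274\241'; }

# Hypothetical helper (not part of grep): print the bytes of stdin in hex on one line.
hexbytes() { od -An -v -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

# Data fed to grep: one valid character (U+FF21) followed by a newline.
encode ABC | hexbytes                  # ef bc a1 0a

# Pattern as seen by grep: $(...) strips the trailing newline,
# leaving a single lead byte with no continuation bytes.
printf '%s' "$(encode A)" | hexbytes   # ef
```

The question in the paragraph above is whether that lone 0xEF in the pattern should be allowed to match the first byte of the valid EF BC A1 sequence in the data.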

More generally, I don't think users care about encoding-error bytes in patterns. If it helps simplify the code and/or improves performance, I'd favor changing 'grep' so that it simply rejects patterns containing encoding errors, and exits with status 2. We should probably wait until after the next release before doing anything that drastic, though.
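
As a rough illustration of what rejecting such patterns could look like from the outside, here is a minimal wrapper sketch. It is not how grep behaves or is implemented. The names is_utf8 and strict_grep are hypothetical, the wrapper assumes the pattern is its first argument, and the validity check is simplified: it verifies only the lead-byte and continuation-byte structure, so it does not reject overlong forms or surrogates.

```shell
# Hypothetical helper: succeed only if stdin has well-formed UTF-8 byte structure.
is_utf8() {
  need=0   # number of continuation bytes still expected
  for b in $(od -An -v -tu1); do
    if [ "$need" -gt 0 ]; then
      # A continuation byte must be in the range 0x80..0xBF.
      [ "$b" -ge 128 ] && [ "$b" -le 191 ] || return 1
      need=$((need - 1))
    elif [ "$b" -lt 128 ]; then need=0                          # ASCII
    elif [ "$b" -ge 194 ] && [ "$b" -le 223 ]; then need=1      # 2-byte lead
    elif [ "$b" -ge 224 ] && [ "$b" -le 239 ]; then need=2      # 3-byte lead
    elif [ "$b" -ge 240 ] && [ "$b" -le 244 ]; then need=3      # 4-byte lead
    else return 1                                               # stray or invalid byte
    fi
  done
  # A sequence truncated at end of input is also an error.
  [ "$need" -eq 0 ]
}

# Hypothetical wrapper: strict_grep PATTERN [ARGS...]
# Returns 2 without running grep if PATTERN contains an encoding error.
strict_grep() {
  pat=$1
  shift
  if ! printf '%s' "$pat" | is_utf8; then
    echo "strict_grep: pattern contains an encoding error" >&2
    return 2
  fi
  grep -e "$pat" "$@"
}

# Usage: the lone 0xEF byte from the report is rejected with status 2.
#   printf 'x\n' | strict_grep "$(printf '\357')"; echo "status: $?"
```

Status 2 is the value grep already uses for errors, so a caller that checks for "no match" (status 1) versus "error" (status 2) would not need to change.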
