--- Begin Message ---
Subject: 【Bug】bug report of GNU grep
Date: Sun, 24 Apr 2016 22:04:45 +0800 (CST)
Hi all,
Suppose the file content is as below:
abc.h
hello world
The output of grep "*.h" file and grep -E "*.h" file differ. From my understanding they should be the same, since '*' is a regular expression meta-character; both should output abc.h.
Please help clarifying this issue!
--- End Message ---
--- Begin Message ---
Subject: Re: bug#23361: 【Bug】bug report of GNU grep
Date: Mon, 25 Apr 2016 09:29:01 -0600
tag 23361 notabug
thanks
On 04/24/2016 08:04 AM, 谢敬锋 wrote:
>
> Hi all,
>
> Suppose the file content is as below:
>
> abc.h
>
> hello world
>
> the output of grep "*.h" file and grep -E "*.h" file are different,
Correct, and this is not a bug.
POSIX defines two different flavors of regular expressions: basic (when
you use 'grep' without -E) and extended ('grep -E'):
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
> '*' is a regular expression meta-character.
But when it appears as the first character of a regular expression, it
has a different meaning in each flavor. Read what POSIX says:
For a BRE, in 9.3.3:
"
*
    The <asterisk> shall be special except when used:
      - In a bracket expression
      - As the first character of an entire BRE (after an initial '^',
        if any)
"
which means that as written,
grep "*.h" file
is looking for a LITERAL star character followed by the '.'
metacharacter for any character followed by a literal 'h'. Your example
file did not contain that pattern.
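To see the BRE rule in action, here is a minimal sketch. The scratch file and its third line (which contains a literal star) are my own additions for illustration and are not part of the original report:

```shell
# Scratch file: the two lines from the report plus one hypothetical
# line that contains a literal '*' followed by ".h".
tmp=$(mktemp)
printf '%s\n' 'abc.h' 'hello world' 'glob *.h here' > "$tmp"

# BRE: the leading '*' is a literal star, '.' matches any single
# character, and 'h' is a literal 'h'.
bre_out=$(grep '*.h' "$tmp")
printf '%s\n' "$bre_out"

rm -f "$tmp"
```

Only the added third line should be printed; "abc.h" is skipped because it contains no star.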
For an ERE, in 9.4.3:
"
*+?{
    The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall
    be special except when used in a bracket expression (see RE Bracket
    Expression). Any of the following uses produce undefined results:
      - If these characters appear first in an ERE, or immediately
        following a <vertical-line>, <circumflex>, or <left-parenthesis>
"
which means the result is undefined according to POSIX, and therefore
GNU grep can make it mean whatever it wants, including ignoring the
invalid "*" and searching for the regular expression ".h" instead. That
explains why:
grep -E "*.h" file
has a match, and adding --color shows that the matching portion is the
".h" portion of the "abc.h" line.
>
> Please help clarifying this issue!
Maybe you are confusing globs (where "*.h" matches "abc.h" because the
'.' is a literal character, and the "*" means "zero or more characters")
with regular expressions (where "." means "any character", and "*" means
"zero or more repetitions of the previous regex construct, unless there
is no previous regex construct, in which case it is well-defined for BRE
but undefined for ERE").
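If the goal was "lines that end in .h", one way to spell that as a regular expression is sketched below. The scratch file and variable names are only for illustration. The pattern has no leading "*", so it is well-defined in both flavors and grep gives the same answer with and without -E:

```shell
# Scratch file holding the two lines from the report.
tmp=$(mktemp)
printf '%s\n' 'abc.h' 'hello world' > "$tmp"

# Glob semantics, using the shell's own pattern matching in 'case':
# '*' is any run of characters, '.' is a literal dot.
case 'abc.h' in
  *.h) glob_match=yes ;;
  *)   glob_match=no ;;
esac

# Regex spelling of the same intent: '.*' is "any run of characters",
# '\.' is a literal dot, and '^'/'$' anchor the match to the whole line.
bre_out=$(grep '^.*\.h$' "$tmp")
ere_out=$(grep -E '^.*\.h$' "$tmp")

printf 'glob: %s\nBRE: %s\nERE: %s\n' "$glob_match" "$bre_out" "$ere_out"
rm -f "$tmp"
```

Both grep invocations should print "abc.h", which is the result the original report expected.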
At any rate, this is not a bug in grep, so I'm closing the bug report.
But feel free to add further comments or questions on this thread.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--- End Message ---