From: Paul Eggert
Subject: bug#20657: Traditional range expression not accepted in regex/dfa
Date: Mon, 25 May 2015 23:53:31 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

address@hidden wrote:
The bugaboo here is the "---"; it's a range expression consisting of minus through minus, and apparently long ago was how one got a minus into a bracket expression.
Actually, long ago expressions like '[^0-9-]' worked just as they do now, and it wasn't ever necessary to use trailing "---". That being said, it is true that in 7th Edition Unix '[^0-9---]' meant the same thing as '[^0-9-]', so in that sense we have an incompatibility with 7th Edition Unix here.
$ ./src/grep '[^0-9---]' /dev/null
./src/grep: Invalid range end

The underlying regex and, I believe, dfa routines don't accept this.
Yes, that's correct. It's not a bug, though, as the regexp is ambiguous and does not conform to POSIX, which says the following about RE bracket expressions:

"To use a <hyphen> as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example, "[][.-.]-0]", which matches either a <right-square-bracket> or any character or collating element that collates between <hyphen> and 0, inclusive."

<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>

In your correspondent's example, the hyphen is a starting range point but is neither first in the bracket expression nor specified as a collating symbol, so the regexp doesn't conform to POSIX.
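
For illustration only, here is a minimal sketch of the portable spellings, with the hyphen placed last or first in the bracket expression so that it is taken literally rather than as a range point. The function names and the sample input are hypothetical, and the sketch assumes a POSIX-conforming grep on the PATH.

```shell
# Hypothetical helper: print the input lines that contain at least one
# character that is neither a digit nor a hyphen.
non_digit_hyphen_lines() {
    # Hyphen last in the bracket expression, so it is an ordinary character.
    grep '[^0-9-]'
}

# Equivalent spelling with the hyphen first (right after the negating ^).
non_digit_hyphen_lines_alt() {
    grep '[^-0-9]'
}

# Made-up sample input: only 'abc' and 'x-1' contain a character
# outside the set of digits and hyphen.
printf '%s\n' '555-1234' 'abc' '42' 'x-1' | non_digit_hyphen_lines
```

Both spellings select the same lines, so neither needs the trailing "---" of the 7th Edition form.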
Even though it's not a bug I suppose it wouldn't hurt to make the GNU matchers compatible with 7th Edition Unix here, if someone really wants to take that task on; it's not urgent, though.