[^\]] in basic regexes
From: Wacek Kusnierczyk
Subject: [^\]] in basic regexes
Date: Fri, 13 Feb 2009 15:54:01 +0100
User-agent: Thunderbird 2.0.0.19 (X11/20090105)
hello,
i observe a behaviour of grep that i am not sure is correct, possibly
due to my misunderstanding.
i've recently reviewed code written in some language where the intent was
to match a sequence of any number of non-']' characters. the matching
was done with an underlying regex library, and i have tried the pattern
directly with grep.
with grep, the pattern '[^]]' matches one non-] character:
grep '[^]]' <<< '[\]'
# match
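to match a whole run of non-']' characters, as the reviewed code intended, the same class only needs a trailing '*'. a minimal sketch, assuming bash and GNU grep (for -o, which is not in POSIX grep); extract_runs is a hypothetical helper name used only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: print every run of non-']' characters in a line, one per line.
# Assumes bash and GNU grep (-o is a GNU/BSD extension).
# 'extract_runs' is a hypothetical helper, not part of grep.
export LC_ALL=C

extract_runs() {
  # ']' right after '^' is literal, so '[^]]*' means "zero or more non-']'".
  # grep -o prints only non-empty matches, each on its own line.
  printf '%s\n' "$1" | grep -o '[^]]*'
}

extract_runs 'foo]bar]'
```

on 'foo]bar]' this should print 'foo' and 'bar' on separate lines.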
however, in that code the pattern was '[^\]]*' (with the idea that the
character ']' is a metacharacter and therefore must be escaped).
according to the docs i know, it is not necessary to escape ']' within a
character class when it's the first character there (as in '[]]' or
'[^]]'), since it is then not considered a metacharacter; but escaping
it shouldn't be harmful either. it turns out that this pattern won't do
(shown here without the trailing '*'):
grep '[^\]]' <<< '[\]'
# no match
this seems strange; i'd read the pattern as 'one character that is not
]', and the input clearly contains two such characters ('[' and '\').
alternatively, the pattern could be read as 'one character that is
neither \ nor ]', but that would require the backslash to be treated as
an ordinary character (not a metacharacter) inside the class, which is
what the following suggests:
grep '[\]' <<< '[\]'
# match
grep '[^\]' <<< '[\]'
# match
grep '[^\[]' <<< '[\]'
# match
in fact, the third pattern above has only one possible match in the
input, so it is read as 'one non-\ non-[' rather than as 'one non-[':
grep -o '[^\[]' <<< '[\]'
# ]
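the three bracket expressions above can be checked side by side. a minimal sketch, assuming bash and GNU grep -o, and relying on the POSIX rule that a backslash loses its special meaning inside a bracket expression; show_matches and sample are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: list what each bracket expression matches in the line '[\]'.
# Inside a POSIX bracket expression '\' is an ordinary character, so
# '[\]' holds only '\', and '[^\[]' excludes both '\' and '['.
# Assumes GNU grep for -o; 'show_matches' is a hypothetical helper.
export LC_ALL=C
sample='[\]'

show_matches() {
  # $1: a basic regex; prints every non-empty match of $1 in $sample
  printf '%s\n' "$sample" | grep -o -- "$1"
}

for pattern in '[\]' '[^\]' '[^\[]'; do
  # collapse the one-per-line matches into a single space-separated line
  printf '%-6s -> %s\n' "$pattern" "$(show_matches "$pattern" | tr '\n' ' ')"
done
```

this should list '\' for '[\]', '[' and ']' for '[^\]', and only ']' for '[^\[]', which agrees with the results above.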
so the 'one non-\ non-]' reading of '[^\]]' is not implausible; under
it there would be one match ('['), but there is none.
it actually appears that the pattern is read as 'one non-\ followed by
one ]':
grep -o '[^\]]' <<< '[]'
# []
that is, the first ] is not escaped (consistently with the case of
'[^\[]') but rather closes the character class, and the second
(unescaped!) ] does not close any class but is taken literally!
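this reading can be tested by comparing '[^\]]' with '[^\][]]', which spells 'one non-\ followed by one literal ]' unambiguously (']' placed first in the second class is literal). a minimal sketch, assuming bash and GNU grep -o; matches is a hypothetical helper and the test inputs are arbitrary:

```shell
#!/usr/bin/env bash
# Sketch: check that '[^\]]' behaves like '[^\]' followed by a literal ']'.
# '[^\][]]' is that same sequence written unambiguously.
# Assumes GNU grep for -o; 'matches' is a hypothetical helper.
export LC_ALL=C

matches() {
  # $1: pattern, $2: input line; prints each non-empty match on its own line
  printf '%s\n' "$2" | grep -o -- "$1"
}

for input in '[\]' '[]' 'a]b]' '\]x]'; do
  if [ "$(matches '[^\]]' "$input")" = "$(matches '[^\][]]' "$input")" ]; then
    printf 'same: %s\n' "$input"
  else
    printf 'DIFFERENT: %s\n' "$input"
  fi
done

# Under this reading, the '*' in the original '[^\]]*' applies only to the
# literal ']': on '[\]' it yields the one-character matches '[' and ']',
# whereas '[^]]*' yields the single run '[\'.
matches '[^\]]*' '[\]'
matches '[^]]*' '[\]'
```

every input should report 'same:', and the last two commands should print '[' and ']' on separate lines, then '[\'.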
(should this not be an invalid regex, with an unmatched class-closing
bracket?)
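one way to probe this question is to compare a stray ']' with a stray '[' outside any class. POSIX does not list ']' among the special characters of a basic regex outside a bracket expression, so a lone ']' is expected to be ordinary, while a lone '[' opens a class that never closes. a minimal sketch, assuming bash and GNU grep, which exits with status 2 on errors such as an invalid pattern; status_of is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: compare grep's exit status for a stray ']' and a stray '['.
# GNU grep exits 0 on a match, 1 on no match, 2 on an error
# (for example, a pattern that fails to compile).
# 'status_of' is a hypothetical helper, not part of grep.
export LC_ALL=C

status_of() {
  # $1: pattern; runs grep quietly on a fixed line and prints its exit status
  printf '%s\n' 'xa]y' | grep -q -- "$1" 2>/dev/null
  echo "$?"
}

echo "'a]' -> $(status_of 'a]')"   # 0: valid pattern, literal ']' matched
echo "'a[' -> $(status_of 'a[')"   # 2: unmatched '[' is rejected
```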
i haven't looked at the sources of grep, so these are plain guesses, but
is the behaviour of grep with '[^\]]' correct and intended, or is it a bug?
grep -V
# GNU grep 2.5.3
regards,
wacek