[bug #37600] grep -w cuts words on non-ascii
From: Flammie Pirinen
Subject: [bug #37600] grep -w cuts words on non-ascii
Date: Fri, 19 Oct 2012 02:10:23 +0000
User-agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.57 Safari/537.1
URL:
<http://savannah.gnu.org/bugs/?37600>
Summary: grep -w cuts words on non-ascii
Project: grep
Submitted by: flammie
Submitted on: Fri 19 Oct 2012 02:10:23 AM GMT
Category: None
Severity: 3 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
It seems that grep -w does not support non-ASCII characters, at least in the
fi_FI.UTF-8 locale: a letter such as ä is treated as a word boundary.
$ cat > test
xxx
xxxä
xxxx
$ grep -w xxx test
xxx
xxxä
The system is stable Gentoo Linux on x86 with GNU glibc-2.14.1-r3, set up as
follows:
$ grep -V
grep (GNU grep) 2.12
$ locale
LANG=fi_FI.UTF-8
LC_CTYPE="fi_FI.UTF-8"
LC_NUMERIC="fi_FI.UTF-8"
LC_TIME="fi_FI.UTF-8"
LC_COLLATE="fi_FI.UTF-8"
LC_MONETARY="fi_FI.UTF-8"
LC_MESSAGES="fi_FI.UTF-8"
LC_PAPER="fi_FI.UTF-8"
LC_NAME="fi_FI.UTF-8"
LC_ADDRESS="fi_FI.UTF-8"
LC_TELEPHONE="fi_FI.UTF-8"
LC_MEASUREMENT="fi_FI.UTF-8"
LC_IDENTIFICATION="fi_FI.UTF-8"
LC_ALL=fi_FI.UTF-8
If this behaviour is intentional, the description of the -w switch in the
documentation should be clarified. Since grep does match ä against the
[:alpha:] class in my locale, I would expect from the following description
that ä is a "word-constituent character":
-w, --word-regexp
       Select only those lines containing matches that form whole
       words.  The test is that the matching substring must either be
       at the beginning of the line, or preceded by a non-word
       constituent character.  Similarly, it must be either at the end
       of the line or followed by a non-word constituent character.
       Word-constituent characters are letters, digits, and the
       underscore.
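To make the expected behaviour concrete, here is a minimal C sketch of the test quoted above, assuming it is applied character by character rather than byte by byte: each neighbour of the match is decoded with mbrtowc() and classified with iswalnum(), so the result follows the current LC_CTYPE locale. The function names (is_whole_word, char_before, char_at, init_ctype_from_environment) are hypothetical illustrations, not taken from grep's source.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: adopt the LC_CTYPE named by the environment
   (e.g. LANG=fi_FI.UTF-8). In the default "C" locale only ASCII letters
   and digits are alphanumeric. Returns false if the locale is unavailable. */
bool init_ctype_from_environment(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Word-constituent characters per the -w description: letters, digits,
   and the underscore. iswalnum() follows the current LC_CTYPE locale. */
static bool is_word_char(wint_t wc)
{
    return wc != WEOF && (wc == (wint_t)L'_' || iswalnum(wc));
}

/* Decode the character starting at byte offset pos. Returns WEOF at the
   end of the line or on an invalid sequence (treated as a non-word char). */
static wint_t char_at(const char *line, size_t pos)
{
    if (line[pos] == '\0')
        return WEOF;
    mbstate_t st;
    memset(&st, 0, sizeof st);
    wchar_t wc;
    size_t n = mbrtowc(&wc, line + pos, strlen(line + pos), &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return WEOF;
    return (wint_t)wc;
}

/* Decode the character that ends at byte offset start. Multibyte text
   cannot in general be decoded backwards, so scan forward from the start
   of the line. Returns WEOF when start == 0 or the last unit is invalid. */
static wint_t char_before(const char *line, size_t start)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    wint_t prev = WEOF;
    size_t pos = 0;
    while (pos < start) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, line + pos, start - pos, &st);
        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            /* Skip one undecodable byte and restart decoding after it. */
            memset(&st, 0, sizeof st);
            prev = WEOF;
            pos++;
            continue;
        }
        prev = (wint_t)wc;
        pos += n;
    }
    return prev;
}

/* Whole-word test for a match of len bytes at byte offset start: it must be
   at the line start or preceded by a non-word character, and at the line
   end or followed by a non-word character. */
bool is_whole_word(const char *line, size_t start, size_t len)
{
    assert(start + len <= strlen(line)); /* match must lie inside the line */
    return !is_word_char(char_before(line, start))
        && !is_word_char(char_at(line, start + len));
}
```

Under this sketch, once LC_CTYPE is a UTF-8 locale that classifies ä as alphabetic (for instance fi_FI.UTF-8 via init_ctype_from_environment()), is_whole_word("xxx\xc3\xa4", 0, 3) returns false, so the xxxä line in the transcript would not be selected. In the default "C" locale ä is not a letter, so the same call would accept the match.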
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?37600>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/