bug#22059: grep -E: unexpected behaviour


From: Charles
Subject: bug#22059: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 10:27:55 +0530

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? 
±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? 
±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? 
±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? 
±?¾MUæíE³èBãÄL'

But append an i to the pattern and nothing matches, although the same lines contain " is not valid":

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' \
    /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW
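One way to check whether the character encoding is what matters here is to run the failing pattern in the C locale, where every byte counts as a single character. The following is only a sketch: the `sample` variable is a hypothetical stand-in for the syslog line, with the unusual bytes transcribed from the hex dump in this report, and the timestamp and host copied from the output above.

```shell
# Hypothetical sample line: same shape as the syslog entry; the octal
# escapes are the non-ASCII bytes transcribed from the hex dump in this report.
sample=$(printf 'Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQ\202e\362\320\210 \261\323\270\276MU\346\355E\263\350B\343\304L\047 is not valid UTF-8.')

# LC_ALL=C makes grep treat each byte as one character, so ".*" is free
# to cross bytes that are not valid UTF-8. -c prints the number of matching lines.
count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c -E 'udisksd\[[[:digit:]]+\]: The string .* i')
echo "matching lines in the C locale: $count"
```

If the count is nonzero in the C locale but the same command finds nothing under a UTF-8 locale, the difference would point at how the invalid bytes are handled rather than at the pattern itself.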

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 |
    cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
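Note that `od -x` prints two-byte words in host byte order, so on a little-endian machine (assumed here) each pair of bytes appears swapped: `4853` is the bytes 53 48 ("SH"), and the final `0a27` is the closing quote followed by the newline. A minimal sketch that prints the same 26 bytes one at a time with `od -An -tx1`; the octal escapes are a transcription of the dump above, and the `hex` variable is only for illustration.

```shell
# Re-create the 26 bytes from the dump above (octal escapes for the
# non-ASCII bytes) and print them one byte at a time, in file order.
printf 'SHQ\202e\362\320\210 \261\323\270\276MU\346\355E\263\350B\343\304L\047\n' | od -An -tx1

# Same bytes as a single hex string, with od's spacing and line breaks removed.
hex=$(printf 'SHQ\202e\362\320\210 \261\323\270\276MU\346\355E\263\350B\343\304L\047\n' | od -An -tx1 | tr -d ' \n')
echo "$hex"
```

Read this way, the byte right after "SHQ" is 82, which cannot start a UTF-8 sequence.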

When the input contains invalid characters that grep cannot process, a 
diagnostic message would be expected, perhaps one that the -s/--no-messages 
option could suppress, because the input is (sort of) unreadable.

Version: grep 2.20, from the Debian Jessie package 2.20-4.1

Charles





