bug-grep

bug#18398: Probably found a bug in grep


From: Johannes Meixner
Subject: bug#18398: Probably found a bug in grep
Date: Thu, 4 Sep 2014 10:29:00 +0200 (CEST)
User-agent: Alpine 2.00 (LNX 1167 2008-08-23)


Hello,

On Sep 3 19:11 Bergen, Andreas wrote (excerpt):
> I've probably found a bug in "grep".
> ...
> testfile:  UTF-8 Unicode text
> testfile2: ASCII text
> ...
> Name        : grep
> Version     : 2.5.1a
> Vendor      : SUSE LINUX Products GmbH, Nuernberg, Germany
> Build Date  : Tue Apr 22 03:47:13 2008
> Install Date: Mon Jul  6 16:21:37 2009
> Source RPM  : grep-2.5.1a-20.17.src.rpm

This grep version is very old.
I found grep version 2.5.1a only in SUSE Linux Enterprise Server 10.
openSUSE distributions with such an old grep are no longer available.

I do not know whether that old grep version was really meant to
support UTF-8 character encoding (multibyte characters) well,
because I find almost nothing about "UTF" (ignoring case) in the
grep-2.5.1a sources. There is some multibyte character support
in grep-2.5.1a, but I wonder to what extent it actually works.

In contrast, the grep-2.7 sources that we have provided since
SUSE Linux Enterprise Server 11 Service Pack 2 (SLES11-SP2)
contain a lot more about "UTF" (ignoring case). The RPM changelog
of our grep RPM package for SLES11-SP2 says in particular:
------------------------------------------------------------------
  Version upgrade to grep-2.7
  and reset to full compliance with upstream
...
  version upgrade to grep-2.6.3, which brings among various
  compile fixes vast improvements for UTF-8 / multibyte handling.
------------------------------------------------------------------

In general:

Issues with "traditional" Unix/Linux tools that depend
on the locale are very often not real bugs.

For users it is crucial to understand that any kind of
behaviour can depend on the locale (from keyboard input
via program behaviour to what is shown on the screen).

For basic information see
http://en.opensuse.org/SDB:Plain_Text_versus_Locale

When programs process "plain text files", the user who runs
the program must set up the locale environment to match the
encoding of the "plain text file" before running the program.
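
As a rough illustration of why the locale must match the encoding
of the file, here is a minimal Python sketch. Python is used only
for demonstration; this is not how grep is implemented, and the
helper name matches_caf_plus_one is made up for this example. It
interprets the same bytes once as UTF-8 and once as ISO-8859-1 and
tests whether they are "caf" followed by exactly one character:

```python
import re

# The word "café" as stored in a UTF-8 encoded file: "é" occupies two bytes.
DATA = b"caf\xc3\xa9"


def matches_caf_plus_one(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: decode data with the given encoding and test
    whether the result is "caf" followed by exactly one character."""
    text = data.decode(encoding)
    return re.fullmatch(r"caf.", text) is not None


# Interpreted as UTF-8 the two trailing bytes form one character.
print(matches_caf_plus_one(DATA, "utf-8"))       # True
# Interpreted as ISO-8859-1 each byte is its own character.
print(matches_caf_plus_one(DATA, "iso-8859-1"))  # False
```

The bytes are identical in both calls; only the assumed encoding
differs, and that alone changes whether the pattern matches.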

If you want to process your "plain text files" the way you
always have with "traditional" Unix/Linux tools, you must
use the POSIX locale; otherwise you will get weird results
and unexpected side effects.
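
One way to force the POSIX locale for a single command is to set
LC_ALL=C in its environment, because LC_ALL overrides LANG and the
other LC_* variables. A minimal Python sketch, assuming that a grep
executable is on the PATH; the helper names posix_locale_env and
grep_posix are hypothetical and exist only for this example:

```python
import os
import subprocess


def posix_locale_env(base=None):
    """Return a copy of the environment with the POSIX locale forced.

    LC_ALL takes precedence over LANG and all other LC_* variables,
    so setting it to "C" is sufficient.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env


def grep_posix(pattern, path):
    """Run grep on path under the POSIX locale (assumes grep is installed).

    Returns the matching lines as bytes, since no text encoding is
    assumed in the POSIX locale.
    """
    result = subprocess.run(
        ["grep", "-e", pattern, "--", path],
        env=posix_locale_env(),
        capture_output=True,
    )
    return result.stdout.splitlines()
```

For example, grep_posix("[a-z]", "testfile2") would run grep on
testfile2 with the POSIX locale in effect, regardless of the locale
settings of the calling user.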

See also
http://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html


Kind Regards
Johannes Meixner
--
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer




