bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

bug-grep



From:	Stephane CHAZELAS
Subject:	bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Date:	Sun, 20 Nov 2016 22:06:29 +0000
User-agent:	Mutt/1.5.21 (2010-09-15)

2016-11-20 16:38:29 -0500, Dennis Clarke:
[...]
> On a Solaris 10 system the locales are named a bit differently:
> dasoyva_$ locale -a
[...]
> en_US.ISO8859-1
> en_US.ISO8859-15
[...]
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
> 
> I am not sure if the single byte 0xe9 is correct at all for this test.
[...]

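Note that

printf '\351'

will print the byte 0xe9 regardless of the locale (\351 is an octal
escape, and octal 351 is hexadecimal e9).

0xe9 happens to be the code point for é in ISO8859-1 and
ISO8859-15. In UTF-8, é (U+00E9) is instead encoded as the two
bytes 0xc3 0xa9, so a lone 0xe9 byte is not a valid character there.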
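A small sketch to confirm the byte-level claims above. It assumes a POSIX sh with od and tr; to_hex is a hypothetical helper name, and the iconv step only runs if an iconv binary exists (encoding names vary between iconv implementations; ISO-8859-1 is the glibc spelling).

```shell
#!/bin/sh
# Hypothetical helper: dump stdin as compact lowercase hex, e.g. "e90a".
to_hex() {
  od -An -v -tx1 | tr -d ' \n'
}

# \351 is octal: 3*64 + 5*8 + 1 = 233 = 0xe9.
# The emitted byte does not depend on the locale; C and POSIX are used
# because every system provides them. env runs the external printf.
for loc in C POSIX; do
  printf '%s: %s\n' "$loc" "$(env LC_ALL="$loc" printf '\351' | to_hex)"
done

# 0xe9 is é in ISO8859-1; converting it to UTF-8 yields two bytes.
if command -v iconv >/dev/null 2>&1; then
  printf 'UTF-8 form of ISO8859-1 0xe9: %s\n' \
    "$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | to_hex)"
fi
```

Both locale lines should report e9; the iconv line, where available, should report c3a9.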


> dasoyva_$ LC_ALL=en_US.UTF-8 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
> 
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n'
> �
> 
> Wonder how I would test this on a strict POSIX system here. Any thoughts?
[...]

POSIX leaves all that unspecified. It doesn't specify any locale
other than C/POSIX, and it leaves the meaning of the range
expression '[d-f]' unspecified in locales other than C/POSIX.
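For contrast, in the C/POSIX locale the range is specified: [d-f] covers exactly d, e and f (bytes 0x64 to 0x66), so the byte 0xe9 cannot match. A minimal sketch, assuming any grep that supports -q; in_d_f_c is a hypothetical helper whose argument is used as a printf format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if grep, run in the C locale, finds a
# character from [d-f] in the string (given as a printf format).
in_d_f_c() {
  printf "$1\n" | LC_ALL=C grep -q '[d-f]'
}

for s in 'd' 'e' 'f' 'g' '\351'; do
  if in_d_f_c "$s"; then r='match'; else r='no match'; fi
  printf '%s: %s\n' "$s" "$r"
done
```

Only d, e and f should report a match; what the same test gives in en_US.ISO8859-1 is exactly the part POSIX leaves open.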

Here, the problem is a change of behaviour between GNU grep 2.25
and 2.26 (and the 2.26 behaviour makes grep inconsistent with other
GNU utilities). Both behaviours are POSIX compliant, since [d-f] is
unspecified anyway.

On your Solaris machine, you can check:

printf '\351\n' | LC_ALL=en_US.ISO8859-1 gnu-grep '[d-f]' | od -An -vtx1

Then check whether the result is consistent with /usr/xpg4/bin/grep.
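The check suggested above can be scripted to compare several grep binaries in one run. This is a sketch under assumptions: check_range and the LOC variable are hypothetical names, the default locale is the Solaris spelling from this thread (other systems may name it differently, so check `locale -a`), and if the locale is not installed grep typically falls back to the C locale and will report no match.

```shell
#!/bin/sh
# Hypothetical helper: report whether the grep binary $2, running in
# locale $1, treats the byte 0xe9 as part of [d-f].
check_range() {
  loc=$1 grep_cmd=$2
  printf '\351\n' | LC_ALL=$loc "$grep_cmd" -q '[d-f]'
  case $? in
    0) echo "$grep_cmd ($loc): matches" ;;
    1) echo "$grep_cmd ($loc): no match" ;;
    *) echo "$grep_cmd ($loc): error" ;;  # e.g. binary not found
  esac
}

target=${LOC:-en_US.ISO8859-1}
for g in "$@"; do
  check_range "$target" "$g"
done
```

For example, `sh check-range.sh gnu-grep /usr/xpg4/bin/grep` (the script name is hypothetical). Exit statuses above 1 are reported as errors rather than as "no match", so a missing binary is not mistaken for a negative result.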

-- 
Stephane
