bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Grep-devel] Changed behavior in sed 4.6


From: arnold
Subject: Re: [Grep-devel] Changed behavior in sed 4.6
Date: Thu, 20 Dec 2018 22:13:26 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Jim Meyering <address@hidden> wrote:

> On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <address@hidden> wrote:
> > I've just happened to notice a difference in behavior between sed 4.5 and 
> > 4.6
> > when building VirtualBox. It seems to be locale dependent:
> >
> > $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(bar
> >
> > $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(foo *
> >
> > In 4.5 both results are the same -- same as the second output with
> > LC_ALL=C.UTF-8.
>
> Thanks a lot for that report.
> This is indeed a regression. It also affects the just-release
> grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
> I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
> To back that out, I must first revert part of this fix-up patch:
> v0.1-2281-g95cd86dd7
>
> Here's a demonstrator with grep: (it should match, but with 3.2, does not):
>
> $ echo 123-x|LC_ALL=C grep '.\bx'
> $
>
> To avoid the failure, one can:
> - specify -P (for PCRE, a different matcher), or
> - don't use the C locale, but rather use a multi-byte locale like the
> one you chose, which inhibits use of the DFA matcher, because \b's
> definition requires multi-byte aware machinery not present in the DFA
> matcher.
>
> I expect to revert the mentioned mentioned gnulib commits, and then to
> make new releases of both grep and sed.

Please add a test case ...

THanks,

Arnold



reply via email to

[Prev in Thread] Current Thread [Next in Thread]