Re: [Grep-devel] Changed behavior in sed 4.6
From: arnold
Subject: Re: [Grep-devel] Changed behavior in sed 4.6
Date: Thu, 20 Dec 2018 22:13:26 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Jim Meyering <address@hidden> wrote:
> On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <address@hidden> wrote:
> > I've just happened to notice a difference in behavior between sed 4.5
> > and 4.6 when building VirtualBox. It seems to be locale dependent:
> >
> > $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(bar
> >
> > $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> > foo(foo *
> >
> > In 4.5 both results are the same -- same as the second output with
> > LC_ALL=C.UTF-8.
>
> Thanks a lot for that report.
> This is indeed a regression. It also affects the just-released
> grep-3.2, since the source is in a file used by both: gnulib's dfa.c.
> I tracked it down to this gnulib/lib/dfa.c commit: v0.1-2213-gae4b73e28
> To back that out, I must first revert part of this fix-up patch:
> v0.1-2281-g95cd86dd7
>
> Here's a demonstrator with grep (it should match, but with 3.2 it does not):
>
> $ echo 123-x|LC_ALL=C grep '.\bx'
> $
>
> To avoid the failure, one can:
> - specify -P (for PCRE, a different matcher), or
> - avoid the C locale and instead use a multi-byte locale like the
>   one you chose, which inhibits use of the DFA matcher, because \b's
>   definition requires multi-byte-aware machinery not present in the DFA
>   matcher.
>
> I expect to revert the two gnulib commits mentioned above, and then to
> make new releases of both grep and sed.

Please add a test case ...
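
Something along these lines might serve as a starting point. It is only a
rough sketch built from the two reproducers quoted above, not the format the
grep or sed test suites actually use; the function names (check_grep,
check_sed, run_checks) are made up for illustration. It assumes that the
correct C-locale behavior is the one sed 4.5 showed, i.e. the same result as
in C.UTF-8.

```shell
#!/bin/sh
# Hypothetical regression check for \b handling in the C locale.
# Function names are illustrative only, not taken from any test suite.

# grep: '.\bx' should match "123-x" (grep exits 0 when a line matches).
check_grep() {
  echo 123-x | LC_ALL=C grep '.\bx' >/dev/null
}

# sed: the substitution should rewrite "(bar" to "(foo *".
# The input ends in a space, which is kept, so the expected
# string ends in a space too.
check_sed() {
  out=$(echo 'foo(bar ' | LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g')
  test "$out" = 'foo(foo * '
}

# Run both checks; return nonzero if either one fails.
run_checks() {
  fail=0
  check_grep || { echo 'grep: word-boundary regression in C locale' >&2; fail=1; }
  check_sed  || { echo 'sed: word-boundary regression in C locale' >&2; fail=1; }
  return $fail
}

run_checks
```

With an affected build (grep 3.2, sed 4.6) the script should print a
diagnostic for each failing tool and end with a nonzero status; with a fixed
build it should be silent.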
Thanks,
Arnold