Re: gawk regex stuff you may want
From: Aharon Robbins
Subject: Re: gawk regex stuff you may want
Date: Sun, 24 Jan 2016 06:01:50 +0200
User-agent: Heirloom mailx 12.5 6/20/10
Hi Paul.
> As far as 'grep' is concerned, it'll trust what regcomp does here, so we
> do have some freedom to change the code in this area. However, it looks
> to me like your patch would do the wrong thing for unibyte locales where
> btowc (b) returns a value that is neither b nor WEOF. Also, the rest of the
> code assumes that if btowc returns WEOF in a multibyte locale then there
> won't be a match (see the setup code in init_dfa, and I have the nagging
> feeling that this assumption is embedded elsewhere). So, how about the
> attached more-conservative patch instead?
I applied that patch and gawk passes its tests. I will probably
keep it. See one comment, below.
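For anyone following along, the three btowc outcomes discussed above (the
result equals b, the result is some other wide character, or the result is
WEOF) can be checked with a small standalone sketch. This is only an
illustration, not code from regcomp.c or from the patch; the names
byte_class, classify_byte and dump_byte_classes are hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical names for illustration; none of this is from regcomp.c.  */
enum byte_class { BYTE_SAME, BYTE_OTHER, BYTE_ERROR };

/* Classify a single byte by what btowc reports in the current locale.  */
enum byte_class
classify_byte (unsigned char b)
{
  wint_t wc = btowc (b);
  if (wc == WEOF)
    return BYTE_ERROR;   /* b is not a valid one-byte character here */
  return wc == (wint_t) b ? BYTE_SAME : BYTE_OTHER;
}

/* Print the classification of every byte value in the current locale.  */
void
dump_byte_classes (void)
{
  static const char *const names[] = { "same", "other", "WEOF" };
  for (int b = 0; b < 256; b++)
    printf ("0x%02x: %s\n", b, names[classify_byte ((unsigned char) b)]);
}
```

To try it, call setlocale (LC_ALL, "") from main (with <locale.h> included)
and then dump_byte_classes (). Which bytes land in the "other" and "WEOF"
groups depends on the locale and the C library.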
> Again, it'd be helpful to know what the problem actually was.
I don't have detailed enough records to be able to tell when all these
small changes were added and why. I will keep them, since the hassle of
removing them, finding out which systems want them, and putting them
back is more than I care to deal with.
I may, one day, just drop in GNULIB's versions. But not yet.
> diff --git a/ChangeLog b/ChangeLog
> index 181f709..a870e86 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,11 @@
> +2016-01-21 Paul Eggert <address@hidden>
> +
> + regex: treat [x] as x if x is a unibyte encoding error
> + Problem reported by Aharon Robbins in:
> + http://lists.gnu.org/archive/html/bug-gnulib/2016-01/msg00091.html
> + * lib/regcomp.c (parse_byte) [_LIBC && RE_ENABLE_I18N]: New function.
> + (build_range_exp) [_LIBC && RE_ENABLE_I18N]: Use it.
I think you mean ! _LIBC && RE_ENABLE_I18N.
Thanks,
Arnold