bug-gnulib

Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex


From: Eric Blake
Subject: Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex
Date: Fri, 29 Jul 2016 15:29:09 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 07/26/2016 08:28 AM, Ján Tomko wrote:
> This check has a static list of words that are checked for repetitions.
> Expand it before running the perl script to avoid using expensive
> captures.
> ---
>  ChangeLog    | 9 +++++++++
>  top/maint.mk | 7 ++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/ChangeLog b/ChangeLog
> index 7dd78e3..b698a6c 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,14 @@
>  2016-07-26  Ján Tomko  <address@hidden>
>  
> +     maint.mk: expand the prohibit_doubled_word regex
> +
> +     This check has a static list of words that are checked for
> +     repetitions.
> +     Expand it before running the perl script to avoid using expensive
> +     captures.

gnulib is still stuck in the old ways of GNU-style changelog entries
where you call out the file and section touched, as in:

        * maint.mk (prohibit_doubled_word): Pre-expand the regex to
        avoid expensive perl regex backreferences.

That can be touched up on commit.

>  
> +prohibit_doubled_words_ = \
> +    the then in an on if is it but for or at and do to
> +# expand the regex before running the check to avoid using expensive captures
> +prohibit_doubled_word_expanded_ = \
>     $(shell echo $(prohibit_doubled_words_) | sed -r 's/\b(\S+)\b/\1\\s\+\1/g')

I bet GNU make has builtins that could do this operation without forking
to $(shell).  This stage results in a variable containing:

the\s+the then\s+then ...

Maybe:

$(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))
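
For what it's worth, here is a minimal sketch of how the builtin-only version could look as a standalone makefile fragment, untested against the real maint.mk. It assumes the separator that should reach perl is `\s+` (one or more whitespace characters), and it defines `empty` and `space` itself rather than relying on them already being in scope. The `$(info ...)` line and the `all` target are only there to make the result visible; they are not part of the patch.

```make
# Word list taken from the patch.
prohibit_doubled_words_ = \
    the then in an on if is it but for or at and do to

# $(addprefix) turns each word into "\s+word"; $(join) then glues the
# two lists together element by element: the\s+the then\s+then ...
prohibit_doubled_word_expanded_ = \
    $(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))

# A single space for use with $(subst) (assumption: defined locally here).
empty :=
space := $(empty) $(empty)

# Turn the space-separated list into one non-capturing alternation.
prohibit_doubled_word_RE_ = \
    /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims

# Illustration only: print the expanded regex when the makefile is read.
$(info $(prohibit_doubled_word_RE_))

# Do-nothing default target so that make has something to build.
all: ;
```

Running GNU make on this fragment should print a regex that begins with `/\b(?:the\s+the|then\s+then|in\s+in|`, with no `$(shell)` fork involved.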

>  prohibit_doubled_word_RE_ ?= \
> -  /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
> +    /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims

At any rate, you want to end up with the perl regex:

/\b(?:the\s+the|then\s+then|...)\b/gims

>  prohibit_doubled_word_ =                                             \
>      -e 'while ($(prohibit_doubled_word_RE_))'                        \
>      $(perl_filename_lineno_text_)
> 

At any rate, I doubt my make fine-tuning matters, and you are definitely
correct that avoiding back-references makes perl regexes more efficient.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


