bug-gnulib

Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex


From: Ján Tomko
Subject: Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex
Date: Mon, 1 Aug 2016 13:52:19 +0200
User-agent: Mutt/1.5.24 (2015-08-30)

On Fri, Jul 29, 2016 at 03:29:09PM -0600, Eric Blake wrote:
> On 07/26/2016 08:28 AM, Ján Tomko wrote:
>> This check has a static list of words that are checked for repetitions.
>> Expand it before running the perl script to avoid using expensive
>> captures.
>> ---
>>  ChangeLog    | 9 +++++++++
>>  top/maint.mk | 7 ++++++-
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> +prohibit_doubled_words_ = \
>> +    the then in an on if is it but for or at and do to
>> +# expand the regex before running the check to avoid using expensive captures
>> +prohibit_doubled_word_expanded_ = \
>> +    $(shell echo $(prohibit_doubled_words_) | sed -r 's/\b(\S+)\b/\1\\s\+\1/g')

> I bet GNU make has builtins that could do this operation without forking
> to $(shell).  This stage results in a variable containing:
>
> the\s+the then\s+then ...
>
> Maybe:
>
> $(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))
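
For comparison, the same expansion can be sketched as a plain POSIX shell loop. This only illustrates the string the make functions are meant to produce and is not part of the patch; the function name expand_doubled is made up here, and the "|" joining that the patch does later with $(subst) is folded into the loop.

```shell
#!/bin/sh
# Hypothetical helper (not in maint.mk): turn each word W into "W\s+W"
# and join the results with "|", ready to drop into a perl alternation.
expand_doubled() {
    out=""
    for w in "$@"; do
        # ${out:+$out|} adds the "|" separator only after the first word.
        out="${out:+$out|}$w\\s+$w"
    done
    printf '%s\n' "$out"
}

expand_doubled the then in
# prints: the\s+the|then\s+then|in\s+in
```

Note that the "+" is left unescaped: in a perl regex "\s+" means one or more whitespace characters, while "\s\+" would require a literal plus sign.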

>>  prohibit_doubled_word_RE_ ?= \
>> -  /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
>> +    /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims

> At any rate, you want to end up with the perl regex:
>
> /\b(?:the\s+the|then\s+then|...)\b/gims
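
One way to sanity-check an alternation of this shape outside of make is grep. The sketch below is an assumption-laden stand-in, not the maint.mk check: it relies on a grep that supports -w (GNU and BSD grep do), uses the POSIX class [[:space:]] in place of perl's \s, lists only three of the words, and has_doubled_word is a hypothetical name. grep works line by line, so it will not notice a repeat split across a newline.

```shell
#!/bin/sh
# Hypothetical check, not part of maint.mk: a shortened version of the
# expanded alternation, written for grep -E instead of perl.
pattern='(the[[:space:]]+the|then[[:space:]]+then|in[[:space:]]+in)'

# -i ignores case, -w requires word boundaries around the match,
# -q suppresses output so only the exit status is used.
has_doubled_word() {
    printf '%s\n' "$1" | grep -Eiwq "$pattern"
}

has_doubled_word 'fix The the typo' && echo "doubled word found"
has_doubled_word 'bathe the dog'    || echo "no doubled word"
```

The second call shows why the word boundaries matter: "bathe the" contains the letters "the the", but the first "the" is the tail of a longer word and must not be reported.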

>>  prohibit_doubled_word_ =                                               \
>>      -e 'while ($(prohibit_doubled_word_RE_))'                          \
>>      $(perl_filename_lineno_text_)


> At any rate, I doubt my make fine-tuning matters, and you are definitely
> correct that avoiding back-references makes perl regexes more efficient.

I doubt there is any difference in performance, but using join and
addprefix could be more readable than sed.

Jan


