bug-gnulib

Re: Syntax check prohibit_doubled_word: false positives on non-English text


From: Jim Meyering
Subject: Re: Syntax check prohibit_doubled_word: false positives on non-English text
Date: Sun, 05 Jun 2011 09:20:10 +0200

James Youngman wrote:
> I had an interesting failure from the prohibit_doubled_word syntax check:
>
> prohibit_doubled_word_RE_ ?= \
>   /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
> prohibit_doubled_word_ =                                              \
>     -e 'while ($(prohibit_doubled_word_RE_))'                         \
>     $(perl_filename_lineno_text_)
>
> sc_prohibit_doubled_word:
>       @perl -n -0777 $(prohibit_doubled_word_) $$($(VC_LIST_EXCEPT))  \
>         | grep -vE '$(ignore_doubled_word_match_RE_)'                 \
>         | grep . && { echo '$(ME): doubled words' 1>&2; exit 1; } || :
>
>
> This gave a false positive on findutils/po/ga.po (the Irish message file):
> #: find/parser.c:2844
> #, c-format
> msgid "Arguments to -type should contain only one letter"
> msgstr "Ní cheadaítear ach litir amháin in argóint i ndiaidh -type"
>
>
> It looks to me like Perl's \b has matched after "á" here.
>
> $ sed -ne '435 p' /home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po | od -c
> 0000000   m   s   g   s   t   r       "   N 303 255       c   h   e   a
> 0000020   d   a 303 255   t   e   a   r       a   c   h       l   i   t
> 0000040   i   r       a   m   h 303 241   i   n       i   n       a   r
...
>
> I don't know enough about the Perl regex implementation to know how
> smart \b is supposed to be here.   But in any case, the syntax check
> is clearly intended to check for common English problems, and this
> file is certainly not in English (well, it contains English text, but
> that text is copied from files elsewhere in the same package, and so
> the English text will be checked at its point of origin).
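
For readers unfamiliar with the rule quoted above, here is a rough Python analogue of what it does. It assumes, as the Perl flags suggest, that `-0777` makes perl read each file as one string (so a match can span a line break) and that `while (...//gims)` visits every non-overlapping match. `perl_filename_lineno_text_` is not quoted here, so the `file:line:text` report format (with newlines shown as `\n`) is an assumption. The names `doubled_word_reports` and `check` are hypothetical, and file contents are passed in as bytes rather than read from version control.

```python
import re
import sys

# prohibit_doubled_word_RE_ with Perl's /ims mapped to re.I | re.M | re.S.
# A bytes pattern treats only ASCII letters, digits and '_' as word characters.
DOUBLED = re.compile(
    rb'\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b',
    re.I | re.M | re.S)

def doubled_word_reports(name, data):
    """Yield 'name:line:text' for each match in one whole (slurped) file."""
    for m in DOUBLED.finditer(data):
        # Line number of the match start: newlines before it, plus one.
        lineno = data.count(b'\n', 0, m.start()) + 1
        # Show embedded newlines as a literal \n so each report is one line.
        text = m.group(0).replace(b'\n', b'\\n').decode('latin-1')
        yield f'{name}:{lineno}:{text}'

def check(files, ignore_re=None):
    """files maps file name -> bytes. Print reports; return 1 if any remain."""
    found = [r for name, data in files.items()
             for r in doubled_word_reports(name, data)
             # Plays the role of: grep -vE '$(ignore_doubled_word_match_RE_)'
             if ignore_re is None or not re.search(ignore_re, r)]
    for r in found:
        print(r)
    if found:
        print('doubled words', file=sys.stderr)
    return 1 if found else 0
```

Because the whole file is searched at once, `doubled_word_reports('find.texi', b'@findex -and\nAnd; x\n')` yields `find.texi:1:and\nAnd`, the kind of cross-line pair that the find.texi reordering in the patch avoids.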

Hi James,

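Interesting one, indeed.  \241 is a non-word-constituent, since that
use of Perl is locale- and multibyte-ignorant: "á" is the two bytes
\303\241, so \b matches between \241 and the "in" that ends "amháin",
and "amháin in" yields the doubled word "in in".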
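
The effect is easy to reproduce outside Perl. The sketch below uses Python's `re` module as a stand-in (an assumption; it is not what maint.mk runs). With a bytes pattern, only ASCII letters, digits and `_` count as word characters, mirroring the byte-oriented behaviour described above, while a str pattern applied to the decoded UTF-8 text treats "á" as a letter.

```python
import re

# Same pattern and flags as prohibit_doubled_word_RE_ (/ims).
PATTERN = r'\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b'
FLAGS = re.I | re.M | re.S

line = 'msgstr "Ní cheadaítear ach litir amháin in argóint i ndiaidh -type"'

# Byte-oriented: the UTF-8 bytes of "á" (0xC3 0xA1) are not word characters,
# so \b matches before the "in" that ends "amháin".
byte_match = re.search(PATTERN.encode('ascii'), line.encode('utf-8'), FLAGS)

# Character-oriented: "á" is a word character, "amháin" is one word,
# and there is no doubled word in the line.
text_match = re.search(PATTERN, line, FLAGS)

print(byte_match.group(0))  # b'in in'
print(text_match)           # None
```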

There were several false positives.
For some, I suggest adding double quotes (as in README and tree.c) or
reordering lines so that a word ending one line is not repeated at the
start of the next (as in find.texi).
For others, it should be OK simply to exempt the entire file via this
addition to cfg.mk:

  exclude_file_name_regexp--sc_prohibit_doubled_word = \
    /(iquotes\.xo|ga\.po|tree\.c)$$
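
In make, `$$` stands for a literal `$`, so the regular expression actually applied is `/(iquotes\.xo|ga\.po|tree\.c)$`. Assuming maint.mk drops every file name this extended regex matches (roughly a `grep -Ev` over the file list), the Python sketch below shows which names would be exempt. `is_exempt` is a hypothetical helper, and the paths other than po/ga.po and find/tree.c are made up for contrast. Note the leading `/`: a ga.po at the top of the tree, with no directory prefix, would still be checked.

```python
import re

# cfg.mk value after make has turned "$$" into "$".
EXCLUDE_RE = re.compile(r'/(iquotes\.xo|ga\.po|tree\.c)$')

def is_exempt(path):
    """True if the doubled-word check would skip this (relative) file name."""
    return EXCLUDE_RE.search(path) is not None

# po/fr.po, find/subtree.c and a top-level ga.po are hypothetical names.
for path in ['po/ga.po', 'find/tree.c', 'po/fr.po', 'find/subtree.c', 'ga.po']:
    print(f'{path}: {"skipped" if is_exempt(path) else "checked"}')
```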

Here are some proposed work-arounds:

diff --git a/README b/README
index ec5d312..87b9de7 100644
--- a/README
+++ b/README
@@ -51,7 +51,7 @@ that it knows aren't directories until it encounters a test or action
 that needs the stat info.
 2.  Rearranging the command line, where possible, so that it can do tests
 that don't require a stat before tests that do, in hopes that the
-latter will be skipped because of an OR or AND.  (But it only does
+latter will be skipped because of an "OR" or "AND".  (But it only does
 this where it will leave the output unchanged.)

 The locate program and its helper programs are derived (heavily
diff --git a/cfg.mk b/cfg.mk
index 341a550..ff093ee 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -53,3 +53,6 @@ local-checks-to-skip += sc_prohibit_strcmp sc_prohibit_stat_st_blocks
 # other than the most recent section.   If you do need to retrospectively update
 # a historic section, run "make update-NEWS-hash", which will then edit this file.
 old_NEWS_hash := d41d8cd98f00b204e9800998ecf8427e
+
+exclude_file_name_regexp--sc_prohibit_doubled_word = \
+  /(iquotes\.xo|ga\.po|tree\.c)$$
diff --git a/doc/find.texi b/doc/find.texi
index 5d9b096..cf77233 100644
--- a/doc/find.texi
+++ b/doc/find.texi
@@ -1486,14 +1486,14 @@ protect the @samp{!} from shell interpretation by quoting it.
 @item @address@hidden expr2}}
 @itemx @address@hidden -a @var{expr2}}
 @itemx @address@hidden -and @var{expr2}}
address@hidden -a
 @findex -and
address@hidden -a
 And; @var{expr2} is not evaluated if @var{expr1} is false.

 @item @address@hidden -o @var{expr2}}
 @itemx @address@hidden -or @var{expr2}}
address@hidden -o
 @findex -or
address@hidden -o
 Or; @var{expr2} is not evaluated if @var{expr1} is true.

 @item @address@hidden , @var{expr2}}
diff --git a/find/tree.c b/find/tree.c
index 88b6be9..04b1e85 100644
--- a/find/tree.c
+++ b/find/tree.c
@@ -1220,7 +1220,7 @@ calculate_derived_rates (struct predicate *p)
 }

 /* opt_expr() rearranges predicates such that each left subtree is
- * rooted at a logical predicate (e.g. and or or).  check_normalization()
+ * rooted at a logical predicate (e.g. "and" or "or").  check_normalization()
  * asserts that this property still holds.
  *
  */
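
Why quoting is enough: the pattern allows only whitespace between the two copies of the word, so a `"` right after the first copy, or right before the second, breaks the match. A quick check of the README and tree.c lines before and after the patch, with Python's `re` standing in for Perl (an assumption) and `/ims` mapped to `re.I | re.M | re.S`. Under `re.I` the backreference also ignores case, which is why `OR or` matches.

```python
import re

DOUBLED = re.compile(r'\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b',
                     re.I | re.M | re.S)

# Lines from the README and find/tree.c hunks, before and after quoting.
cases = {
    'README before': 'latter will be skipped because of an OR or AND.',
    'README after': 'latter will be skipped because of an "OR" or "AND".',
    'tree.c before': 'rooted at a logical predicate (e.g. and or or).',
    'tree.c after': 'rooted at a logical predicate (e.g. "and" or "or").',
}

for label, text in cases.items():
    m = DOUBLED.search(text)
    print(f'{label}: {m.group(0) if m else "no match"}')
# README before: OR or
# README after: no match
# tree.c before: or or
# tree.c after: no match
```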


