bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

have you ever mistyped [[:lower:]] as [:lower:] ?


From: Jim Meyering
Subject: have you ever mistyped [[:lower:]] as [:lower:] ?
Date: Wed, 01 Sep 2010 10:11:34 +0200

Paolo Bonzini wrote:
>> In line with my personal preferences [if I make the mistake, I want to
>> find out as soon as possible], I think we should make grep work as
>> Paul suggests.  The value/risk ratio of such a change is unusually high,
>> since the likelihood of a valid use triggering an unwarranted failure
>> is so low, and the typo of omitting the outer [...] in e.g., "[[:lower:]]"
>> is so common.

Hi Paolo,

Have you ever mistyped [[:lower:]] as [:lower:] ?
I know that I have.

...
> Because even though we agree that the regex we warn about is
> (practically) always going to be semantically incorrect, there's no
> doubt that the regex are syntactically valid, and this is IMHO enough
> reason to not return an exit status of 2.

Anyone who deliberately uses a regexp like [:lower:] to mean [:lower]
(to what end?  deceit?  obfuscation?) deserves more than a little
inconvenience.

> Rewriting it as [[:lower:]] would be as good then (or as bad).

No, that's not the same at all.  grep must not do that.
It is one thing to fail for an obviously erroneous construct.
It would be worse to silently transform it into the "intended"
one, since then GNU grep would silently work as intended, but that
same erroneous command would produce different results with non-GNU grep.

Here, we're making it clear that this is a serious error.

> Second, if this was done, it should operate in the same way in sed,
> expr, awk, and all other GNU programs that deal with regexes.  (And
> possibly in glibc too).

It would be nice to make other GNU programs provide the same new
feature.  However, if they don't (or don't right away), it's not
a big deal.

> If you want to add --warn=error (which is a "superset" of
> --warn=always behavior), that's fine and I actually like the idea.
> But I think making it the default non-POSIXLY_CORRECT behavior is
> wrong.  Honestly, if this happened I would regret having introduced
> the feature in the first place.

I hope you don't regret it.
Sometimes you just have to admit that
POSIX-is-clear-and-POSIX-can-be-improved.
We should not let standards get in the way of improving our software.

That we are making this the default behavior
is a tribute to the usefulness of your new feature.
Thank you.

With this feature, users of GNU grep will avoid an entire class of
character class typos.

The following change-set implements what Paul and I have been advocating.
I'll push it later today or tomorrow.

>From f57dc66cf4fe6b8a1404028c0ac957a42201de4a Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 31 Aug 2010 21:14:41 +0200
Subject: [PATCH] grep: diagnose and exit-2 for bogus REs like [:space:], 
[:digit:], etc.

When I make a mistake like this:
  grep '[:lower:]' ...
be it in a script or on the command line, I want to know about
it as soon as possible.  I don't want grep to print a mere warning
that it is interpreting this suspicious and almost guaranteed-wrong
regular expression as a set of just 6 bytes.  And I certainly don't
want grep to silently do the wrong thing, even if that would be
officially standards-conforming.  It's obvious that I intended
[[:lower:]], and I want my error to be diagnosed in a way that is
most likely to get my attention.  Thus, with this change, grep now
prints a diagnostic and exits with status 2 the moment it
encounters an offending [:char_class:] construct.

This changes the way grep works by default, rather than
putting this new behavior on an option.  A new option
would seldom be used in scripts (not portable), and would
probably be used only rarely by those who need it the most.
This new functionality provides a valuable safety measure
and incurs truly negligible risk.

For strict POSIX compliance, set POSIXLY_CORRECT in
your environment.  That disables this new feature.

Revert the changes from commit 2cd3bcea, "grep: add
--warnings={always,never,auto}.", and then do the following:

* src/dfasearch.c (dfawarn): Call getenv("POSIXLY_CORRECT") here;
Remove "warning: " from the diagnostic, now that it's more than
a warning, and exit with status 2.
* NEWS (New features): Describe the new semantics.
* tests/warn-char-classes: Adjust one test to accommodate this change.
---
 NEWS                    |   13 +++++----
 src/dfasearch.c         |   20 +++++++++-----
 src/grep.h              |    2 -
 src/main.c              |   67 +++++++----------------------------------------
 tests/warn-char-classes |    6 +++-
 5 files changed, 34 insertions(+), 74 deletions(-)

diff --git a/NEWS b/NEWS
index 553ec52..18289ba 100644
--- a/NEWS
+++ b/NEWS
@@ -29,12 +29,13 @@ GNU grep NEWS                                    -*- 
outline -*-

 ** New features

-  grep now will warn for very common regular expression mistakes,
-  such as using [:space:] instead of [[:space:]].  Warnings are
-  disabled by POSIXLY_CORRECT.  They are also disabled when stdout
-  is not a TTY, thus minimizing the chance of extraneous output
-  in a script.  However, the rules for enabling/disabling warnings
-  are experimental and subject to change in future releases.
+  grep now diagnoses (and fails with exit status 2) commonly mistyped
+  regular expression like [:space:], [:digit:], etc.  Before, those were
+  silently interpreted as [ac:eps] and [dgit:] respectively.  Virtually
+  all who make that class of mistake should have used [[:space:]] or
+  [[:digit:]].  This new behavior is disabled when the POSIXLY_CORRECT
+  environment variable is set.
+

 * Noteworthy changes in release 2.6.3 (2010-04-02) [stable]

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 29a1f7e..44eb08a 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -46,13 +46,6 @@ static struct patterns *patterns;
 static size_t pcount;

 void
-dfawarn (char const *mesg)
-{
-  if (warnings)
-    error (0, 0, _("warning: %s"), mesg);
-}
-
-void
 dfaerror (char const *mesg)
 {
   error (EXIT_TROUBLE, 0, "%s", mesg);
@@ -62,6 +55,19 @@ dfaerror (char const *mesg)
   abort ();
 }

+/* For now, the sole dfawarn-eliciting condition (use of a regexp
+   like '[:lower:]') is unequivocally an error, so treat it as such,
+   when possible.  */
+void
+dfawarn (char const *mesg)
+{
+  static enum { NONE = 0, POSIX, GNU } mode;
+  if (mode == NONE)
+    mode = (getenv ("POSIXLY_CORRECT") ? POSIX : GNU);
+  if (mode == GNU)
+    dfaerror (mesg);
+}
+
 /* Number of compiled fixed strings known to exactly match the regexp.
    If kwsexec returns < kwset_exact_matches, then we don't need to
    call the regexp matcher at all. */
diff --git a/src/grep.h b/src/grep.h
index 64c1865..67ea793 100644
--- a/src/grep.h
+++ b/src/grep.h
@@ -43,5 +43,3 @@ extern int match_icase;               /* -i */
 extern int match_words;                /* -w */
 extern int match_lines;                /* -x */
 extern unsigned char eolbyte;  /* -z */
-
-extern int warnings;
diff --git a/src/main.c b/src/main.c
index 6f5c4ae..0a91b7f 100644
--- a/src/main.c
+++ b/src/main.c
@@ -286,8 +286,7 @@ enum
   LABEL_OPTION,
   EXCLUDE_DIRECTORY_OPTION,
   GROUP_SEPARATOR_OPTION,
-  MMAP_OPTION,
-  WARN_OPTION,
+  MMAP_OPTION
 };

 /* Long options equivalences. */
@@ -343,7 +342,6 @@ static struct option const long_options[] =
   {"binary", no_argument, NULL, 'U'},
   {"unix-byte-offsets", no_argument, NULL, 'u'},
   {"version", no_argument, NULL, 'V'},
-  {"warn", optional_argument, NULL, WARN_OPTION},
   {"with-filename", no_argument, NULL, 'H'},
   {"word-regexp", no_argument, NULL, 'w'},
   {0, 0, 0, 0}
@@ -354,7 +352,6 @@ int match_icase;
 int match_words;
 int match_lines;
 unsigned char eolbyte;
-int warnings;

 /* For error messages. */
 /* The name the program was run with, stripped of any leading path. */
@@ -388,27 +385,6 @@ static enum
     SKIP_DEVICES
   } devices = READ_DEVICES;

-enum warn_type
-  {
-    WARN_ALWAYS = 0,
-    WARN_NEVER,
-    WARN_AUTO,
-  };
-
-static char const *const warn_args[] =
-{
-  "always", "yes", "force",
-  "never", "no", "none",
-  "auto", "tty", "if-tty", NULL
-};
-static enum warn_type const warn_types[] =
-{
-  WARN_ALWAYS, WARN_ALWAYS, WARN_ALWAYS,
-  WARN_NEVER, WARN_NEVER, WARN_NEVER,
-  WARN_AUTO, WARN_AUTO, WARN_AUTO
-};
-ARGMATCH_VERIFY (warn_args, warn_types);
-
 static int grepdir (char const *, struct stats const *);
 #if defined HAVE_DOS_FILE_CONTENTS
 static inline int undossify_input (char *, size_t);
@@ -1783,7 +1759,6 @@ main (int argc, char **argv)
   int opt, cc, status;
   int default_context;
   FILE *fp;
-  enum warn_type w;

   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
@@ -1816,8 +1791,6 @@ main (int argc, char **argv)
   exit_failure = EXIT_TROUBLE;
   atexit (close_stdout);

-  w = getenv ("POSIXLY_CORRECT") ? WARN_NEVER : WARN_ALWAYS;
-
   prepend_default_options (getenv ("GREP_OPTIONS"), &argc, &argv);
   setmatcher (NULL);

@@ -2051,6 +2024,15 @@ main (int argc, char **argv)
             show_help = 1;
         } else
           color_option = 2;
+        if (color_option == 2)
+          {
+            char const *t;
+            if (isatty (STDOUT_FILENO) && (t = getenv ("TERM"))
+                && !STREQ (t, "dumb"))
+              color_option = 1;
+            else
+              color_option = 0;
+          }
         break;

       case EXCLUDE_OPTION:
@@ -2098,13 +2080,6 @@ main (int argc, char **argv)
         /* long options */
         break;

-      case WARN_OPTION:
-        if (optarg)
-          w = XARGMATCH ("--warn", optarg, warn_args, warn_types);
-        else
-          w = WARN_ALWAYS;
-        break;
-
       default:
         usage (EXIT_TROUBLE);
         break;
@@ -2127,28 +2102,6 @@ main (int argc, char **argv)
   if (out_before < 0)
     out_before = default_context;

-  char const *t;
-  if (isatty (STDOUT_FILENO)
-      && (t = getenv ("TERM")) && !STREQ (t, "dumb"))
-    {
-      if (color_option == 2)
-        color_option = 1;
-      if (w == WARN_AUTO)
-        w = WARN_ALWAYS;
-    }
-  else
-    {
-      if (color_option == 2)
-        color_option = 0;
-
-      /* Redirected stdout: we're likely in a script, disable warnings.  */
-      if (w == WARN_AUTO)
-        w = WARN_NEVER;
-    }
-
-  /* assert (w == WARN_ALWAYS || w == WARN_NEVER); */
-  warnings = w == WARN_ALWAYS;
-
   if (color_option)
     {
       /* Legacy.  */
diff --git a/tests/warn-char-classes b/tests/warn-char-classes
index 069489f..25bf640 100644
--- a/tests/warn-char-classes
+++ b/tests/warn-char-classes
@@ -7,12 +7,14 @@ set -x
 echo f > x || framework_failure_
 echo b >> x || framework_failure_
 echo h >> x || framework_failure_
+printf 'grep: character class syntax is [[:space:]], not [:space:]\n' \
+  > exp-err || framework_failure_

 # basic cases

 grep '[:space:]' x 2> err
-test $? = 1 || fail=1
-test -s err || fail=1
+test $? = 2 || fail=1
+compare err exp-err || fail=1

 grep '[[:space:]]' x 2> err
 test $? = 1 || fail=1
--
1.7.2.2.510.g7180a



reply via email to

[Prev in Thread] Current Thread [Next in Thread]