[bug-gettext] [PATCH v2] xgettext: Support message syntax checks


From: Daiki Ueno
Subject: [bug-gettext] [PATCH v2] xgettext: Support message syntax checks
Date: Wed, 04 Feb 2015 18:30:24 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

With this change, xgettext can report common syntactic problems
in strings to be extracted.  The current built-in checks are
ellipsis-unicode, space-ellipsis, and quote-unicode.  These checks
can be enabled with the --check option of xgettext and disabled with
a special "xgettext:" comment in source files.
Feature suggested by Philip Withnall in:
https://savannah.gnu.org/bugs/?44098
* gettext-tools/src/message.h (enum syntax_check_type): New enum.
(NSYNTAXCHECKS): New constant.
(enum is_syntax_check): New enum.
(struct message_ty): New field 'do_syntax_check'.
(syntax_check_name): New variable declaration.
* gettext-tools/src/message.c (syntax_check_name): New variable.
* gettext-tools/src/msgl-cat.c (catenate_msgdomain_list): Propagate
mp->do_syntax_check.
* gettext-tools/src/msgmerge.c (message_merge): Propagate
ref->do_syntax_check.
* gettext-tools/src/msgl-check.h (syntax_check_message_list): New
declaration.
* gettext-tools/src/msgl-check.c (syntax_check_ellipsis_unicode): New
function.
(syntax_check_space_ellipsis): New function.
(syntax_check_quote_unicode): New function.
(syntax_check_message): New function.
(syntax_check_message_list): New function.
* gettext-tools/src/read-catalog-abstract.h (po_parse_comment_special):
Adjust function declaration.
* gettext-tools/src/read-catalog-abstract.c (po_parse_comment_special):
Add new argument SCP for syntax checking; all callers changed.
* gettext-tools/src/read-catalog.h (DEFAULT_CATALOG_READER_TY): New
field 'do_syntax_check'.
* gettext-tools/src/read-catalog.c (default_constructor): Initialize
this->do_syntax_check.
(default_copy_comment_state): Propagate this->do_syntax_check.
* gettext-tools/src/xgettext.c (long_options): Add --check option.
(main): Handle --check option.
(usage): Document --check option.
(remember_a_message): Propagate do_syntax_check value.

* gettext-tools/tests/xgettext-13: New file.
* gettext-tools/tests/Makefile.am (TESTS): Add new test.

* gettext-tools/doc/xgettext.texi: Document --check option.
---
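
For reviewers, here is a minimal standalone sketch of how a single flag from
an "xgettext:" comment, of the form [no-]NAME-check, can be mapped to a check
index and an on/off value.  It follows the logic this patch adds to
po_parse_comment_special(), but parse_check_flag() and check_names[] are
hypothetical names used only for illustration; they are not part of gettext.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Check names as listed in this patch; the order mirrors
   syntax_check_name[].  */
const char *const check_names[] =
{
  "ellipsis-unicode",
  "space-ellipsis",
  "quote-unicode"
};
#define NCHECKS (sizeof check_names / sizeof check_names[0])

/* Hypothetical helper, not part of gettext: parse one flag of the form
   "[no-]NAME-check".  On success, store 1 (enable) or 0 (disable) in
   *ENABLE and return the index of NAME in check_names.  Return -1 and
   leave *ENABLE untouched if FLAG is not a known syntax check flag.  */
int
parse_check_flag (const char *flag, int *enable)
{
  size_t len;
  size_t i;
  int value = 1;

  assert (flag != NULL && enable != NULL);

  /* The flag must end in "-check".  */
  len = strlen (flag);
  if (len < 6 || memcmp (flag + len - 6, "-check", 6) != 0)
    return -1;
  len -= 6;

  /* A leading "no-" negates the meaning.  */
  if (len >= 3 && memcmp (flag, "no-", 3) == 0)
    {
      flag += 3;
      len -= 3;
      value = 0;
    }

  /* What remains must match a check name exactly.  */
  for (i = 0; i < NCHECKS; i++)
    if (strlen (check_names[i]) == len
        && memcmp (check_names[i], flag, len) == 0)
      {
        *enable = value;
        return (int) i;
      }

  return -1;
}
```

Unknown flags are reported with -1 rather than an error, in the same spirit as
the parser in this patch, which ignores markers it does not recognize.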
 gettext-tools/doc/ChangeLog               |   4 +
 gettext-tools/doc/xgettext.texi           |  36 ++++++++
 gettext-tools/src/ChangeLog               |  39 ++++++++
 gettext-tools/src/message.c               |  12 +++
 gettext-tools/src/message.h               |  26 ++++++
 gettext-tools/src/msgl-cat.c              |  13 +++
 gettext-tools/src/msgl-check.c            | 144 ++++++++++++++++++++++++++++++
 gettext-tools/src/msgl-check.h            |   4 +-
 gettext-tools/src/msgmerge.c              |   3 +
 gettext-tools/src/read-catalog-abstract.c |  35 +++++++-
 gettext-tools/src/read-catalog-abstract.h |   3 +-
 gettext-tools/src/read-catalog.c          |   8 +-
 gettext-tools/src/read-catalog.h          |   1 +
 gettext-tools/src/xgettext.c              |  67 +++++++++++++-
 gettext-tools/tests/ChangeLog             |   5 ++
 gettext-tools/tests/Makefile.am           |   1 +
 gettext-tools/tests/xgettext-13           |  99 ++++++++++++++++++++
 17 files changed, 492 insertions(+), 8 deletions(-)
 create mode 100755 gettext-tools/tests/xgettext-13

diff --git a/gettext-tools/doc/ChangeLog b/gettext-tools/doc/ChangeLog
index edac431..645c580 100644
--- a/gettext-tools/doc/ChangeLog
+++ b/gettext-tools/doc/ChangeLog
@@ -1,3 +1,7 @@
+2015-02-04  Daiki Ueno  <address@hidden>
+
+       * xgettext.texi: Document --check option.
+
 2015-02-03  Daiki Ueno  <address@hidden>
 
        * msgexec.texi, msgfilter.texi: Fix markup error caused by commit
diff --git a/gettext-tools/doc/xgettext.texi b/gettext-tools/doc/xgettext.texi
index 451e25f..1fb4bc1 100644
--- a/gettext-tools/doc/xgettext.texi
+++ b/gettext-tools/doc/xgettext.texi
@@ -144,6 +144,42 @@ gettext (
 The second comment line will not be extracted, because there is one
 blank line between the comment line and the keyword.
 
+@item -W@var{name}
+@itemx --check=@var{name}
+@opindex -W@r{, @code{xgettext} option}
+@opindex --check@r{, @code{xgettext} option}
+@cindex supported syntax checks, @code{xgettext}
+Perform a syntax check on msgid and msgid_plural.  The supported checks
+are:
+
+@table @samp
+@item ellipsis-unicode
+Prefer Unicode ellipsis character over ASCII @code{...}
+
+@item space-ellipsis
+Prohibit whitespace before an ellipsis character
+
+@item quote-unicode
+Prefer Unicode quotation marks over ASCII @code{"'`}
+
+@end table
+
+The option applies to all input files.  To enable or disable a check for
+a single string, mark the string with an @code{xgettext:} comment in the
+source file.  For example, if you specify the @code{-Wspace-ellipsis}
+option but want to suppress the check on one string, add a comment:
+
+@example
+/* xgettext: no-space-ellipsis-check */
+gettext ("We really really need to output ...");
+@end example
+
+The special @code{xgettext:} comment can be followed by flags separated
+by commas.  The possible flags are of the form
+@code{[no-]@var{name}-check}, where @var{name} is the name of one
+of the valid syntax checks.  If a flag is prefixed by @code{no-}, the
+meaning is negated.
+
 @end table
 
 @subsection Language specific options
diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog
index 633ec9e..7a542b9 100644
--- a/gettext-tools/src/ChangeLog
+++ b/gettext-tools/src/ChangeLog
@@ -1,3 +1,42 @@
+2015-02-04  Daiki Ueno  <address@hidden>
+
+       xgettext: Support message syntax checks
+       With this change, xgettext can report common syntactic problems
+       in strings to be extracted.  The current built-in checks are
+       ellipsis-unicode, space-ellipsis, and quote-unicode.  These checks
+       can be enabled with the --check option of xgettext and disabled with
+       a special "xgettext:" comment in source files.
+       Feature suggested by Philip Withnall in:
+       https://savannah.gnu.org/bugs/?44098
+       * message.h (enum syntax_check_type): New enum.
+       (NSYNTAXCHECKS): New constant.
+       (enum is_syntax_check): New enum.
+       (struct message_ty): New field 'do_syntax_check'.
+       (syntax_check_name): New variable declaration.
+       * message.c (syntax_check_name): New variable.
+       * msgl-cat.c (catenate_msgdomain_list): Propagate
+       mp->do_syntax_check.
+       * msgmerge.c (message_merge): Propagate ref->do_syntax_check.
+       * msgl-check.h (syntax_check_message_list): New declaration.
+       * msgl-check.c (syntax_check_ellipsis_unicode): New function.
+       (syntax_check_space_ellipsis): New function.
+       (syntax_check_quote_unicode): New function.
+       (syntax_check_message): New function.
+       (syntax_check_message_list): New function.
+       * read-catalog-abstract.h (po_parse_comment_special): Adjust
+       function declaration.
+       * read-catalog-abstract.c (po_parse_comment_special): Add new
+       argument SCP for syntax checking; all callers changed.
+       * read-catalog.h (DEFAULT_CATALOG_READER_TY): New field
+       'do_syntax_check'.
+       * read-catalog.c (default_constructor): Initialize
+       this->do_syntax_check.
+       (default_copy_comment_state): Propagate this->do_syntax_check.
+       * xgettext.c (long_options): Add --check option.
+       (main): Handle --check option.
+       (usage): Document --check option.
+       (remember_a_message): Propagate do_syntax_check value.
+
 2015-02-03  Daiki Ueno  <address@hidden>
 
        msgfilter: Factor out quoted string handling
diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c
index 586675f..2596887 100644
--- a/gettext-tools/src/message.c
+++ b/gettext-tools/src/message.c
@@ -104,6 +104,14 @@ possible_format_p (enum is_format is_format)
 }
 
 
+const char *const syntax_check_name[NSYNTAXCHECKS] =
+{
+  /* sc_ellipsis_unicode */     "ellipsis-unicode",
+  /* sc_space_ellipsis */       "space-ellipsis",
+  /* sc_quote_unicode */        "quote-unicode"
+};
+
+
 message_ty *
 message_alloc (const char *msgctxt,
                const char *msgid, const char *msgid_plural,
@@ -130,6 +138,8 @@ message_alloc (const char *msgctxt,
   mp->range.min = -1;
   mp->range.max = -1;
   mp->do_wrap = undecided;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    mp->do_syntax_check[i] = undecided;
   mp->prev_msgctxt = NULL;
   mp->prev_msgid = NULL;
   mp->prev_msgid_plural = NULL;
@@ -235,6 +245,8 @@ message_copy (message_ty *mp)
     result->is_format[i] = mp->is_format[i];
   result->range = mp->range;
   result->do_wrap = mp->do_wrap;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    result->do_syntax_check[i] = mp->do_syntax_check[i];
   for (j = 0; j < mp->filepos_count; ++j)
     {
       lex_pos_ty *pp = &mp->filepos[j];
diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h
index bf2215a..8b9bc3f 100644
--- a/gettext-tools/src/message.h
+++ b/gettext-tools/src/message.h
@@ -114,6 +114,29 @@ enum is_wrap
 #endif
 
 
+/* Kinds of syntax checks which apply to strings.  */
+enum syntax_check_type
+{
+  sc_ellipsis_unicode,
+  sc_space_ellipsis,
+  sc_quote_unicode
+};
+#define NSYNTAXCHECKS 3
+extern DLL_VARIABLE const char *const syntax_check_name[NSYNTAXCHECKS];
+
+/* Is current msgid subject to a syntax check?  */
+#if 0
+enum is_syntax_check
+{
+  undecided,
+  yes,
+  no
+};
+#else /* HACK - C's enum concept is so stupid */
+#define is_syntax_check is_format
+#endif
+
+
 struct altstr
 {
   const char *msgstr;
@@ -175,6 +198,9 @@ struct message_ty
   /* Do we want the string to be wrapped in the emitted PO file?  */
   enum is_wrap do_wrap;
 
+  /* Do we want to apply extra syntax checks on the string?  */
+  enum is_syntax_check do_syntax_check[NSYNTAXCHECKS];
+
   /* The prev_msgctxt, prev_msgid and prev_msgid_plural strings appearing
      before the message, if present.  Generated by msgmerge.  */
   const char *prev_msgctxt;
diff --git a/gettext-tools/src/msgl-cat.c b/gettext-tools/src/msgl-cat.c
index 0bd58d4..8502a64 100644
--- a/gettext-tools/src/msgl-cat.c
+++ b/gettext-tools/src/msgl-cat.c
@@ -308,6 +308,8 @@ domain \"%s\" in input file '%s' doesn't contain a header entry with a charset s
                   tmp->range.min = - INT_MAX;
                   tmp->range.max = - INT_MAX;
                   tmp->do_wrap = yes; /* may be set to no later */
+                  for (i = 0; i < NSYNTAXCHECKS; i++)
+                    tmp->do_syntax_check[i] = undecided; /* may be set to yes/no later */
                   tmp->obsolete = true; /* may be set to false later */
                   tmp->alternative_count = 0;
                   tmp->alternative = NULL;
@@ -535,6 +537,8 @@ UTF-8 encoded from the beginning, i.e. already in your source code files.\n"),
                     tmp->is_format[i] = mp->is_format[i];
                   tmp->range = mp->range;
                   tmp->do_wrap = mp->do_wrap;
+                  for (i = 0; i < NSYNTAXCHECKS; i++)
+                    tmp->do_syntax_check[i] = mp->do_syntax_check[i];
                   tmp->prev_msgctxt = mp->prev_msgctxt;
                   tmp->prev_msgid = mp->prev_msgid;
                   tmp->prev_msgid_plural = mp->prev_msgid_plural;
@@ -583,6 +587,9 @@ UTF-8 encoded from the beginning, i.e. already in your source code files.\n"),
                     }
                   if (tmp->do_wrap == undecided)
                     tmp->do_wrap = mp->do_wrap;
+                  for (i = 0; i < NSYNTAXCHECKS; i++)
+                    if (tmp->do_syntax_check[i] == undecided)
+                      tmp->do_syntax_check[i] = mp->do_syntax_check[i];
                   tmp->obsolete = false;
                 }
               else
@@ -635,6 +642,12 @@ UTF-8 encoded from the beginning, i.e. already in your source code files.\n"),
                     }
                   if (mp->do_wrap == no)
                     tmp->do_wrap = no;
+                  for (i = 0; i < NSYNTAXCHECKS; i++)
+                    if (mp->do_syntax_check[i] == yes)
+                      tmp->do_syntax_check[i] = yes;
+                    else if (mp->do_syntax_check[i] == no
+                             && tmp->do_syntax_check[i] == undecided)
+                      tmp->do_syntax_check[i] = no;
                   /* Don't fill tmp->prev_msgid in this case.  */
                   if (!mp->obsolete)
                     tmp->obsolete = false;
diff --git a/gettext-tools/src/msgl-check.c b/gettext-tools/src/msgl-check.c
index d6f4a3d..30f178d 100644
--- a/gettext-tools/src/msgl-check.c
+++ b/gettext-tools/src/msgl-check.c
@@ -40,6 +40,7 @@
 #include "plural-table.h"
 #include "c-strstr.h"
 #include "message.h"
+#include "quote.h"
 #include "gettext.h"
 
 #define _(str) gettext (str)
@@ -912,3 +913,146 @@ check_message_list (message_list_ty *mlp,
 
   return seen_errors;
 }
+
+
+static int
+syntax_check_ellipsis_unicode (const message_ty *mp, const char *msgid)
+{
+  const char *cp;
+  int seen_errors = 0;
+
+  for (cp = msgid; *cp != '\0'; cp += (*cp == '\n'))
+    {
+      cp = strchrnul (cp, '\n');
+      if (cp >= msgid + 3 && memcmp (cp - 3, "...", 3) == 0)
+        {
+          po_xerror (PO_SEVERITY_ERROR, mp, NULL, 0, 0, false,
+                     _("ASCII ellipsis ('...') instead of Unicode"));
+          seen_errors++;
+        }
+    }
+
+  return seen_errors;
+}
+
+
+static int
+syntax_check_space_ellipsis (const message_ty *mp, const char *msgid)
+{
+  /* Coincidentally the lengths of bytes are same for UTF-8 and ASCII
+     ellipsis.  */
+  const char *ellipsis
+    = mp->do_syntax_check[sc_ellipsis_unicode] == yes ? "\xE2\x80\xA6" : "...";
+  const char *cp;
+  int seen_errors = 0;
+
+  for (cp = msgid; *cp != '\0'; cp += (*cp == '\n'))
+    {
+      cp = strchrnul (cp, '\n');
+      if (cp >= msgid + 4 && memcmp (cp - 3, ellipsis, 3) == 0
+          && c_isspace (*(cp - 4)))
+        {
+          po_xerror (PO_SEVERITY_ERROR, mp, NULL, 0, 0, false,
+                     _("space before ellipsis found in user visible strings"));
+          seen_errors++;
+        }
+    }
+
+  return seen_errors;
+}
+
+
+struct callback_arg
+{
+  const message_ty *mp;
+  int seen_errors;
+};
+
+static void
+syntax_check_quote_unicode_callback (char quote, const char *quoted,
+                                     size_t quoted_length, void *data)
+{
+  struct callback_arg *arg = data;
+
+  switch (quote)
+    {
+    case '"':
+      po_xerror (PO_SEVERITY_ERROR, arg->mp, NULL, 0, 0, false,
+                 _("ASCII double quote used instead of Unicode"));
+      arg->seen_errors++;
+      break;
+
+    case '\'':
+      po_xerror (PO_SEVERITY_ERROR, arg->mp, NULL, 0, 0, false,
+                 _("ASCII single quote used instead of Unicode"));
+      arg->seen_errors++;
+      break;
+
+    default:
+      break;
+    }
+}
+
+static int
+syntax_check_quote_unicode (const message_ty *mp, const char *msgid)
+{
+  struct callback_arg arg;
+
+  arg.mp = mp;
+  arg.seen_errors = 0;
+
+  scan_quoted (msgid, strlen (msgid),
+               syntax_check_quote_unicode_callback, &arg);
+
+  return arg.seen_errors;
+}
+
+
+typedef int (* syntax_check_function) (const message_ty *mp, const char *msgid);
+static const syntax_check_function sc_funcs[NSYNTAXCHECKS] =
+{
+  syntax_check_ellipsis_unicode,
+  syntax_check_space_ellipsis,
+  syntax_check_quote_unicode
+};
+
+/* Perform all syntax checks on a non-obsolete message.
+   Return the number of errors that were seen.  */
+static int
+syntax_check_message (const message_ty *mp)
+{
+  int seen_errors = 0;
+  int i;
+
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    {
+      if (mp->do_syntax_check[i] == yes)
+        {
+          seen_errors += sc_funcs[i] (mp, mp->msgid);
+          if (mp->msgid_plural)
+            seen_errors += sc_funcs[i] (mp, mp->msgid_plural);
+        }
+    }
+
+  return seen_errors;
+}
+
+
+/* Perform all syntax checks on a message list.
+   Return the number of errors that were seen.  */
+int
+syntax_check_message_list (message_list_ty *mlp)
+{
+  int seen_errors = 0;
+  size_t j;
+
+  for (j = 0; j < mlp->nitems; j++)
+    {
+      message_ty *mp = mlp->item[j];
+
+      if (!is_header (mp))
+        seen_errors += syntax_check_message (mp);
+    }
+
+  return seen_errors;
+}
diff --git a/gettext-tools/src/msgl-check.h b/gettext-tools/src/msgl-check.h
index f03300c..f9d9abd 100644
--- a/gettext-tools/src/msgl-check.h
+++ b/gettext-tools/src/msgl-check.h
@@ -28,7 +28,6 @@
 extern "C" {
 #endif
 
-
 /* Check the values returned by plural_eval.
    Signals the errors through po_xerror.
    Return the number of errors that were seen.
@@ -60,6 +59,9 @@ extern int check_message_list (message_list_ty *mlp,
                                int check_compatibility,
                                int check_accelerators, char accelerator_char);
 
+/* Perform all syntax checks on a message list.
+   Return the number of errors that were seen.  */
+extern int syntax_check_message_list (message_list_ty *mlp);
 
 #ifdef __cplusplus
 }
diff --git a/gettext-tools/src/msgmerge.c b/gettext-tools/src/msgmerge.c
index 0415b2a..71d8962 100644
--- a/gettext-tools/src/msgmerge.c
+++ b/gettext-tools/src/msgmerge.c
@@ -1330,6 +1330,9 @@ message_merge (message_ty *def, message_ty *ref, bool force_fuzzy,
 
   result->do_wrap = ref->do_wrap;
 
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    result->do_syntax_check[i] = ref->do_syntax_check[i];
+
   /* Insert previous msgid, commented out with "#|".
      Do so only when --previous is specified, for backward compatibility.
      Since the "previous msgid" represents the original msgid that led to
diff --git a/gettext-tools/src/read-catalog-abstract.c b/gettext-tools/src/read-catalog-abstract.c
index d4e98ee..0817cd7 100644
--- a/gettext-tools/src/read-catalog-abstract.c
+++ b/gettext-tools/src/read-catalog-abstract.c
@@ -262,7 +262,8 @@ po_callback_comment_special (const char *s)
 void
 po_parse_comment_special (const char *s,
                           bool *fuzzyp, enum is_format formatp[NFORMATS],
-                          struct argument_range *rangep, enum is_wrap *wrapp)
+                          struct argument_range *rangep, enum is_wrap *wrapp,
+                          enum is_syntax_check scp[NSYNTAXCHECKS])
 {
   size_t i;
 
@@ -272,6 +273,8 @@ po_parse_comment_special (const char *s,
   rangep->min = -1;
   rangep->max = -1;
   *wrapp = undecided;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    scp[i] = undecided;
 
   while (*s != '\0')
     {
@@ -405,6 +408,36 @@ po_parse_comment_special (const char *s,
               continue;
             }
 
+          /* Accept syntax check description.  */
+          if (len >= 6 && memcmp (t + len - 6, "-check", 6) == 0)
+            {
+              const char *p;
+              size_t n;
+              enum is_syntax_check value;
+
+              p = t;
+              n = len - 6;
+
+              if (n >= 3 && memcmp (p, "no-", 3) == 0)
+                {
+                  p += 3;
+                  n -= 3;
+                  value = no;
+                }
+              else
+                value = yes;
+
+              for (i = 0; i < NSYNTAXCHECKS; i++)
+                if (strlen (syntax_check_name[i]) == n
+                    && memcmp (syntax_check_name[i], p, n) == 0)
+                  {
+                    scp[i] = value;
+                    break;
+                  }
+              if (i < NSYNTAXCHECKS)
+                continue;
+            }
+
           /* Unknown special comment marker.  It may have been generated
              from a future xgettext version.  Ignore it.  */
         }
diff --git a/gettext-tools/src/read-catalog-abstract.h b/gettext-tools/src/read-catalog-abstract.h
index c3fc84f..367584b 100644
--- a/gettext-tools/src/read-catalog-abstract.h
+++ b/gettext-tools/src/read-catalog-abstract.h
@@ -184,7 +184,8 @@ extern void po_callback_comment_dispatcher (const char *s);
 extern void po_parse_comment_special (const char *s, bool *fuzzyp,
                                       enum is_format formatp[NFORMATS],
                                       struct argument_range *rangep,
-                                      enum is_wrap *wrapp);
+                                      enum is_wrap *wrapp,
+                                      enum is_syntax_check scp[NSYNTAXCHECKS]);
 
 
 #ifdef __cplusplus
diff --git a/gettext-tools/src/read-catalog.c b/gettext-tools/src/read-catalog.c
index 4642249..8c77df1 100644
--- a/gettext-tools/src/read-catalog.c
+++ b/gettext-tools/src/read-catalog.c
@@ -105,6 +105,8 @@ default_constructor (abstract_catalog_reader_ty *that)
   this->range.min = -1;
   this->range.max = -1;
   this->do_wrap = undecided;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    this->do_syntax_check[i] = undecided;
 }
 
 
@@ -172,6 +174,8 @@ default_copy_comment_state (default_catalog_reader_ty *this, message_ty *mp)
     mp->is_format[i] = this->is_format[i];
   mp->range = this->range;
   mp->do_wrap = this->do_wrap;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    mp->do_syntax_check[i] = this->do_syntax_check[i];
 }
 
 
@@ -205,6 +209,8 @@ default_reset_comment_state (default_catalog_reader_ty *this)
   this->range.min = -1;
   this->range.max = -1;
   this->do_wrap = undecided;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    this->do_syntax_check[i] = undecided;
 }
 
 
@@ -299,7 +305,7 @@ default_comment_special (abstract_catalog_reader_ty *that, const char *s)
   default_catalog_reader_ty *this = (default_catalog_reader_ty *) that;
 
   po_parse_comment_special (s, &this->is_fuzzy, this->is_format, &this->range,
-                            &this->do_wrap);
+                            &this->do_wrap, this->do_syntax_check);
 }
 
 
diff --git a/gettext-tools/src/read-catalog.h b/gettext-tools/src/read-catalog.h
index f567d78..74e0fd7 100644
--- a/gettext-tools/src/read-catalog.h
+++ b/gettext-tools/src/read-catalog.h
@@ -113,6 +113,7 @@ struct default_catalog_reader_class_ty
   enum is_format is_format[NFORMATS];                                   \
   struct argument_range range;                                          \
   enum is_wrap do_wrap;                                                 \
+  enum is_syntax_check do_syntax_check[NSYNTAXCHECKS];                  \
 
 typedef struct default_catalog_reader_ty default_catalog_reader_ty;
 struct default_catalog_reader_ty
diff --git a/gettext-tools/src/xgettext.c b/gettext-tools/src/xgettext.c
index f9156eb..12b3f54 100644
--- a/gettext-tools/src/xgettext.c
+++ b/gettext-tools/src/xgettext.c
@@ -58,6 +58,8 @@
 #include "po-charset.h"
 #include "msgl-iconv.h"
 #include "msgl-ascii.h"
+#include "msgl-check.h"
+#include "po-xerror.h"
 #include "po-time.h"
 #include "write-catalog.h"
 #include "write-po.h"
@@ -179,6 +181,9 @@ static bool recognize_format_kde;
 /* If true, recognize Boost format strings.  */
 static bool recognize_format_boost;
 
+/* Syntax checks enabled by default.  */
+static enum is_syntax_check default_syntax_check[NSYNTAXCHECKS];
+
 /* Canonicalized encoding name for all input files.  */
 const char *xgettext_global_source_encoding;
 
@@ -204,6 +209,7 @@ static const struct option long_options[] =
   { "add-location", optional_argument, NULL, 'n' },
   { "boost", no_argument, NULL, CHAR_MAX + 11 },
   { "c++", no_argument, NULL, 'C' },
+  { "check", required_argument, NULL, 'W' },
   { "color", optional_argument, NULL, CHAR_MAX + 14 },
   { "copyright-holder", required_argument, NULL, CHAR_MAX + 1 },
   { "debug", no_argument, &do_debug, 1 },
@@ -346,7 +352,7 @@ main (int argc, char *argv[])
   init_flag_table_vala ();
 
   while ((optchar = getopt_long (argc, argv,
-                                 "ac::Cd:D:eEf:Fhijk::l:L:m::M::no:p:sTVw:x:",
+                                 "ac::Cd:D:eEf:Fhijk::l:L:m::M::no:p:sTVw:W:x:",
                                  long_options, NULL)) != EOF)
     switch (optchar)
       {
@@ -525,6 +531,17 @@ main (int argc, char *argv[])
         }
         break;
 
+      case 'W':
+        if (strcmp (optarg, "ellipsis-unicode") == 0)
+          default_syntax_check[sc_ellipsis_unicode] = yes;
+        else if (strcmp (optarg, "space-ellipsis") == 0)
+          default_syntax_check[sc_space_ellipsis] = yes;
+        else if (strcmp (optarg, "quote-unicode") == 0)
+          default_syntax_check[sc_quote_unicode] = yes;
+        else
+          error (EXIT_FAILURE, 0, _("syntax check '%s' unknown"), optarg);
+        break;
+
       case 'x':
         read_exclusion_file (optarg);
         break;
@@ -836,6 +853,24 @@ warning: file '%s' extension '%s' is unknown; will try C"), filename, extension)
   else if (sort_by_msgid)
     msgdomain_list_sort_by_msgid (mdlp);
 
+  /* Check syntax of messages.  */
+  {
+    int nerrors = 0;
+
+    for (i = 0; i < mdlp->nitems; i++)
+      {
+        message_list_ty *mlp = mdlp->item[i]->messages;
+        nerrors += syntax_check_message_list (mlp);
+      }
+
+    /* Exit with status 1 on any error.  */
+    if (nerrors > 0)
+      error (EXIT_FAILURE, 0,
+             ngettext ("found %d fatal error", "found %d fatal errors",
+                       nerrors),
+             nerrors);
+  }
+
   /* Write the PO file.  */
   msgdomain_list_print (mdlp, file_name, output_syntax, force_po, do_debug);
 
@@ -921,6 +956,10 @@ Operation mode:\n"));
                                 preceding keyword lines in output file\n\
   -c, --add-comments          place all comment blocks preceding keyword lines\n\
                                 in output file\n"));
+      printf (_("\
+  -W, --check=NAME            perform syntax check on messages\n\
+                                (ellipsis-unicode, space-ellipsis,\n\
+                                 quote-unicode)\n"));
       printf ("\n");
       printf (_("\
 Language specific options:\n"));
@@ -1644,8 +1683,8 @@ xgettext_record_flag (const char *optionstring)
           flag += 5;
         }
 
-      /* Unlike po_parse_comment_special(), we don't accept "fuzzy" or "wrap"
-         here - it has no sense.  */
+      /* Unlike po_parse_comment_special(), we don't accept "fuzzy",
+         "wrap", or "check" here - it makes no sense.  */
       if (strlen (flag) >= 7
           && memcmp (flag + strlen (flag) - 7, "-format", 7) == 0)
         {
@@ -2238,6 +2277,7 @@ remember_a_message (message_list_ty *mlp, char *msgctxt, char *msgid,
   enum is_format is_format[NFORMATS];
   struct argument_range range;
   enum is_wrap do_wrap;
+  enum is_syntax_check do_syntax_check[NSYNTAXCHECKS];
   message_ty *mp;
   char *msgstr;
   size_t i;
@@ -2264,6 +2304,8 @@ remember_a_message (message_list_ty *mlp, char *msgctxt, char *msgid,
   range.min = -1;
   range.max = -1;
   do_wrap = undecided;
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    do_syntax_check[i] = undecided;
 
   if (msgctxt != NULL)
     CONVERT_STRING (msgctxt, lc_string);
@@ -2297,6 +2339,8 @@ meta information, not the empty string.\n")));
       for (i = 0; i < NFORMATS; i++)
         is_format[i] = mp->is_format[i];
       do_wrap = mp->do_wrap;
+      for (i = 0; i < NSYNTAXCHECKS; i++)
+        do_syntax_check[i] = mp->do_syntax_check[i];
     }
   else
     {
@@ -2376,12 +2420,13 @@ meta information, not the empty string.\n")));
             enum is_format tmp_format[NFORMATS];
             struct argument_range tmp_range;
             enum is_wrap tmp_wrap;
+            enum is_syntax_check tmp_syntax_check[NSYNTAXCHECKS];
             bool interesting;
 
             t += strlen ("xgettext:");
 
             po_parse_comment_special (t, &tmp_fuzzy, tmp_format, &tmp_range,
-                                      &tmp_wrap);
+                                      &tmp_wrap, tmp_syntax_check);
 
             interesting = false;
             for (i = 0; i < NFORMATS; i++)
@@ -2400,6 +2445,12 @@ meta information, not the empty string.\n")));
                 do_wrap = tmp_wrap;
                 interesting = true;
               }
+            for (i = 0; i < NSYNTAXCHECKS; i++)
+              if (tmp_syntax_check[i] != undecided)
+                {
+                  do_syntax_check[i] = tmp_syntax_check[i];
+                  interesting = true;
+                }
 
             /* If the "xgettext:" marker was followed by an interesting
                keyword, and we updated our is_format/do_wrap variables,
@@ -2525,6 +2576,14 @@ meta information, not the empty string.\n")));
 
   mp->do_wrap = do_wrap == no ? no : yes;       /* By default we wrap.  */
 
+  for (i = 0; i < NSYNTAXCHECKS; i++)
+    {
+      if (do_syntax_check[i] == undecided)
+        do_syntax_check[i] = default_syntax_check[i] == yes ? yes : no;
+
+      mp->do_syntax_check[i] = do_syntax_check[i];
+    }
+
   /* Warn about the use of non-reorderable format strings when the programming
      language also provides reorderable format strings.  */
   warn_format_string (is_format, mp->msgid, pos, "msgid");
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog
index eec1586..9223edd 100644
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,8 @@
+2015-02-04  Daiki Ueno  <address@hidden>
+
+       * xgettext-13: New file.
+       * Makefile.am (TESTS): Add new test.
+
 2015-01-29  Daiki Ueno  <address@hidden>
 
        * msgexec-6: New file.
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index ee34655..32bc192 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -72,6 +72,7 @@ TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \
        recode-sr-latin-1 recode-sr-latin-2 \
        xgettext-2 xgettext-3 xgettext-4 xgettext-5 xgettext-6 \
        xgettext-7 xgettext-8 xgettext-9 xgettext-10 xgettext-11 xgettext-12 \
+       xgettext-13 \
        xgettext-awk-1 xgettext-awk-2 \
        xgettext-c-2 xgettext-c-3 xgettext-c-4 xgettext-c-5 \
        xgettext-c-6 xgettext-c-7 xgettext-c-8 xgettext-c-9 xgettext-c-10 \
diff --git a/gettext-tools/tests/xgettext-13 b/gettext-tools/tests/xgettext-13
new file mode 100755
index 0000000..32107f2
--- /dev/null
+++ b/gettext-tools/tests/xgettext-13
@@ -0,0 +1,99 @@
+#!/bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test for --check option.
+
+# --check=ellipsis-unicode
+cat <<\EOF > xg-ellipsis-u.c
+gettext ("this is a sentence...");
+
+ngettext ("this is a sentence", "these are sentences...", 2);
+
+/* xgettext: no-ellipsis-unicode-check */
+gettext ("this is another sentence...");
+
+gettext ("this is a multiline sentence\n"
+         "and the second line...\n"
+        "ends with an ellipsis\n");
+EOF
+
+: ${XGETTEXT=xgettext}
+LANGUAGE= LC_ALL=C ${XGETTEXT} --omit-header --add-comments --check=ellipsis-unicode -d xg-ellipsis-u.tmp xg-ellipsis-u.c 2>xg-ellipsis-u.err
+
+test `grep -c 'ASCII ellipsis' xg-ellipsis-u.err` = 3 || exit 1
+
+# --check=space-ellipsis
+cat <<\EOF > xg-space-e.c
+gettext ("this is a sentence ...");
+
+/* xgettext: no-space-ellipsis-check, no-ellipsis-unicode-check */
+gettext ("this is another sentence ...");
+
+gettext ("this is a multiline sentence\n"
+         "and the second line ...\n"
+        "ends with an ellipsis\n");
+EOF
+
+LANGUAGE= LC_ALL=C ${XGETTEXT} --omit-header --add-comments --check=space-ellipsis -d xg-space-e.tmp xg-space-e.c 2>xg-space-e.err
+
+test `grep -c 'space before ellipsis' xg-space-e.err` = 2 || exit 1
+
+# Combination of --check=space-ellipsis and --check=ellipsis-unicode.
+LANGUAGE= LC_ALL=C ${XGETTEXT} --omit-header --add-comments --check=ellipsis-unicode --check=space-ellipsis -d xg-space-eu.tmp xg-space-e.c 2>xg-space-eu.err
+
+test `grep -c 'ASCII ellipsis' xg-space-eu.err` = 2 || exit 1
+
+# --check=quote-unicode
+cat <<\EOF > xg-quote-u.c
+gettext ("\"double quoted\"");
+
+/* xgettext: no-quote-unicode-check */
+gettext ("\"double quoted but ignored\"");
+
+gettext ("double quoted but empty \"\"");
+
+gettext ("\"\" double quoted but empty");
+
+gettext ("\"foo\" \"bar\" \"baz\"");
+
+gettext ("'single quoted'");
+
+/* xgettext: no-quote-unicode-check */
+gettext ("'single quoted but ignored'");
+
+gettext ("'foo' 'bar' 'baz'");
+
+gettext ("prefix'single quoted without surrounding spaces'suffix");
+
+gettext ("prefix 'single quoted with surrounding spaces' suffix");
+
+gettext ("single quoted with apostrophe, empty '' ");
+
+gettext ("'single quoted at the beginning of string' ");
+
+gettext (" 'single quoted at the end of string'");
+
+gettext ("line 1\n"
+"'single quoted at the beginning of line' \n"
+"line 3");
+
+gettext ("line 1\n"
+" 'single quoted at the end of line'\n"
+"line 3");
+
+gettext ("`single quoted with grave'");
+
+/* xgettext: no-quote-unicode-check */
+gettext ("`single quoted with grave but ignored'");
+
+gettext ("single quoted with grave, empty `'");
+
+gettext ("`' single quoted with grave, empty");
+
+gettext ("`double grave`");
+EOF
+
+LANGUAGE= LC_ALL=C ${XGETTEXT} --omit-header --add-comments --check=quote-unicode -d xg-quote-u.tmp xg-quote-u.c 2>xg-quote-u.err
+
+test `grep -c 'ASCII double quote' xg-quote-u.err` = 4 || exit 1
+test `grep -c 'ASCII single quote' xg-quote-u.err` = 12 || exit 1
-- 
2.1.0
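
As a companion to the ellipsis checks in msgl-check.c, the following is a
minimal standalone sketch of the end-of-line scan they rely on: walk the msgid
line by line and test whether each line ends with a three-byte ellipsis,
optionally requiring a space or tab in front of it.  The function
count_line_end_ellipsis() is a hypothetical name, not part of gettext, and the
sketch makes two assumptions that differ from the patch: it uses the standard
strchr() instead of the GNU strchrnul(), and it only looks at bytes within the
current line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of gettext: count the lines of MSGID
   (separated by '\n') that end with the three-byte sequence ELLIPSIS.
   If NEED_SPACE_BEFORE is nonzero, count a line only when a space or
   tab immediately precedes the ellipsis on that line.  */
int
count_line_end_ellipsis (const char *msgid, const char *ellipsis,
                         int need_space_before)
{
  const char *line = msgid;
  int count = 0;

  assert (msgid != NULL && ellipsis != NULL && strlen (ellipsis) == 3);

  for (;;)
    {
      /* END points at the '\n' that ends this line, or at the final NUL.  */
      const char *end = strchr (line, '\n');
      if (end == NULL)
        end = line + strlen (line);

      if (end - line >= 3 && memcmp (end - 3, ellipsis, 3) == 0)
        {
          if (!need_space_before)
            count++;
          else if (end - line >= 4 && (end[-4] == ' ' || end[-4] == '\t'))
            count++;
        }

      /* Stop at the terminating NUL; never step past it.  */
      if (*end == '\0')
        break;
      line = end + 1;
    }

  return count;
}
```

Passing "..." checks for the ASCII ellipsis and "\xE2\x80\xA6" for the UTF-8
encoding of U+2026; both are three bytes long, which is the coincidence the
comment in syntax_check_space_ellipsis() points out.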



