bug-gettext

Re: [bug-gettext] broken handling of unicode code point escapes in Tcl


From: Daiki Ueno
Subject: Re: [bug-gettext] broken handling of unicode code point escapes in Tcl
Date: Tue, 25 Jun 2013 12:58:16 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)

Hi Guido,

Guido Berhoerster <address@hidden> writes:

> xgettext's parsing of Tcl Unicode code point escapes is broken: it
> replaces the escape with the literal Unicode character, but instead of
> consuming the last character of the escape it copies it into the
> output, which results in corrupt .po files, e.g.:
>
> $ cat gettext-bug.tcl
> puts [msgcat::mc "Hello\u200e\u201cWorld\u201d"]
>
> $ /usr/bin/xgettext -o- gettext-bug.tcl
> #: gettext-bug.tcl:5
> msgid "Hello‎e“cWorld”d"
> msgstr ""
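The failure described above can be reproduced with a minimal C sketch. It assumes a string-backed reader. The names src_getc, src_ungetc, hexval, decode_u and expand are hypothetical stand-ins for x-tcl.c's phase1_getc/phase1_ungetc and the \u branch of do_getc_escaped, not the actual gettext code. With fixed set to 0, the character that ended the digit loop is always pushed back, so after a complete four-digit escape the last digit is read again as a literal. With fixed set to 1, only a non-digit that stopped the loop is pushed back. Characters outside printable ASCII are rendered as <U+XXXX> to keep the output readable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical string reader with pushback, standing in for
   x-tcl.c's phase1_getc/phase1_ungetc.  */
static const char *input;   /* string being scanned */
static size_t pos;          /* index of the next unread character */

static int src_getc (void)
{
  return input[pos] != '\0' ? (unsigned char) input[pos++] : EOF;
}

/* Push back the character just read; EOF is never pushed back.  */
static void src_ungetc (int c)
{
  if (c != EOF)
    pos--;
}

static unsigned int hexval (int c)
{
  if (c >= '0' && c <= '9')
    return (unsigned int) (c - '0');
  if (c >= 'A' && c <= 'F')
    return (unsigned int) (c - 'A' + 10);
  return (unsigned int) (c - 'a' + 10);
}

/* Decode up to four hex digits following "\u".  FIXED == 0 mimics the
   pre-patch code; FIXED != 0 mimics the patched code.  */
static unsigned int decode_u (int fixed)
{
  unsigned int n = 0;
  int c = EOF;
  int i;

  for (i = 0; i < 4; i++)
    {
      c = src_getc ();
      if (c == EOF || !isxdigit (c))
        {
          if (fixed)
            src_ungetc (c);   /* give back only the non-digit */
          break;
        }
      n = (n << 4) + hexval (c);
    }
  if (!fixed)
    src_ungetc (c);   /* bug: after four digits this re-queues the last digit */
  return i > 0 ? n : 'u';
}

/* Write S into OUT with "\u" escapes decoded; characters outside
   printable ASCII are written as <U+XXXX>.  */
static void expand (const char *s, int fixed, char *out, size_t outsz)
{
  size_t len = 0;
  int c;

  assert (outsz > 0);
  input = s;
  pos = 0;
  out[0] = '\0';
  while (len < outsz && (c = src_getc ()) != EOF)
    {
      unsigned int cp = (unsigned int) c;

      if (c == '\\')
        {
          int next = src_getc ();
          if (next == 'u')
            cp = decode_u (fixed);
          else
            src_ungetc (next);   /* not "\u": keep the backslash literally */
        }
      if (cp < 0x20 || cp > 0x7E)
        len += (size_t) snprintf (out + len, outsz - len, "<U+%04X>", cp);
      else
        len += (size_t) snprintf (out + len, outsz - len, "%c", (int) cp);
    }
}
```

On the reporter's string Hello\u200e\u201cWorld\u201d, expand with fixed = 0 should give Hello<U+200E>e<U+201C>cWorld<U+201D>d. This mirrors the corrupt msgid above. With fixed = 1, it should give Hello<U+200E><U+201C>World<U+201D>.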

Thanks for the report.

> It should probably not try to substitute these escapes at all, as
> doing so results in fragile .po files with embedded control characters;
> see, e.g., the U+200E left-to-right mark in the above example.

I've just pushed the attached patch (the \x fix in the patch is not
really necessary, sorry; it has been partially reverted in git).

Regards,
-- 
Daiki Ueno
From 1a41636f0bf3559666b3ba16fbc1c8ad28dc7a9a Mon Sep 17 00:00:00 2001
From: Daiki Ueno <address@hidden>
Date: Tue, 25 Jun 2013 12:24:47 +0900
Subject: [PATCH] Fix handling of \x and \u escape sequences in Tcl.

---
 gettext-tools/src/ChangeLog        |  7 +++++
 gettext-tools/src/x-tcl.c          | 12 +++++---
 gettext-tools/tests/ChangeLog      |  6 ++++
 gettext-tools/tests/Makefile.am    |  2 +-
 gettext-tools/tests/xgettext-tcl-4 | 59 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 81 insertions(+), 5 deletions(-)
 create mode 100644 gettext-tools/tests/xgettext-tcl-4

diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog
index ce6b6c8..4b0b4e4 100644
--- a/gettext-tools/src/ChangeLog
+++ b/gettext-tools/src/ChangeLog
@@ -1,3 +1,10 @@
+2013-06-25  Daiki Ueno  <address@hidden>
+
+       Fix handling of \x and \u escape sequences in Tcl.
+       * x-tcl.c (do_getc_escaped): Fix handling of \x and \u.
+       Reported by Guido Berhoerster in
+       <https://lists.gnu.org/archive/html/bug-gettext/2013-06/msg00022.html>.
+
 2013-06-17  Daiki Ueno  <address@hidden>
 
        * x-python.c (init_flag_table_python): Enable python-brace-format
diff --git a/gettext-tools/src/x-tcl.c b/gettext-tools/src/x-tcl.c
index 82bf19d..2b57e3f 100644
--- a/gettext-tools/src/x-tcl.c
+++ b/gettext-tools/src/x-tcl.c
@@ -496,7 +496,10 @@ do_getc_escaped ()
           {
             c = phase1_getc ();
             if (c == EOF || !c_isxdigit ((unsigned char) c))
-              break;
+              {
+                phase1_ungetc (c);
+                break;
+              }
 
             if (c >= '0' && c <= '9')
               n = (n << 4) + (c - '0');
@@ -505,7 +508,6 @@ do_getc_escaped ()
             else if (c >= 'a' && c <= 'f')
               n = (n << 4) + (c - 'a' + 10);
           }
-        phase1_ungetc (c);
         return (i > 0 ? (unsigned char) n : 'x');
       }
     case 'u':
@@ -517,7 +519,10 @@ do_getc_escaped ()
           {
             c = phase1_getc ();
             if (c == EOF || !c_isxdigit ((unsigned char) c))
-              break;
+              {
+                phase1_ungetc (c);
+                break;
+              }
 
             if (c >= '0' && c <= '9')
               n = (n << 4) + (c - '0');
@@ -526,7 +531,6 @@ do_getc_escaped ()
             else if (c >= 'a' && c <= 'f')
               n = (n << 4) + (c - 'a' + 10);
           }
-        phase1_ungetc (c);
         return (i > 0 ? n : 'u');
       }
     case '0': case '1': case '2': case '3': case '4':
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog
index 2ed5f27..f036e07 100644
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,9 @@
+2013-06-25  Daiki Ueno  <address@hidden>
+
+       Fix handling of \x and \u escape sequences in Tcl.
+       * Makefile.am (TESTS): Add xgettext-tcl-4.
+       * xgettext-tcl-4: New test for escape sequences.
+
 2013-06-17  Daiki Ueno  <address@hidden>
 
        * format-python-brace-1: No need to pass
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index 86263b5..37e7bbc 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -98,7 +98,7 @@ TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \
        xgettext-sh-6 \
        xgettext-smalltalk-1 xgettext-smalltalk-2 \
        xgettext-stringtable-1 \
-       xgettext-tcl-1 xgettext-tcl-2 xgettext-tcl-3 \
+       xgettext-tcl-1 xgettext-tcl-2 xgettext-tcl-3 xgettext-tcl-4 \
        xgettext-ycp-1 xgettext-ycp-2 xgettext-ycp-3 xgettext-ycp-4 \
        xgettext-lua-1 xgettext-lua-2 \
        xgettext-javascript-1 xgettext-javascript-2 xgettext-javascript-3 \
diff --git a/gettext-tools/tests/xgettext-tcl-4 b/gettext-tools/tests/xgettext-tcl-4
new file mode 100644
index 0000000..7893ccb
--- /dev/null
+++ b/gettext-tools/tests/xgettext-tcl-4
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+# Test of Tcl support: escape sequences.
+
+tmpfiles=""
+trap 'rm -fr $tmpfiles' 1 2 3 15
+
+tmpfiles="$tmpfiles xg-t-4.tcl"
+cat <<\EOF > xg-t-4.tcl
+puts [_ "Hello\u200e\u201cWorld\u201d"]
+puts [_ "x\u20y\x20z"]
+puts [_ "\xFF20"]
+EOF
+
+tmpfiles="$tmpfiles xg-t-4.err xg-t-4.tmp xg-t-4.pot"
+: ${XGETTEXT=xgettext}
+${XGETTEXT} --add-comments --no-location -k_ -o xg-t-4.tmp xg-t-4.tcl 2>xg-t-4.err
+test $? = 0 || { cat xg-t-4.err; rm -fr $tmpfiles; exit 1; }
+# Don't simplify this to "grep ... < xg-t-4.tmp", otherwise OpenBSD 4.0 grep
+# only outputs "Binary file (standard input) matches".
+cat xg-t-4.tmp | grep -v 'POT-Creation-Date' | LC_ALL=C tr -d '\r' > xg-t-4.pot
+
+tmpfiles="$tmpfiles xg-t-4.ok"
+cat <<\EOF > xg-t-4.ok
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the PACKAGE package.
+# FIRST AUTHOR <address@hidden>, YEAR.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PACKAGE VERSION\n"
+"Report-Msgid-Bugs-To: \n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <address@hidden>\n"
+"Language-Team: LANGUAGE <address@hidden>\n"
+"Language: \n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+
+msgid "Hello‎“World”"
+msgstr ""
+
+msgid "x y z"
+msgstr ""
+
+msgid " "
+msgstr ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} xg-t-4.ok xg-t-4.pot
+result=$?
+
+rm -fr $tmpfiles
+
+exit $result
-- 
1.8.2.1
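A small standalone decoder can be used to check the expected output in xgettext-tcl-4. This is a sketch under stated assumptions, not x-tcl.c itself. The reader helpers and read_hex, decode_escape and expand are hypothetical. \u reads at most four hex digits. \x is assumed to read any number of hex digits and keep only the low 8 bits. That assumption is consistent with the (unsigned char) n return in the patch and with \xFF20 mapping to a space in the test. For brevity, every other backslash sequence simply yields the escaped character, which is not how Tcl treats \n, \t and similar escapes. As in the patch, only the non-digit that ends an escape is pushed back.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical string reader with pushback, standing in for
   x-tcl.c's phase1_getc/phase1_ungetc.  */
static const char *input;
static size_t pos;

static int src_getc (void)
{
  return input[pos] != '\0' ? (unsigned char) input[pos++] : EOF;
}

static void src_ungetc (int c)
{
  if (c != EOF)
    pos--;
}

static unsigned int hexval (int c)
{
  if (c >= '0' && c <= '9')
    return (unsigned int) (c - '0');
  if (c >= 'A' && c <= 'F')
    return (unsigned int) (c - 'A' + 10);
  return (unsigned int) (c - 'a' + 10);
}

/* Read hex digits; MAX_DIGITS == 0 means no limit.  Stores the number
   of digits consumed in *COUNT.  */
static unsigned int read_hex (int max_digits, int *count)
{
  unsigned int n = 0;
  int i;

  for (i = 0; max_digits == 0 || i < max_digits; i++)
    {
      int c = src_getc ();
      if (c == EOF || !isxdigit (c))
        {
          src_ungetc (c);   /* only the terminating non-digit goes back */
          break;
        }
      n = (n << 4) + hexval (c);   /* unsigned, so overflow just wraps */
    }
  *count = i;
  return n;
}

/* Decode the escape whose letter (after the backslash) is INTRO.  */
static unsigned int decode_escape (int intro)
{
  int count;
  unsigned int n;

  switch (intro)
    {
    case 'x':
      /* Assumption: any number of digits, keeping the low 8 bits.  */
      n = read_hex (0, &count);
      return count > 0 ? (unsigned char) n : 'x';
    case 'u':
      /* At most four digits.  */
      n = read_hex (4, &count);
      return count > 0 ? n : 'u';
    default:
      /* Simplification: not Tcl's handling of \n, \t, octal, etc.  */
      return (unsigned int) intro;
    }
}

/* Write S into OUT with escapes decoded; characters outside printable
   ASCII are written as <U+XXXX>.  */
static void expand (const char *s, char *out, size_t outsz)
{
  size_t len = 0;
  int c;

  assert (outsz > 0);
  input = s;
  pos = 0;
  out[0] = '\0';
  while (len < outsz && (c = src_getc ()) != EOF)
    {
      unsigned int cp = (unsigned int) c;

      if (c == '\\')
        {
          int next = src_getc ();
          if (next != EOF)
            cp = decode_escape (next);
        }
      if (cp < 0x20 || cp > 0x7E)
        len += (size_t) snprintf (out + len, outsz - len, "<U+%04X>", cp);
      else
        len += (size_t) snprintf (out + len, outsz - len, "%c", (int) cp);
    }
}
```

Applied to the three strings in xg-t-4.tcl, expand should yield three results: Hello<U+200E><U+201C>World<U+201D>, then x y z, then a single space. These match the three msgids in xg-t-4.ok, with the non-ASCII characters shown in <U+XXXX> form. In "x\u20y", the \u escape stops at the non-digit y after two digits, and that y is pushed back rather than lost.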

