bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-libunistring] A bug in u-strtok.h and the fix in libunistring-0


From: Daiki Ueno
Subject: Re: [bug-libunistring] A bug in u-strtok.h and the fix in libunistring-0.9.5
Date: Fri, 03 Jul 2015 11:58:45 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

Hello,

Seiya Kawashima <address@hidden> writes:

> During my experiments with libunistring-0.9.5, I've found an error in 
> u-strtok.h as below. The lines
> starting with "////" are my changes.
>
> /* Move past the token. */
> {
> UNIT *token_end = U_STRPBRK (str, delim);
> if (token_end)
> {
> /* NUL-terminate the token. */
> *token_end = 0;
> *ptr = token_end + 1;
> //// These lines should be something like below.
> //// *ptr = token_end + (sizeof(uint8_t) * u8_strmblen(token_end)); 
> //// *token_end = 0;
> }
> else
> *ptr = NULL;
> }
>
> So the original code tries to start the next search without checking how many 
> bytes are actually
> taken by a matched delimiter but assuming 1 by "token_end + 1". When the 
> delimiter takes more
> than one UNIT such as a delimiter in Japanese, this assumption fails and 
> starts the next search from
> an invalid location which is in the middle of a Unicode character.

Indeed, thanks for the report.

> To solve the issue, one can define U_STRMBLEN with u8_strmblen,u16_strmblen 
> and u32_strmblen
> accordingly and call it like *ptr = token_end + (sizeof(UNIT) * 
> U_STRMBLEN(token_end)) instead of
> *ptr = token_end + 1.

Yes, I've pushed a slightly modified fix (attached) in your name, along
with test cases.

Regards,
-- 
Daiki Ueno
>From 2c5543396b7c208b34fa714430801945aefb1fe8 Mon Sep 17 00:00:00 2001
From: Seiya Kawashima <address@hidden>
Date: Fri, 3 Jul 2015 11:42:43 +0900
Subject: [PATCH] unistr/uN-strtok: handle multibyte delimiters

Previously, uN_strtok moved PTR to the next unit to the token end.
When DELIM contained a multibyte character, the new position could
be a middle of a multibyte character.
* lib/unistr/u-strtok.h (FUNC): Place PTR at the next character
after the token.
* lib/unistr/u8-strtok.c (U_STRMBLEN): New macro.
* lib/unistr/u16-strtok.c (U_STRMBLEN): New macro.
* lib/unistr/u32-strtok.c (U_STRMBLEN): New macro.
* modules/unistr/u8-strtok (Depends-on): Depend on
unistr/u8-strmblen.
* modules/unistr/u16-strtok (Depends-on): Depend on
unistr/u16-strmblen.
* modules/unistr/u32-strtok (Depends-on): Depend on
unistr/u32-strmblen.
* tests/unistr/test-u-strtok.h: New file.
* tests/unistr/test-u8-strtok.c: New file.
* tests/unistr/test-u16-strtok.c: New file.
* tests/unistr/test-u32-strtok.c: New file.
* modules/unistr/u8-strtok-tests: New file.
* modules/unistr/u32-strtok-tests: New file.
* modules/unistr/u16-strtok-tests: New file.
Copyright-paperwork-exempt: yes
Co-authored-by: Daiki Ueno <address@hidden>
---
 ChangeLog                       | 26 +++++++++++++
 lib/unistr/u-strtok.h           |  2 +-
 lib/unistr/u16-strtok.c         |  1 +
 lib/unistr/u32-strtok.c         |  1 +
 lib/unistr/u8-strtok.c          |  1 +
 modules/unistr/u16-strtok       |  1 +
 modules/unistr/u16-strtok-tests | 14 +++++++
 modules/unistr/u32-strtok       |  1 +
 modules/unistr/u32-strtok-tests | 15 +++++++
 modules/unistr/u8-strtok        |  1 +
 modules/unistr/u8-strtok-tests  | 14 +++++++
 tests/unistr/test-u-strtok.h    | 86 +++++++++++++++++++++++++++++++++++++++++
 tests/unistr/test-u16-strtok.c  | 37 ++++++++++++++++++
 tests/unistr/test-u32-strtok.c  | 37 ++++++++++++++++++
 tests/unistr/test-u8-strtok.c   | 37 ++++++++++++++++++
 15 files changed, 273 insertions(+), 1 deletion(-)
 create mode 100644 modules/unistr/u16-strtok-tests
 create mode 100644 modules/unistr/u32-strtok-tests
 create mode 100644 modules/unistr/u8-strtok-tests
 create mode 100644 tests/unistr/test-u-strtok.h
 create mode 100644 tests/unistr/test-u16-strtok.c
 create mode 100644 tests/unistr/test-u32-strtok.c
 create mode 100644 tests/unistr/test-u8-strtok.c

diff --git a/ChangeLog b/ChangeLog
index 8893c75..373583f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2015-07-03  Seiya Kawashima <address@hidden>  (tiny-change)
+       and Daiki Ueno  <address@hidden>
+
+       unistr/uN-strtok: handle multibyte delimiters
+       Previously, uN_strtok moved PTR to the next unit to the token end.
+       When DELIM contained a multibyte character, the new position could
+       be a middle of a multibyte character.
+       * lib/unistr/u-strtok.h (FUNC): Place PTR at the next character
+       after the token.
+       * lib/unistr/u8-strtok.c (U_STRMBLEN): New macro.
+       * lib/unistr/u16-strtok.c (U_STRMBLEN): New macro.
+       * lib/unistr/u32-strtok.c (U_STRMBLEN): New macro.
+       * modules/unistr/u8-strtok (Depends-on): Depend on
+       unistr/u8-strmblen.
+       * modules/unistr/u16-strtok (Depends-on): Depend on
+       unistr/u16-strmblen.
+       * modules/unistr/u32-strtok (Depends-on): Depend on
+       unistr/u32-strmblen.
+       * tests/unistr/test-u-strtok.h: New file.
+       * tests/unistr/test-u8-strtok.c: New file.
+       * tests/unistr/test-u16-strtok.c: New file.
+       * tests/unistr/test-u32-strtok.c: New file.
+       * modules/unistr/u8-strtok-tests: New file.
+       * modules/unistr/u32-strtok-tests: New file.
+       * modules/unistr/u16-strtok-tests: New file.
+
 2015-07-02  Friedrich Haubensak  <address@hidden>
 
        update-copyright: fix test failure with perl >= 5.22 (trivial)
diff --git a/lib/unistr/u-strtok.h b/lib/unistr/u-strtok.h
index 4cd6830..edafa1b 100644
--- a/lib/unistr/u-strtok.h
+++ b/lib/unistr/u-strtok.h
@@ -40,9 +40,9 @@ FUNC (UNIT *str, const UNIT *delim, UNIT **ptr)
     UNIT *token_end = U_STRPBRK (str, delim);
     if (token_end)
       {
+        *ptr = token_end + U_STRMBLEN (token_end);
         /* NUL-terminate the token.  */
         *token_end = 0;
-        *ptr = token_end + 1;
       }
     else
       *ptr = NULL;
diff --git a/lib/unistr/u16-strtok.c b/lib/unistr/u16-strtok.c
index 62e2f49..df36cf7 100644
--- a/lib/unistr/u16-strtok.c
+++ b/lib/unistr/u16-strtok.c
@@ -24,4 +24,5 @@
 #define UNIT uint16_t
 #define U_STRSPN u16_strspn
 #define U_STRPBRK u16_strpbrk
+#define U_STRMBLEN u16_strmblen
 #include "u-strtok.h"
diff --git a/lib/unistr/u32-strtok.c b/lib/unistr/u32-strtok.c
index 0bb16ff..f8ef999 100644
--- a/lib/unistr/u32-strtok.c
+++ b/lib/unistr/u32-strtok.c
@@ -24,4 +24,5 @@
 #define UNIT uint32_t
 #define U_STRSPN u32_strspn
 #define U_STRPBRK u32_strpbrk
+#define U_STRMBLEN u32_strmblen
 #include "u-strtok.h"
diff --git a/lib/unistr/u8-strtok.c b/lib/unistr/u8-strtok.c
index efb6a9a..1e4e6ef 100644
--- a/lib/unistr/u8-strtok.c
+++ b/lib/unistr/u8-strtok.c
@@ -24,4 +24,5 @@
 #define UNIT uint8_t
 #define U_STRSPN u8_strspn
 #define U_STRPBRK u8_strpbrk
+#define U_STRMBLEN u8_strmblen
 #include "u-strtok.h"
diff --git a/modules/unistr/u16-strtok b/modules/unistr/u16-strtok
index 50773f5..98cfcf6 100644
--- a/modules/unistr/u16-strtok
+++ b/modules/unistr/u16-strtok
@@ -9,6 +9,7 @@ Depends-on:
 unistr/base
 unistr/u16-strspn
 unistr/u16-strpbrk
+unistr/u16-strmblen
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u16-strtok])
diff --git a/modules/unistr/u16-strtok-tests b/modules/unistr/u16-strtok-tests
new file mode 100644
index 0000000..c6b71d8
--- /dev/null
+++ b/modules/unistr/u16-strtok-tests
@@ -0,0 +1,14 @@
+Files:
+tests/unistr/test-u16-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u16-strtok
+check_PROGRAMS += test-u16-strtok
+test_u16_strtok_SOURCES = unistr/test-u16-strtok.c
+test_u16_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/modules/unistr/u32-strtok b/modules/unistr/u32-strtok
index 2e19c81..1206248 100644
--- a/modules/unistr/u32-strtok
+++ b/modules/unistr/u32-strtok
@@ -9,6 +9,7 @@ Depends-on:
 unistr/base
 unistr/u32-strspn
 unistr/u32-strpbrk
+unistr/u32-strmblen
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u32-strtok])
diff --git a/modules/unistr/u32-strtok-tests b/modules/unistr/u32-strtok-tests
new file mode 100644
index 0000000..319e0a9
--- /dev/null
+++ b/modules/unistr/u32-strtok-tests
@@ -0,0 +1,15 @@
+Files:
+tests/unistr/test-u32-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+unistr/u32-uctomb
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u32-strtok
+check_PROGRAMS += test-u32-strtok
+test_u32_strtok_SOURCES = unistr/test-u32-strtok.c
+test_u32_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/modules/unistr/u8-strtok b/modules/unistr/u8-strtok
index 28d7ed5..657b6fd 100644
--- a/modules/unistr/u8-strtok
+++ b/modules/unistr/u8-strtok
@@ -9,6 +9,7 @@ Depends-on:
 unistr/base
 unistr/u8-strspn
 unistr/u8-strpbrk
+unistr/u8-strmblen
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u8-strtok])
diff --git a/modules/unistr/u8-strtok-tests b/modules/unistr/u8-strtok-tests
new file mode 100644
index 0000000..f6e2588
--- /dev/null
+++ b/modules/unistr/u8-strtok-tests
@@ -0,0 +1,14 @@
+Files:
+tests/unistr/test-u8-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u8-strtok
+check_PROGRAMS += test-u8-strtok
+test_u8_strtok_SOURCES = unistr/test-u8-strtok.c
+test_u8_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/tests/unistr/test-u-strtok.h b/tests/unistr/test-u-strtok.h
new file mode 100644
index 0000000..735f033
--- /dev/null
+++ b/tests/unistr/test-u-strtok.h
@@ -0,0 +1,86 @@
+/* Test of uN_strtok() functions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+static void
+test_u_strtok (void)
+{
+  {
+    UNIT input[] = { 'f', 'o', 'o', 0 };
+    const UNIT delim[] = { 0 };
+    UNIT *state;
+    const UNIT *result = U_STRTOK (input, delim, &state);
+    ASSERT (result == input);
+  }
+
+  {
+    UNIT input[] =
+      { 'A', 'B', 'C', ' ', 'A', 'B', 'C', 'D', 'A', 'B', ' ', '\t',
+       'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'D', 'E', 0
+      };
+    const UNIT delim[] = { ' ', '\t', 0 };
+    UNIT *state;
+    const UNIT *result;
+    result = U_STRTOK (input, delim, &state);
+    ASSERT (result == input);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == input + 4);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == input + 12);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == NULL);
+  }
+
+  /* Check for multibyte delimiters.  */
+  {
+    ucs4_t u_input[] =
+      { 'A', 'B', 'C', 0x3000, 'A', 'B', 'C', 'D', 'A', 'B', 0x3000, 0x3001,
+       'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'D', 'E', 0
+      };
+    ucs4_t u_delim[] = { 0x3000, 0x3001, 0 };
+    size_t input_len = 6 * SIZEOF (u_input);
+    UNIT *input = (UNIT *) malloc (input_len);
+    size_t delim_len = 6 * SIZEOF (u_delim);
+    UNIT *delim = (UNIT *) malloc (delim_len);
+    UNIT *state;
+    const UNIT *result;
+    UNIT *ptr, *first_ptr, *second_ptr;
+    size_t i;
+    for (i = 0, ptr = input; i < SIZEOF (u_input) && u_input[i] != 0; i++)
+      {
+       int ret = U_UCTOMB (ptr, u_input[i], input_len - (ptr - input));
+       if (i == 4)
+         first_ptr = ptr;
+       if (i == 12)
+         second_ptr = ptr;
+       ptr += ret;
+      }
+    *ptr = 0;
+    for (i = 0, ptr = delim; i < SIZEOF (u_delim) && u_delim[i] != 0; i++)
+      {
+       int ret = U_UCTOMB (ptr, u_delim[i], delim_len - (ptr - delim));
+       ptr += ret;
+      }
+    *ptr = 0;
+    result = U_STRTOK (input, delim, &state);
+    ASSERT (result == input);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == first_ptr);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == second_ptr);
+    result = U_STRTOK (NULL, delim, &state);
+    ASSERT (result == NULL);
+  }
+}
diff --git a/tests/unistr/test-u16-strtok.c b/tests/unistr/test-u16-strtok.c
new file mode 100644
index 0000000..cbd95ef
--- /dev/null
+++ b/tests/unistr/test-u16-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u16_strtok() function.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint16_t
+#define U_STRTOK u16_strtok
+#define U_UCTOMB u16_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+  test_u_strtok ();
+
+  return 0;
+}
diff --git a/tests/unistr/test-u32-strtok.c b/tests/unistr/test-u32-strtok.c
new file mode 100644
index 0000000..04cded7
--- /dev/null
+++ b/tests/unistr/test-u32-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u32_strtok() function.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint32_t
+#define U_STRTOK u32_strtok
+#define U_UCTOMB u32_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+  test_u_strtok ();
+
+  return 0;
+}
diff --git a/tests/unistr/test-u8-strtok.c b/tests/unistr/test-u8-strtok.c
new file mode 100644
index 0000000..6745ea8
--- /dev/null
+++ b/tests/unistr/test-u8-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u8_strtok() function.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint8_t
+#define U_STRTOK u8_strtok
+#define U_UCTOMB u8_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+  test_u_strtok ();
+
+  return 0;
+}
-- 
2.4.2


reply via email to

[Prev in Thread] Current Thread [Next in Thread]