[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-libunistring] A bug in u-strtok.h and the fix in libunistring-0
From: |
Daiki Ueno |
Subject: |
Re: [bug-libunistring] A bug in u-strtok.h and the fix in libunistring-0.9.5 |
Date: |
Fri, 03 Jul 2015 11:58:45 +0900 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux) |
Hello,
Seiya Kawashima <address@hidden> writes:
> During my experiments with libunistring-0.9.5, I've found an error in
> u-strtok.h as below. The lines
> starting with "////" are my changes.
>
> /* Move past the token. */
> {
> UNIT *token_end = U_STRPBRK (str, delim);
> if (token_end)
> {
> /* NUL-terminate the token. */
> *token_end = 0;
> *ptr = token_end + 1;
> //// These lines should be something like below.
> //// *ptr = token_end + (sizeof(uint8_t) * u8_strmblen(token_end));
> //// *token_end = 0;
> }
> else
> *ptr = NULL;
> }
>
> So the original code tries to start the next search without checking how many
> bytes are actually
> taken by a matched delimiter but assuming 1 by "token_end + 1". When the
> delimiter takes more
> than one UNIT such as a delimiter in Japanese, this assumption fails and
> starts the next search from
> an invalid location which is in the middle of a Unicode character.
Indeed, thanks for the report.
> To solve the issue, one can define U_STRMBLEN with u8_strmblen,u16_strmblen
> and u32_strmblen
> accordingly and call it like *ptr = token_end + (sizeof(UNIT) *
> U_STRMBLEN(token_end)) instead of
> *ptr = token_end + 1.
Yes, I've pushed a slightly modified fix (attached) in your name, along
with test cases.
Regards,
--
Daiki Ueno
>From 2c5543396b7c208b34fa714430801945aefb1fe8 Mon Sep 17 00:00:00 2001
From: Seiya Kawashima <address@hidden>
Date: Fri, 3 Jul 2015 11:42:43 +0900
Subject: [PATCH] unistr/uN-strtok: handle multibyte delimiters
Previously, uN_strtok moved PTR to the next unit to the token end.
When DELIM contained a multibyte character, the new position could
be a middle of a multibyte character.
* lib/unistr/u-strtok.h (FUNC): Place PTR at the next character
after the token.
* lib/unistr/u8-strtok.c (U_STRMBLEN): New macro.
* lib/unistr/u16-strtok.c (U_STRMBLEN): New macro.
* lib/unistr/u32-strtok.c (U_STRMBLEN): New macro.
* modules/unistr/u8-strtok (Depends-on): Depend on
unistr/u8-strmblen.
* modules/unistr/u16-strtok (Depends-on): Depend on
unistr/u16-strmblen.
* modules/unistr/u32-strtok (Depends-on): Depend on
unistr/u32-strmblen.
* tests/unistr/test-u-strtok.h: New file.
* tests/unistr/test-u8-strtok.c: New file.
* tests/unistr/test-u16-strtok.c: New file.
* tests/unistr/test-u32-strtok.c: New file.
* modules/unistr/u8-strtok-tests: New file.
* modules/unistr/u32-strtok-tests: New file.
* modules/unistr/u16-strtok-tests: New file.
Copyright-paperwork-exempt: yes
Co-authored-by: Daiki Ueno <address@hidden>
---
ChangeLog | 26 +++++++++++++
lib/unistr/u-strtok.h | 2 +-
lib/unistr/u16-strtok.c | 1 +
lib/unistr/u32-strtok.c | 1 +
lib/unistr/u8-strtok.c | 1 +
modules/unistr/u16-strtok | 1 +
modules/unistr/u16-strtok-tests | 14 +++++++
modules/unistr/u32-strtok | 1 +
modules/unistr/u32-strtok-tests | 15 +++++++
modules/unistr/u8-strtok | 1 +
modules/unistr/u8-strtok-tests | 14 +++++++
tests/unistr/test-u-strtok.h | 86 +++++++++++++++++++++++++++++++++++++++++
tests/unistr/test-u16-strtok.c | 37 ++++++++++++++++++
tests/unistr/test-u32-strtok.c | 37 ++++++++++++++++++
tests/unistr/test-u8-strtok.c | 37 ++++++++++++++++++
15 files changed, 273 insertions(+), 1 deletion(-)
create mode 100644 modules/unistr/u16-strtok-tests
create mode 100644 modules/unistr/u32-strtok-tests
create mode 100644 modules/unistr/u8-strtok-tests
create mode 100644 tests/unistr/test-u-strtok.h
create mode 100644 tests/unistr/test-u16-strtok.c
create mode 100644 tests/unistr/test-u32-strtok.c
create mode 100644 tests/unistr/test-u8-strtok.c
diff --git a/ChangeLog b/ChangeLog
index 8893c75..373583f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2015-07-03 Seiya Kawashima <address@hidden> (tiny-change)
+ and Daiki Ueno <address@hidden>
+
+ unistr/uN-strtok: handle multibyte delimiters
+ Previously, uN_strtok moved PTR to the next unit to the token end.
+ When DELIM contained a multibyte character, the new position could
+ be a middle of a multibyte character.
+ * lib/unistr/u-strtok.h (FUNC): Place PTR at the next character
+ after the token.
+ * lib/unistr/u8-strtok.c (U_STRMBLEN): New macro.
+ * lib/unistr/u16-strtok.c (U_STRMBLEN): New macro.
+ * lib/unistr/u32-strtok.c (U_STRMBLEN): New macro.
+ * modules/unistr/u8-strtok (Depends-on): Depend on
+ unistr/u8-strmblen.
+ * modules/unistr/u16-strtok (Depends-on): Depend on
+ unistr/u16-strmblen.
+ * modules/unistr/u32-strtok (Depends-on): Depend on
+ unistr/u32-strmblen.
+ * tests/unistr/test-u-strtok.h: New file.
+ * tests/unistr/test-u8-strtok.c: New file.
+ * tests/unistr/test-u16-strtok.c: New file.
+ * tests/unistr/test-u32-strtok.c: New file.
+ * modules/unistr/u8-strtok-tests: New file.
+ * modules/unistr/u32-strtok-tests: New file.
+ * modules/unistr/u16-strtok-tests: New file.
+
2015-07-02 Friedrich Haubensak <address@hidden>
update-copyright: fix test failure with perl >= 5.22 (trivial)
diff --git a/lib/unistr/u-strtok.h b/lib/unistr/u-strtok.h
index 4cd6830..edafa1b 100644
--- a/lib/unistr/u-strtok.h
+++ b/lib/unistr/u-strtok.h
@@ -40,9 +40,9 @@ FUNC (UNIT *str, const UNIT *delim, UNIT **ptr)
UNIT *token_end = U_STRPBRK (str, delim);
if (token_end)
{
+ *ptr = token_end + U_STRMBLEN (token_end);
/* NUL-terminate the token. */
*token_end = 0;
- *ptr = token_end + 1;
}
else
*ptr = NULL;
diff --git a/lib/unistr/u16-strtok.c b/lib/unistr/u16-strtok.c
index 62e2f49..df36cf7 100644
--- a/lib/unistr/u16-strtok.c
+++ b/lib/unistr/u16-strtok.c
@@ -24,4 +24,5 @@
#define UNIT uint16_t
#define U_STRSPN u16_strspn
#define U_STRPBRK u16_strpbrk
+#define U_STRMBLEN u16_strmblen
#include "u-strtok.h"
diff --git a/lib/unistr/u32-strtok.c b/lib/unistr/u32-strtok.c
index 0bb16ff..f8ef999 100644
--- a/lib/unistr/u32-strtok.c
+++ b/lib/unistr/u32-strtok.c
@@ -24,4 +24,5 @@
#define UNIT uint32_t
#define U_STRSPN u32_strspn
#define U_STRPBRK u32_strpbrk
+#define U_STRMBLEN u32_strmblen
#include "u-strtok.h"
diff --git a/lib/unistr/u8-strtok.c b/lib/unistr/u8-strtok.c
index efb6a9a..1e4e6ef 100644
--- a/lib/unistr/u8-strtok.c
+++ b/lib/unistr/u8-strtok.c
@@ -24,4 +24,5 @@
#define UNIT uint8_t
#define U_STRSPN u8_strspn
#define U_STRPBRK u8_strpbrk
+#define U_STRMBLEN u8_strmblen
#include "u-strtok.h"
diff --git a/modules/unistr/u16-strtok b/modules/unistr/u16-strtok
index 50773f5..98cfcf6 100644
--- a/modules/unistr/u16-strtok
+++ b/modules/unistr/u16-strtok
@@ -9,6 +9,7 @@ Depends-on:
unistr/base
unistr/u16-strspn
unistr/u16-strpbrk
+unistr/u16-strmblen
configure.ac:
gl_LIBUNISTRING_MODULE([0.9], [unistr/u16-strtok])
diff --git a/modules/unistr/u16-strtok-tests b/modules/unistr/u16-strtok-tests
new file mode 100644
index 0000000..c6b71d8
--- /dev/null
+++ b/modules/unistr/u16-strtok-tests
@@ -0,0 +1,14 @@
+Files:
+tests/unistr/test-u16-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u16-strtok
+check_PROGRAMS += test-u16-strtok
+test_u16_strtok_SOURCES = unistr/test-u16-strtok.c
+test_u16_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/modules/unistr/u32-strtok b/modules/unistr/u32-strtok
index 2e19c81..1206248 100644
--- a/modules/unistr/u32-strtok
+++ b/modules/unistr/u32-strtok
@@ -9,6 +9,7 @@ Depends-on:
unistr/base
unistr/u32-strspn
unistr/u32-strpbrk
+unistr/u32-strmblen
configure.ac:
gl_LIBUNISTRING_MODULE([0.9], [unistr/u32-strtok])
diff --git a/modules/unistr/u32-strtok-tests b/modules/unistr/u32-strtok-tests
new file mode 100644
index 0000000..319e0a9
--- /dev/null
+++ b/modules/unistr/u32-strtok-tests
@@ -0,0 +1,15 @@
+Files:
+tests/unistr/test-u32-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+unistr/u32-uctomb
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u32-strtok
+check_PROGRAMS += test-u32-strtok
+test_u32_strtok_SOURCES = unistr/test-u32-strtok.c
+test_u32_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/modules/unistr/u8-strtok b/modules/unistr/u8-strtok
index 28d7ed5..657b6fd 100644
--- a/modules/unistr/u8-strtok
+++ b/modules/unistr/u8-strtok
@@ -9,6 +9,7 @@ Depends-on:
unistr/base
unistr/u8-strspn
unistr/u8-strpbrk
+unistr/u8-strmblen
configure.ac:
gl_LIBUNISTRING_MODULE([0.9], [unistr/u8-strtok])
diff --git a/modules/unistr/u8-strtok-tests b/modules/unistr/u8-strtok-tests
new file mode 100644
index 0000000..f6e2588
--- /dev/null
+++ b/modules/unistr/u8-strtok-tests
@@ -0,0 +1,14 @@
+Files:
+tests/unistr/test-u8-strtok.c
+tests/unistr/test-u-strtok.h
+tests/macros.h
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-u8-strtok
+check_PROGRAMS += test-u8-strtok
+test_u8_strtok_SOURCES = unistr/test-u8-strtok.c
+test_u8_strtok_LDADD = $(LDADD) $(LIBUNISTRING)
diff --git a/tests/unistr/test-u-strtok.h b/tests/unistr/test-u-strtok.h
new file mode 100644
index 0000000..735f033
--- /dev/null
+++ b/tests/unistr/test-u-strtok.h
@@ -0,0 +1,86 @@
+/* Test of uN_strtok() functions.
+ Copyright (C) 2015 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+static void
+test_u_strtok (void)
+{
+ {
+ UNIT input[] = { 'f', 'o', 'o', 0 };
+ const UNIT delim[] = { 0 };
+ UNIT *state;
+ const UNIT *result = U_STRTOK (input, delim, &state);
+ ASSERT (result == input);
+ }
+
+ {
+ UNIT input[] =
+ { 'A', 'B', 'C', ' ', 'A', 'B', 'C', 'D', 'A', 'B', ' ', '\t',
+ 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'D', 'E', 0
+ };
+ const UNIT delim[] = { ' ', '\t', 0 };
+ UNIT *state;
+ const UNIT *result;
+ result = U_STRTOK (input, delim, &state);
+ ASSERT (result == input);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == input + 4);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == input + 12);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == NULL);
+ }
+
+ /* Check for multibyte delimiters. */
+ {
+ ucs4_t u_input[] =
+ { 'A', 'B', 'C', 0x3000, 'A', 'B', 'C', 'D', 'A', 'B', 0x3000, 0x3001,
+ 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'D', 'E', 0
+ };
+ ucs4_t u_delim[] = { 0x3000, 0x3001, 0 };
+ size_t input_len = 6 * SIZEOF (u_input);
+ UNIT *input = (UNIT *) malloc (input_len);
+ size_t delim_len = 6 * SIZEOF (u_delim);
+ UNIT *delim = (UNIT *) malloc (delim_len);
+ UNIT *state;
+ const UNIT *result;
+ UNIT *ptr, *first_ptr, *second_ptr;
+ size_t i;
+ for (i = 0, ptr = input; i < SIZEOF (u_input) && u_input[i] != 0; i++)
+ {
+ int ret = U_UCTOMB (ptr, u_input[i], input_len - (ptr - input));
+ if (i == 4)
+ first_ptr = ptr;
+ if (i == 12)
+ second_ptr = ptr;
+ ptr += ret;
+ }
+ *ptr = 0;
+ for (i = 0, ptr = delim; i < SIZEOF (u_delim) && u_delim[i] != 0; i++)
+ {
+ int ret = U_UCTOMB (ptr, u_delim[i], delim_len - (ptr - delim));
+ ptr += ret;
+ }
+ *ptr = 0;
+ result = U_STRTOK (input, delim, &state);
+ ASSERT (result == input);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == first_ptr);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == second_ptr);
+ result = U_STRTOK (NULL, delim, &state);
+ ASSERT (result == NULL);
+ }
+}
diff --git a/tests/unistr/test-u16-strtok.c b/tests/unistr/test-u16-strtok.c
new file mode 100644
index 0000000..cbd95ef
--- /dev/null
+++ b/tests/unistr/test-u16-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u16_strtok() function.
+ Copyright (C) 2015 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint16_t
+#define U_STRTOK u16_strtok
+#define U_UCTOMB u16_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+ test_u_strtok ();
+
+ return 0;
+}
diff --git a/tests/unistr/test-u32-strtok.c b/tests/unistr/test-u32-strtok.c
new file mode 100644
index 0000000..04cded7
--- /dev/null
+++ b/tests/unistr/test-u32-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u32_strtok() function.
+ Copyright (C) 2015 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint32_t
+#define U_STRTOK u32_strtok
+#define U_UCTOMB u32_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+ test_u_strtok ();
+
+ return 0;
+}
diff --git a/tests/unistr/test-u8-strtok.c b/tests/unistr/test-u8-strtok.c
new file mode 100644
index 0000000..6745ea8
--- /dev/null
+++ b/tests/unistr/test-u8-strtok.c
@@ -0,0 +1,37 @@
+/* Test of u8_strtok() function.
+ Copyright (C) 2015 Free Software Foundation, Inc.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "unistr.h"
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "macros.h"
+
+#define UNIT uint8_t
+#define U_STRTOK u8_strtok
+#define U_UCTOMB u8_uctomb
+#include "test-u-strtok.h"
+
+int
+main (void)
+{
+ test_u_strtok ();
+
+ return 0;
+}
--
2.4.2