Re: [PATCH 3/9] dfa: rewrite handling of multibyte case_fold lexing
From: Jim Meyering
Subject: Re: [PATCH 3/9] dfa: rewrite handling of multibyte case_fold lexing
Date: Tue, 16 Mar 2010 19:15:57 +0100
Paolo Bonzini wrote:
> Let dfacomp do the folding to lowercase of multibyte regexes, and remove
> it from grep.c. Input strings to kwset.c are still folded outside
> kwset.c.
>
> * NEWS: Document bugfixes.
> * .x-sc_cast_of_argument_to_free: Remove.
> * src/dfa.c (wctok, addtok_wc): New.
...
This looks fine.
Two suggestions:
First, the literal array dimension in buf[16] should probably be
written as buf[MB_LEN_MAX], which requires including <limits.h>.
Second, I've expanded the comment describing the new mbtolower function.
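To illustrate the MB_LEN_MAX suggestion, here is a minimal sketch; it is
not the code in dfa.c. MB_LEN_MAX, from <limits.h>, is the largest number
of bytes a multibyte character can occupy in any supported locale, so a
buffer of that size is always big enough for a single wcrtomb call. The
helper name encode_wc is hypothetical and exists only for this example.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only: encode WC into BUF,
   which must have room for at least MB_LEN_MAX bytes.  Return the
   number of bytes written, or (size_t) -1 if WC cannot be encoded
   in the current locale.  */
static size_t
encode_wc (wchar_t wc, unsigned char *buf)
{
  mbstate_t s;
  memset (&s, 0, sizeof s);     /* start from the initial shift state */
  return wcrtomb ((char *) buf, wc, &s);
}
```

A caller would declare "unsigned char buf[MB_LEN_MAX];" and pass it to
encode_wc, rather than hard-coding a dimension such as 16 that happens
to match one C library's value.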
diff --git a/src/dfa.c b/src/dfa.c
index 5a549b3..2859570 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -27,6 +27,7 @@
 #include <sys/types.h>
 #include <stddef.h>
 #include <stdlib.h>
+#include <limits.h>
 #include <string.h>
 #include <locale.h>
@@ -1165,7 +1166,7 @@ addtok (token t)
 static void
 addtok_wc (wint_t wc)
 {
-  unsigned char buf[16];
+  unsigned char buf[MB_LEN_MAX];
   mbstate_t s;
   int i;
   memset (&s, 0, sizeof(s));
diff --git a/src/search.c b/src/search.c
index 6533c16..1554c08 100644
--- a/src/search.c
+++ b/src/search.c
@@ -73,10 +73,14 @@ kwsinit (void)
 }
 #ifdef MBS_SUPPORT
-/* Convert the string from BEG to N to lowercase.  Overwrite N
-   with the length of the new string, and return a pointer to
-   the lowercase string.  Successive calls to mbtolower will
-   rewrite the output buffer.  */
+/* Convert the *N-byte string, BEG, to lowercase, and write the
+   NUL-terminated result into malloc'd storage.  Upon success, set *N
+   to the length (in bytes) of the resulting string (not including the
+   trailing NUL byte), and return a pointer to the lowercase string.
+   Note that while this function returns a pointer to malloc'd storage,
+   the caller must not free it, since this function retains a pointer
+   to the buffer and reuses it on any subsequent call.
+   Upon memory allocation failure, this function exits.  */
 static char *
 mbtolower (const char *beg, size_t *n)
 {
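
The ownership rule in the new mbtolower comment can be sketched with a
simplified, single-byte stand-in. This is an illustration under stated
assumptions, not the function in search.c: the name lower_reusing_buffer
is hypothetical, it uses tolower instead of multibyte conversion (so *N
never changes here, whereas the real function may change it), and it
exits with EXIT_FAILURE where grep would use its own allocation-failure
handling.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-byte stand-in for mbtolower.  Lowercase the
   *N-byte string BEG into a NUL-terminated buffer that this function
   owns and reuses on every call; the caller must not free it.  */
static char *
lower_reusing_buffer (const char *beg, size_t *n)
{
  static char *out;             /* retained across calls */
  static size_t outalloc;       /* bytes currently allocated for OUT */
  size_t i;

  if (outalloc < *n + 1)
    {
      char *p = realloc (out, *n + 1);
      if (!p)
        {
          /* Assumption: exit on allocation failure, as the comment says.  */
          fputs ("memory exhausted\n", stderr);
          exit (EXIT_FAILURE);
        }
      out = p;
      outalloc = *n + 1;
    }

  for (i = 0; i < *n; i++)
    out[i] = tolower ((unsigned char) beg[i]);
  out[*n] = '\0';
  return out;
}
```

Because the buffer is reused, a result from one call is overwritten by
the next, so a caller that needs both strings at once has to copy the
first before calling again.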