[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/5] unistr/u*-chr, unistr/u*-strchr: improve test coverage,
From: |
Bruno Haible |
Subject: |
Re: [PATCH 0/5] unistr/u*-chr, unistr/u*-strchr: improve test coverage, optimize |
Date: |
Wed, 21 Jul 2010 01:07:59 +0200 |
User-agent: |
KMail/1.9.9 |
Hello Paolo,
> this series introduces more optimizations (basically a Boyer-Moore
> algorithm without the table) in the u8_strchr and u8_chr functions.
> ...
> While doing this I also improved the coverage of the functions.
>
> Here are the timings:
>
> test-u8-chr test-u8-strchr
> before 1:00 1:07
> after 0:46 0:42
Great! Thank you for this.
> The algorithm introduces a heavy slow-down on UTF-16. This is probably
> due to the nature of the test, but I decided it is not worth optimizing
> that case a lot.
Yes, the majority of the users use the u8_* functions.
Comments about the individual patches:
About [PATCH 1/5] unistr/u*-strchr: add tests:
> +#define U_CHR u16_strchr
> + ASSERT (U_CHR (input, 'b') == input + 1);
The convention throughout libunistring is that U_XXX is the abbreviation
for u{8,16,32}_xxx. So tests/unistr/test-strchr.h looks like it is calling
u16_chr with two arguments. But u16_chr takes three arguments. Can you
please use U_STRCHR as macro name here, instead of U_CHR?
About [PATCH 2/5] unistr/u*-chr, unistr/u*-strchr: prepare for multibyte tests:
> * modules/unistr/u16-chr-test, modules/unistr/u16-strchr-test:
> Depend on u32-to-u16.
> * modules/unistr/u8-chr-test, modules/unistr/u8-strchr-test:
> Depend on u32-to-u8.
The file names actually end in '-tests'.
> version as UCS4 then convert.
The correct name of this encoding is UCS-4 with a hyphen/minus. (See RFC 2044.)
About [PATCH 4/5] unistr/u*-chr, unistr/u*-strchr: test multibyte sequences
more:
> + UNIT *exp;
> + UNIT *prev;
The number of local variables declared at the top of this function is getting
large. For readability, I would create a local block, to minimize their scope:
{
UNIT *exp;
UNIT *prev;
exp = input + 1026;
...
}
> + size_t n;
A local variable of the same name already exists in an outer scope.
It leads to a warning with "gcc -Wshadow". Can you rename it?
> + n = U_UCTOMB(c, uc, 6);
Needs a space before the opening parenthesis.
About [PATCH 5/5] unistr/u8-chr: use Boyer-Moore like algorithm:
In modules/unistr/u8-chr, a dependency to 'memchr' should be added.
> + return memchr (s, c0, n);
Please write this as
return (uint8_t *) memchr ((const char *) s, c0, n);
It's better to make the casts explicit in these cases, IMO,
for clarity, for maintenance.
> case 2:
> - if (n > 1)
> case 3:
> - if (n > 2)
> case 4:
> - if (n > 3)
Removing this test of n is wrong IMO. If, for example, uc is a
2-units character and n == 0, in the statement
const uint8_t *end = s + n - 1;
you create a pointer to a memory area where you are not allowed to have a
pointer to, according to ISO C.
Please also add comments that prove the validity of the code. Like this:
For 2:
+ uint8_t c0 = c[0];
+ uint8_t c1 = c[1];
/* Note: 0x80 <= c1 < 0xC0 < c0. */
+ while (s < end)
+ {
+ uint8_t s1 = s[1];
+ if (s1 == c1)
+ {
+ if (*s == c0)
+ return (uint8_t *) s;
+ else
/* We can skip the search at s + 1, because s[1] == c1 <
c0. */
+ s += 2;
+ }
+ else
+ {
+ if (s1 == c0)
+ s++;
+ else
/* We can skip the search at s + 1, because s[1] != c0. */
+ s += 2;
+ }
+ }
For 3:
+ uint8_t c0 = c[0];
+ uint8_t c1 = c[1];
+ uint8_t c2 = c[2];
/* Note: c1 and c2 are < 0xC0 < c0. */
+ while (s < end)
+ {
+ uint8_t s2 = s[2];
+ if (s2 == c2)
+ {
+ if (s[1] == c1 && *s == c0)
+ return (uint8_t *) s;
+ else if (s2 == c1)
+ s++;
+ else
/* We can skip the search at s + 1, because s[2] != c1.
We can skip the search at s + 2, because s[2] == c2 <
c0. */
+ s += 3;
+ }
+ else
+ {
+ if (s2 == c1)
+ s++;
+ else if (s2 == c0)
/* We can skip the search at s + 1, because s[2] != c1. */
+ s += 2;
+ else
/* We can skip the search at s + 1, because s[2] != c1.
We can skip the search at s + 2, because s[2] != c0. */
+ s += 3;
+ }
+ }
Likewise for 4.
Otherwise nicely done! Thanks!
Bruno
- [PATCH 0/5] unistr/u*-chr, unistr/u*-strchr: improve test coverage, optimize, bonzini, 2010/07/20
- [PATCH 1/5] unistr/u*-strchr: add tests, bonzini, 2010/07/20
- [PATCH 2/5] unistr/u*-chr, unistr/u*-strchr: prepare for multibyte tests, bonzini, 2010/07/20
- [PATCH 4/5] unistr/u*-chr, unistr/u*-strchr: test multibyte sequences more, bonzini, 2010/07/20
- [PATCH 3/5] unistr/u*-chr, unistr/u*-strchr: test multibyte sequences, bonzini, 2010/07/20
- [PATCH 5/5] unistr/u8-chr: use Boyer-Moore like algorithm., bonzini, 2010/07/20
- Re: [PATCH 0/5] unistr/u*-chr, unistr/u*-strchr: improve test coverage, optimize,
Bruno Haible <=