Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

bug-gnulib

From:	Paolo Bonzini
Subject:	Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters
Date:	Sun, 11 Jul 2010 16:20:14 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Thunderbird/3.0.5

On 07/07/2010 03:44 PM, Pádraig Brady wrote:

Subject: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

* lib/unistr/u8-strchr.c (u8_strchr): Use strchr() for
the single byte case as it was measured to be 50% faster
than the existing code on x86 linux.  Also add a comment
on why not to use memmem() for the moment for the multibyte case.

If p is guaranteed to be a valid UTF-8 string, you can do better in general, like this. Say [q, q+q_len) points to a UTF-8 representation of uc:


  for (; (p = strchr (p, *q)) && memcmp (p+1, q+1, q_len-1); p += q_len)
    ;

  return p;

That's because once the first byte has matched, the length of the UTF-8 character is known to be q_len. It's better than memmem if the startup cost of strchr is low enough (of course memcmp has to be inlined/unrolled/unswitched to get decent performance).
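A self-contained sketch of that loop, for illustration only. The name find_encoded_char is hypothetical (it is not part of gnulib), and the code assumes that s is a valid NUL-terminated UTF-8 string and that [q, q+q_len) encodes a single non-NUL character:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib.  Assumes S is a valid,
   NUL-terminated UTF-8 string and Q[0..Q_LEN-1] is the UTF-8 encoding
   of one non-NUL character.  Returns a pointer to the first occurrence
   of that character in S, or NULL if there is none.  */
const uint8_t *
find_encoded_char (const uint8_t *s, const uint8_t *q, size_t q_len)
{
  const char *p = (const char *) s;

  /* In valid UTF-8, a hit on the lead byte starts a character of
     exactly Q_LEN bytes, so on a mismatch the whole character can be
     skipped before searching again.  */
  while ((p = strchr (p, q[0])) != NULL
         && memcmp (p + 1, q + 1, q_len - 1) != 0)
    p += q_len;

  return (const uint8_t *) p;
}
```

The parentheses around the assignment matter: without them, && binds tighter than = and p would receive the truth value rather than the pointer.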

Does the argument of u8_strchr have this guarantee? If not, the above code can read arbitrary memory.
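If that guarantee cannot be assumed, one possible variant (again only a sketch, with the hypothetical name find_encoded_char_unchecked) swaps memcmp for strncmp, which stops at a NUL byte, and advances one byte at a time after a mismatch. It only assumes that s is NUL-terminated and that the encoding in q contains no NUL byte:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant, not part of gnulib.  S only needs to be
   NUL-terminated; it does not have to be valid UTF-8.  Q[0..Q_LEN-1]
   is the UTF-8 encoding of one non-NUL character, so it contains no
   NUL byte.  */
const uint8_t *
find_encoded_char_unchecked (const uint8_t *s, const uint8_t *q,
                             size_t q_len)
{
  const char *p = (const char *) s;

  /* strncmp stops at the first difference, and the terminating NUL of
     S always differs from the bytes of Q, so the comparison never
     reads past the end of S.  After a mismatch only one byte is
     skipped, because the length of the sequence at P is not trusted.  */
  while ((p = strchr (p, q[0])) != NULL
         && strncmp (p + 1, (const char *) q + 1, q_len - 1) != 0)
    p++;

  return (const uint8_t *) p;
}
```

The price is that the skip by q_len is lost, so this variant may call strchr more often on inputs with many partial matches.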


Paolo

---
  ChangeLog              |    4 ++++
  lib/unistr/u8-strchr.c |   19 +++++++------------
  2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index afcae28..8ca0bd7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2010-07-07  Pádraig Brady  <address@hidden>
+
+       * lib/unistr/u8-strchr.c (u8_strchr): Use strchr() as it's faster
+
  2010-07-04  Bruno Haible  <address@hidden>

         fsusage: Clarify which code applies to which platforms.
diff --git a/lib/unistr/u8-strchr.c b/lib/unistr/u8-strchr.c
index 3be14c7..3dbd3ca 100644
--- a/lib/unistr/u8-strchr.c
+++ b/lib/unistr/u8-strchr.c
@@ -21,25 +21,20 @@
  /* Specification.  */
  #include "unistr.h"

+#include <string.h>
+
  uint8_t *
  u8_strchr (const uint8_t *s, ucs4_t uc)
  {
    uint8_t c[6];

    if (uc < 0x80)
-    {
-      uint8_t c0 = uc;
-
-      for (;; s++)
-        {
-          if (*s == c0)
-            break;
-          if (*s == 0)
-            goto notfound;
-        }
-      return (uint8_t *) s;
-    }
+    return (uint8_t *) strchr ((const char *) s, uc);
    else
+    /* The following is equivalent to:
+         return memmem (s, strlen(s), c, csize);
+       but faster for long S with matching UC near the start,
+       and also memmem is sometimes buggy and inefficient.  */
      switch (u8_uctomb_aux (c, uc, 6))
        {
        case 2:

Current Thread

[PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/07
- Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Simon Josefsson, 2010/07/07
  - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/07
    - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/08
- Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Ralf Wildenhues, 2010/07/07
  - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/08
- Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Bruno Haible, 2010/07/11
- Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Paolo Bonzini <=
  - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/11
    - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Paolo Bonzini, 2010/07/12
    - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Bruno Haible, 2010/07/18
    - Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters, Pádraig Brady, 2010/07/20
