[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#72086: scm_i_utf8_string_hash overruns buffer when len is zero
From: |
Rob Browning |
Subject: |
bug#72086: scm_i_utf8_string_hash overruns buffer when len is zero |
Date: |
Fri, 12 Jul 2024 19:43:18 -0500 |
The first patch attempts to fix that, and the second is an optimization
when then input is ASCII (since we already have the information we need
to detect that):
>From 619e3d3afec2c116007d9cb2ad32a500fb32a7dd Mon Sep 17 00:00:00 2001
From: Rob Browning <rlb@defaultvalue.org>
Date: Sun, 30 Jun 2024 22:41:40 -0500
Subject: [PATCH 1/2] scm_i_utf8_string_hash: don't overrun when len is zero
When the length is zero, the previous code would include the byte after
the end of the string in the hash. Fix that (the wide and narrow
hashers also guard against it via "case 0"), and while we're there,
switch to u8_mbtouc since the unsafe variant is now the same (see the
info pages), and don't bother mutating length for the trailing bytes,
since we don't need to.
libguile/hash.c (scm_i_utf8_string_hash): switch to u8_mbtouc, and avoid
overrun when len == 0.
---
libguile/hash.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/libguile/hash.c b/libguile/hash.c
index a038a11bf..d92f60df8 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -195,32 +195,34 @@ scm_i_utf8_string_hash (const char *str, size_t len)
/* Handle most of the key. */
while (length > 3)
{
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
a += u32;
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
b += u32;
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
c += u32;
mix (a, b, c);
length -= 3;
}
/* Handle the last 3 elements's. */
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
- a += u32;
- if (--length)
+ if (length)
{
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
- b += u32;
- if (--length)
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
+ a += u32;
+ if (length > 1)
{
- ustr += u8_mbtouc_unsafe (&u32, ustr, end - ustr);
- c += u32;
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
+ b += u32;
+ if (length > 2)
+ {
+ ustr += u8_mbtouc (&u32, ustr, end - ustr);
+ c += u32;
+ }
}
+ final (a, b, c);
}
- final (a, b, c);
-
if (sizeof (unsigned long) == 8)
ret = (((unsigned long) c) << 32) | b;
else
--
2.43.0
>From c6a888888101a893820f38561898e7c0390dd9d2 Mon Sep 17 00:00:00 2001
From: Rob Browning <rlb@defaultvalue.org>
Date: Mon, 1 Jul 2024 20:56:57 -0500
Subject: [PATCH 2/2] scm_i_utf8_string_hash: optimize ASCII
Since we already compute the char length, use that to detect all ASCII
strings and handle those the same way we handle latin-1.
libguile/hash.c (scm_i_utf8_string_hash): when byte_len == char_len,
(i.e. fixed-width ASCII) optimize hashing via existing narrow path.
---
libguile/hash.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/libguile/hash.c b/libguile/hash.c
index d92f60df8..fe950e358 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -169,25 +169,29 @@ scm_i_latin1_string_hash (const char *str, size_t len)
unsigned long
scm_i_utf8_string_hash (const char *str, size_t len)
{
- const uint8_t *end, *ustr = (const uint8_t *) str;
- unsigned long ret;
-
- /* The length of the string in characters. This name corresponds to
- Jenkins' original name. */
- size_t length;
-
- uint32_t a, b, c, u32;
-
if (len == (size_t) -1)
len = strlen (str);
- end = ustr + len;
-
+ const uint8_t *ustr = (const uint8_t *) str;
if (u8_check (ustr, len) != NULL)
/* Invalid UTF-8; punt. */
return scm_i_string_hash (scm_from_utf8_stringn (str, len));
- length = u8_mbsnlen (ustr, len);
+ /* The length of the string in characters. This name corresponds to
+ Jenkins' original name. */
+ size_t length = u8_mbsnlen (ustr, len);
+
+ if (len == length) // ascii, same as narrow_string_hash above
+ {
+ unsigned long ret;
+ JENKINS_LOOKUP3_HASHWORD2 (str, len, ret);
+ ret >>= 2; /* Ensure that it fits in a fixnum. */
+ return ret;
+ }
+
+ const uint8_t * const end = ustr + len;
+ uint32_t a, b, c, u32;
+ unsigned long ret;
/* Set up the internal state. */
a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
--
2.43.0
Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- bug#72086: scm_i_utf8_string_hash overruns buffer when len is zero,
Rob Browning <=