[PATCH 6/6] grep: scan back thru UTF-8 a bit faster
From: Paul Eggert
Subject: [PATCH 6/6] grep: scan back thru UTF-8 a bit faster
Date: Tue, 24 Aug 2021 00:45:41 -0700
* src/searchutils.c (mb_goback): When scanning backward through
UTF-8, check the length implied by the putative byte 1 before
bothering to invoke mb_clen. This length check also lets us use
mbrlen directly rather than calling mb_clen, which would
eventually defer to mbrlen anyway.
---
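As an illustration only, the length check described above can be sketched in isolation. The helper name lead_reaches is hypothetical and not part of grep; the expression is the one the patch uses, with the byte passed as an unsigned char for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of grep: return true if BYTE, taken
   as the first byte of a UTF-8 sequence, implies a sequence longer
   than I bytes, so that a character starting I bytes before CUR
   would extend at least through *CUR.  */
bool
lead_reaches (unsigned char byte, int i)
{
  /* The caller only looks back 1 to 3 bytes, as in the patched loop.  */
  assert (1 <= i && i <= 3);

  /* The first byte of an N-byte sequence (N >= 2) begins with N one
     bits.  Inverting the byte turns those into zero bits, and
     shifting right by 7 - I keeps only the top I + 1 bits.  They are
     all zero exactly when BYTE began with at least I + 1 one bits.  */
  return (~byte & 0xff) >> (7 - i) == 0;
}
```

For instance, 0xC3 begins a 2-byte sequence, so it reaches *CUR when found 1 byte back but not when found 2 bytes back, and an ASCII byte never passes the check.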
src/searchutils.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/src/searchutils.c b/src/searchutils.c
index f16dd84..0080dd7 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -107,13 +107,20 @@ mb_goback (char const **mb_start, size_t *mbclen, char const *cur,
for (int i = 1; i <= 3; i++)
if ((cur[-i] & 0xc0) != 0x80)
{
- mbstate_t mbs = { 0 };
- size_t clen = mb_clen (cur - i, end - (cur - i), &mbs);
- if (i < clen && clen <= MB_LEN_MAX)
+ /* True if the length implied by the putative byte 1 at
+ CUR[-I] extends at least through *CUR. */
+ bool long_enough = (~cur[-i] & 0xff) >> (7 - i) == 0;
+
+ if (long_enough)
{
- /* This multibyte character contains *CUR. */
- p0 = cur - i;
- p = p0 + clen;
+ mbstate_t mbs = { 0 };
+ size_t clen = mbrlen (cur - i, end - (cur - i), &mbs);
+ if (clen <= MB_LEN_MAX)
+ {
+ /* This multibyte character contains *CUR. */
+ p0 = cur - i;
+ p = p0 + clen;
+ }
}
break;
}
--
2.31.1