[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
mbchar: Optimize is_basic
|
From: |
Bruno Haible |
|
Subject: |
mbchar: Optimize is_basic |
|
Date: |
Thu, 13 Jul 2023 23:30:49 +0200 |
So far, the is_basic_table in mbchar.c marked all bytes in the range
0x20..0x7E as "basic", but not all of the control characters 0x00..0x1F.
But nowadays, all locale encodings map this range 0x00..0x1F to
U+0000..U+001F. The last encodings which did not have this property
were VISCII and TCVN5712-1, but I could eliminate them from localcharset.h
through the previous patch.
This allows some small optimization:
2023-07-13 Bruno Haible <bruno@clisp.org>
mbchar: Optimize is_basic.
* lib/mbchar.h (is_basic_table): Remove declaration.
(is_basic) [IS_BASIC_ASCII]: Define through a simple range test.
* lib/mbchar.c (is_basic_table): Remove array.
diff --git a/lib/mbchar.c b/lib/mbchar.c
index 63ff9c7a72..af3c7934dc 100644
--- a/lib/mbchar.c
+++ b/lib/mbchar.c
@@ -21,19 +21,3 @@
#include <limits.h>
#include "mbchar.h"
-
-#if IS_BASIC_ASCII
-
-/* Bit table of characters in the POSIX "portable character set", which
- POSIX guarantees to be single-byte and in practice are safe to treat
- like the ISO C "basic character set". */
-const unsigned int is_basic_table [UCHAR_MAX / 32 + 1] =
-{
- 0x00003f81, /* '\0' '\007' '\010' '\t' '\n' '\v' '\f' '\r' */
- 0xffffffff, /* ' '......'?' */
- 0xffffffff, /* '@' 'A'...'Z' '[' '\\' ']' '^' '_' */
- 0x7fffffff /* '`' 'a'...'z' '{' '|' '}' '~' */
- /* The remaining bits are 0. */
-};
-
-#endif /* IS_BASIC_ASCII */
diff --git a/lib/mbchar.h b/lib/mbchar.h
index 36bae18276..82c373f47e 100644
--- a/lib/mbchar.h
+++ b/lib/mbchar.h
@@ -309,14 +309,17 @@ mb_copy (mbchar_t *new_mbc, const mbchar_t *old_mbc)
/* The character set is ISO-646, not EBCDIC. */
# define IS_BASIC_ASCII 1
-extern const unsigned int is_basic_table[];
-
-MBCHAR_INLINE bool
-is_basic (char c)
-{
- return (is_basic_table [(unsigned char) c >> 5] >> ((unsigned char) c & 31))
- & 1;
-}
+/* All locale encodings (see localcharset.h) map the characters 0x00..0x7F
+ to U+0000..U+007F, like ASCII, except for
+ CP864 different mapping of '%'
+ SHIFT_JIS different mappings of 0x5C, 0x7E
+ JOHAB different mapping of 0x5C
+ However, these characters in the range 0x20..0x7E are in the ISO C
+ "basic character set" and in the POSIX "portable character set", which
+ ISO C and POSIX guarantee to be single-byte. Thus, locales with these
+ encodings are not POSIX compliant. And they are most likely not in use
+ any more (as of 2023). */
+# define is_basic(c) ((unsigned char) (c) < 0x80)
#else
| [Prev in Thread] |
Current Thread |
[Next in Thread] |
- mbchar: Optimize is_basic,
Bruno Haible <=