From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Sat, 01 Jul 2023 16:35:21 +0200
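Here is a proposed patch to overcome the wchar_t limitation in the 'dfa'
module: on some platforms, such as AIX and native Windows, wchar_t is only
16 bits wide and therefore cannot represent every Unicode character.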
Jim: The background is explained in
<https://www.gnu.org/software/gnulib/manual/html_node/Strings-and-Characters.html>
The plan was outlined in
<https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00118.html>
and <https://lists.gnu.org/archive/html/bug-gnulib/2023-06/msg00102.html>.
Accordingly, the 'grep' code needs a minimal change (attached).
I have verified that this change does not cause test failures in 'grep'
and in 'sed', on glibc systems, FreeBSD, and Solaris 11.4.
Arnold: I have added '#if GAWK' conditionals, because gawk's build system
does not use gnulib-tool and you therefore copy the files from gnulib by hand.
As a consequence, the improvements will not reach gawk: the dfa code in gawk
will continue to use wchar_t.
Objections?
2023-07-01  Bruno Haible  <bruno@clisp.org>

	dfa: Overcome wchar_t limitations.
	* lib/localeinfo.h: Include <uchar.h>. Add special definitions for GAWK.
	(case_folded_counterparts): Change array element type to char32_t.
	* lib/localeinfo.c: Include <uchar.h>. Add special definitions for GAWK.
	(is_using_utf8, init_localeinfo): Use mbrtoc32 instead of mbrtowc.
	(lonesome_lower): Change element type to 'unsigned short'.
	(case_folded_counterparts): Change array element type to char32_t. Use
	c32toupper instead of towupper. Use c32tolower instead of towlower.
	* lib/dfa.c: Include <uchar.h>. Add special definitions for GAWK.
	(struct mb_char_classes): Change element type of 'chars' to char32_t.
	(mbs_to_wchar): Use mbrtoc32 instead of mbrtowc.
	(setbit_wc): Change type of first argument to char32_t. Use c32tob
	instead of wctob.
	(parse_bracket_exp): Update.
	(lex): Use c32isprint instead of iswprint. Use c32isspace instead of
	iswspace. Use c32rtomb instead of a %lc directive.
	(addtok_wc): Use c32rtomb instead of wcrtomb.
	(atom): Update.
	* modules/dfa (Depends-on): Remove wctype-h. Add uchar, mbrtoc32,
	c32rtomb, c32tob, c32tolower, c32toupper, c32isprint, c32isspace.
	(Link): Add $(LIBUNISTRING) $(LIBC32CONV).
	* modules/dfa-tests (Makefile.am): Link test-dfa-match-aux with
	$(LIBUNISTRING) $(LIBC32CONV).
	* NEWS: Mention the change.
Attachments:
  0001-dfa-Overcome-wchar_t-limitations.patch
  0001-grep-Update-after-gnulib-changed.patch