Re: From wchar_t to char32_t
From: Jim Meyering
Subject: Re: From wchar_t to char32_t
Date: Tue, 4 Jul 2023 19:31:51 -0700

On Sat, Jul 1, 2023 at 7:35 AM Bruno Haible <bruno@clisp.org> wrote:
>
> Here is a proposed patch to overcome the wchar_t limitation in the 'dfa'
> module.
>
> Jim: The background is explained in
> <https://www.gnu.org/software/gnulib/manual/html_node/Strings-and-Characters.html>
> The plan was exposed in
> <https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00118.html>
> and <https://lists.gnu.org/archive/html/bug-gnulib/2023-06/msg00102.html>
>
> The 'grep' code needs a minimal change, accordingly. (Attached.)
>
> I have verified that this change does not cause test failures in 'grep'
> and in 'sed', on glibc systems, FreeBSD, and Solaris 11.4.
>
> Arnold: I have added '#if GAWK' conditionals, knowing that gawk's build system
> does not use gnulib-tool and you therefore pull from gnulib manually. This
> means the improvements will not land in gawk, since dfa in gawk will continue
> to use wchar_t.
>
> Objections?
>
>
> 2023-07-01 Bruno Haible <bruno@clisp.org>
>
> dfa: Overcome wchar_t limitations.
> * lib/localeinfo.h: Include <uchar.h>. Add special definitions for
> GAWK.
> (case_folded_counterparts): Change array element type to char32_t.
> * lib/localeinfo.c: Include <uchar.h>. Add special definitions for
> GAWK.
> (is_using_utf8, init_localeinfo): Use mbrtoc32 instead of mbrtowc.
> (lonesome_lower): Change element type to 'unsigned short'.
> (case_folded_counterparts): Change array element type to char32_t. Use
> c32toupper instead of towupper. Use c32tolower instead of towlower.
> * lib/dfa.c: Include <uchar.h>. Add special definitions for GAWK.
> (struct mb_char_classes): Change element type of 'chars' to char32_t.
> (mbs_to_wchar): Use mbrtoc32 instead of mbrtowc.
> (setbit_wc): Change type of first argument to char32_t. Use c32tob
> instead of wctob.
> (parse_bracket_exp): Update.
> (lex): Use c32isprint instead of iswprint. Use c32isspace instead of
> iswspace. Use c32rtomb instead of a %lc directive.
> (addtok_wc): Use c32rtomb instead of wcrtomb.
> (atom): Update.
> * modules/dfa (Depends-on): Remove wctype-h. Add uchar, mbrtoc32,
> c32rtomb, c32tob, c32tolower, c32toupper, c32isprint, c32isspace.
> (Link): Add $(LIBUNISTRING) $(LIBC32CONV).
> * modules/dfa-tests (Makefile.am): Link test-dfa-match-aux with
> $(LIBUNISTRING) $(LIBC32CONV).
> * NEWS: Mention the change.
Hi Bruno,
Thanks for porting those dfa improvements.
They all look fine to me.
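
For readers skimming the ChangeLog above, the recurring substitution is mbrtoc32 in place of mbrtowc. On platforms where wchar_t is only 16 bits wide, a single wchar_t cannot hold a character outside the Basic Multilingual Plane, whereas char32_t always can. The following is a minimal sketch of that decoding pattern using only the ISO C11 <uchar.h> interface; it is not code from the patch, and the helper name count_chars32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>              /* mbstate_t */

/* Hypothetical helper, not part of dfa.c: count the characters in the
   N bytes starting at S, decoding them with mbrtoc32 according to the
   current locale.  Return (size_t) -1 if the input contains an invalid
   or incomplete multibyte sequence.  */
size_t
count_chars32 (const char *s, size_t n)
{
  assert (s != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);     /* initial conversion state */

  size_t count = 0;
  while (n > 0)
    {
      char32_t ch;
      size_t len = mbrtoc32 (&ch, s, n, &state);

      if (len == (size_t) -1 || len == (size_t) -2)
        return (size_t) -1;     /* invalid or truncated sequence */

      if (len == (size_t) -3)
        {
          /* A character was produced without consuming input.  */
          count++;
          continue;
        }

      if (len == 0)
        len = 1;                /* a null character occupies one byte */

      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

A caller would normally invoke setlocale (LC_ALL, "") first so that the locale's encoding is used; in the default "C" locale, count_chars32 ("abc", 3) simply counts the three ASCII bytes. The same loop written with mbrtowc has the same shape, which is presumably why the grep side needs only a minimal change.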