bug-gnulib

Re: From wchar_t to char32_t


From: Bruno Haible
Subject: Re: From wchar_t to char32_t
Date: Sat, 01 Jul 2023 16:35:21 +0200

Here is a proposed patch to overcome the wchar_t limitation in the 'dfa'
module.

Jim: The background is explained in
<https://www.gnu.org/software/gnulib/manual/html_node/Strings-and-Characters.html>.
The plan was laid out in
    <https://lists.gnu.org/archive/html/bug-gnulib/2018-12/msg00118.html>
and <https://lists.gnu.org/archive/html/bug-gnulib/2023-06/msg00102.html>.
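For readers who do not want to follow the links: on some platforms (native
Windows, for example) wchar_t is only 16 bits wide and cannot hold every
Unicode character, whereas char32_t always can. The sketch below is not taken
from the patch; it only illustrates decoding a multibyte string with the
ISO C 11 function mbrtoc32. The helper name count_chars is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>   /* mbstate_t */

/* Hypothetical helper, for illustration only.
   Count the characters in the N bytes starting at S, decoding them
   in the current locale's encoding.  Return (size_t) -1 if the input
   contains an invalid or incomplete multibyte sequence.  */
size_t
count_chars (const char *s, size_t n)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t count = 0;
  while (n > 0)
    {
      char32_t c;
      size_t len = mbrtoc32 (&c, s, n, &state);
      if (len == (size_t) -1 || len == (size_t) -2)
        return (size_t) -1;     /* invalid or incomplete sequence */
      if (len == (size_t) -3)
        len = 0;                /* character produced, no bytes consumed */
      else if (len == 0)
        len = 1;                /* a NUL byte was decoded; it is 1 byte long */
      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

A caller would typically invoke setlocale (LC_ALL, "") first, so that the
decoding follows the user's locale rather than the "C" locale.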

Accordingly, the 'grep' code needs a minimal change. (Attached.)

I have verified that this change does not cause test failures in 'grep'
and in 'sed', on glibc systems, FreeBSD, and Solaris 11.4.

Arnold: I have added '#if GAWK' conditionals, knowing that gawk's build system
does not use gnulib-tool and you therefore pull from gnulib manually. This
means the improvements will not land in gawk, since dfa in gawk will continue
to use wchar_t.
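To make the idea concrete, here is a minimal sketch of how such a conditional
could be arranged. It is an assumption about the general shape, not a copy of
the patch: the macro set is deliberately reduced to two functions that exist
in ISO C, and the helper first_char_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#if GAWK
/* Assumed fallback for gawk: stay on the ISO C 99 wide-character API
   by mapping the char32_t names back onto their wchar_t equivalents.  */
# include <wchar.h>
# define char32_t wchar_t
# define mbrtoc32 mbrtowc
# define c32rtomb wcrtomb
#else
/* Everyone else: use the ISO C 11 API.  */
# include <uchar.h>
# include <wchar.h>   /* mbstate_t */
#endif

/* Hypothetical helper, for illustration only.
   Return the byte length of the first character in the N bytes at S,
   as reported by mbrtoc32 (or by mbrtowc when GAWK is defined).  */
size_t
first_char_length (const char *s, size_t n)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  char32_t c;
  return mbrtoc32 (&c, s, n, &state);
}
```

With this arrangement the body of the code is written once against the
char32_t names, and a build with -DGAWK=1 silently keeps the old wchar_t
behaviour, including its limitations.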

Objections?


2023-07-01  Bruno Haible  <bruno@clisp.org>

        dfa: Overcome wchar_t limitations.
        * lib/localeinfo.h: Include <uchar.h>. Add special definitions for GAWK.
        (case_folded_counterparts): Change array element type to char32_t.
        * lib/localeinfo.c: Include <uchar.h>. Add special definitions for GAWK.
        (is_using_utf8, init_localeinfo): Use mbrtoc32 instead of mbrtowc.
        (lonesome_lower): Change element type to 'unsigned short'.
        (case_folded_counterparts): Change array element type to char32_t. Use
        c32toupper instead of towupper. Use c32tolower instead of towlower.
        * lib/dfa.c: Include <uchar.h>. Add special definitions for GAWK.
        (struct mb_char_classes): Change element type of 'chars' to char32_t.
        (mbs_to_wchar): Use mbrtoc32 instead of mbrtowc.
        (setbit_wc): Change type of first argument to char32_t. Use c32tob
        instead of wctob.
        (parse_bracket_exp): Update.
        (lex): Use c32isprint instead of iswprint. Use c32isspace instead of
        iswspace. Use c32rtomb instead of a %lc directive.
        (addtok_wc): Use c32rtomb instead of wcrtomb.
        (atom): Update.
        * modules/dfa (Depends-on): Remove wctype-h. Add uchar, mbrtoc32,
        c32rtomb, c32tob, c32tolower, c32toupper, c32isprint, c32isspace.
        (Link): Add $(LIBUNISTRING) $(LIBC32CONV).
        * modules/dfa-tests (Makefile.am): Link test-dfa-match-aux with
        $(LIBUNISTRING) $(LIBC32CONV).
        * NEWS: Mention the change.
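
One item above replaces a %lc printf directive with c32rtomb. The reason is
that %lc takes a wint_t argument, so it cannot be used directly with a
char32_t value. The following sketch shows the general technique under that
assumption; it is not the code from the patch, and the helper describe_char
and its message text are hypothetical.

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>   /* mbstate_t */

/* Hypothetical helper, for illustration only.
   Write a message mentioning the character C into OUT (of size OUTSIZE).
   Return the snprintf result, or -1 if C cannot be represented in the
   current locale's encoding.  */
int
describe_char (char *out, size_t outsize, char32_t c)
{
  char buf[MB_LEN_MAX + 1];
  mbstate_t state;
  memset (&state, 0, sizeof state);

  /* Convert C to its multibyte form instead of passing it to %lc.  */
  size_t len = c32rtomb (buf, c, &state);
  if (len == (size_t) -1)
    return -1;
  buf[len] = '\0';

  return snprintf (out, outsize, "unexpected character '%s'", buf);
}
```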

Attachment: 0001-dfa-Overcome-wchar_t-limitations.patch
Description: Text Data

Attachment: 0001-grep-Update-after-gnulib-changed.patch
Description: Text Data

