Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
From: Michael Klement
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Fri, 19 Feb 2016 10:04:39 -0500
Great, thank you.
> On Feb 19, 2016, at 9:00 AM, Aharon Robbins <address@hidden> wrote:
>
> Hi.
>
>>> Generally, it sounds like the right thing to do is:
>>>
>>> - in a UTF-8 locale, *always* deal with *characters* (Unicode
>>> codepoints), not bytes
>>> - specifically, when encountering \xhh, compare it to the *Unicode
>>> codepoint* of the character at hand
>>>
>>>
>>> Always dealing with characters makes sense to me, especially given that
>>> you can *mix* Unicode characters and \x*hh* escapes in a single bracket
>>> expression.
>>>
>>> Thus, given that \xff is the max. codepoint value that can currently be
>>> expressed, which doesn't allow matching the full range of Unicode
>>> characters, I suggest the following:
>>>
>>> - At
>>>   https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html#Bracket-Expressions:
>>>   - document this limitation
>>>   - recommend the workaround of using actual characters rather than
>>>     codepoint escapes as the range endpoints.
>
> This is what I've done. The changes will eventually propagate to the
> repo.
>
> I will talk to other GNU maintainers about how we want to deal with
> this issue; I don't want to invent something on my own and have it
> be different from other GNU utilities.
>
> Thanks,
>
> Arnold
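
For readers of the archive, the byte-versus-character distinction discussed in the quoted text can be sketched as follows. This is Python, not gawk, and it does not reproduce gawk's behavior; it only illustrates, under that assumption, why a \xhh range endpoint means different things depending on whether matching works on bytes or on codepoints, and why literal characters as endpoints sidestep the \xff ceiling. The sample string and variable names are made up for illustration.

```python
import re

# Illustrative sample: 'a', 'é' (U+00E9) and '€' (U+20AC).
text = "a\u00e9\u20ac"
data = text.encode("utf-8")  # b'a\xc3\xa9\xe2\x82\xac'

# Byte-oriented matching: \x80-\xff is compared against individual bytes,
# so every byte of each multibyte character matches on its own.
byte_hits = re.findall(rb"[\x80-\xff]", data)

# Character-oriented matching: \x80-\xff is compared against codepoints,
# so 'é' (U+00E9) matches once and '€' (U+20AC) falls outside the range.
char_hits = re.findall(r"[\x80-\xff]", text)

# Analogue of the suggested workaround: use actual characters as the
# range endpoints, which is not limited to codepoints up to U+00FF.
literal_hits = re.findall("[é-€]", text)

print(byte_hits)
print(char_hits)
print(literal_hits)
```

With byte-oriented matching the two non-ASCII characters yield five separate one-byte matches; with character-oriented matching only 'é' matches the escaped range, while the literal-endpoint range covers both characters.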