bug-gawk

Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug


From: Michael Klement
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Fri, 19 Feb 2016 10:04:39 -0500

Great, thank you.

> On Feb 19, 2016, at 9:00 AM, Aharon Robbins <address@hidden> wrote:
> 
> Hi.
> 
>>> Generally, it sounds like the right thing to do is:
>>> 
>>>   - in a UTF-8 locale, *always* deal with *characters* (Unicode
>>>   codepoints), not bytes
>>>   - specifically, when encountering \xhh, compare it to the *Unicode
>>>   codepoint* of the character at hand
>>> 
>>> 
>>> Always dealing with characters makes sense to me, especially given that
>>> you can *mix* Unicode characters and \x*hh* escapes in a single bracket
>>> expression.
>>> 
>>> Thus, given that \xff is the max. codepoint value that can currently be
>>> expressed, which doesn't allow matching the full range of Unicode
>>> characters, I suggest the following:
>>> 
>>>   - At https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html#Bracket-Expressions:
>>>      - document this limitation
>>>      - recommend the workaround of using actual characters rather than
>>>      codepoint escapes as the range endpoints.
> 
> This is what I've done. The changes will eventually propagate to the
> repo.
> 
> I will talk to other GNU maintainers about how we want to deal with
> this issue; I don't want to invent something on my own and have it
> be different from other GNU utilities.
> 
> Thanks,
> 
> Arnold
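
For anyone reading this thread later, the character-versus-byte distinction
discussed above can be sketched outside of gawk. The snippet below is only an
illustration: it uses Python's re module as a stand-in for a regex engine, and
it does not reproduce or predict gawk's behavior. The variable names are made
up for the example.

```python
import re

# Illustration only, not gawk: Python's re module stands in for a regex
# engine that can operate either on characters or on bytes.
text = "\u00e9"                 # the single character e-acute, code point U+00E9
raw = text.encode("utf-8")      # its UTF-8 encoding: two bytes, 0xC3 0xA9

# Character semantics: \xe0-\xff denotes the code point range U+00E0..U+00FF,
# so the one-character string matches.
char_match = re.fullmatch("[\xe0-\xff]", text) is not None

# Byte semantics: the same escapes denote single bytes 0xE0..0xFF, so the
# two-byte encoding of the same character does not match.
byte_match = re.fullmatch(rb"[\xe0-\xff]", raw) is not None

# Workaround recommended in the thread: use literal characters as the range
# endpoints, which can reach code points beyond U+00FF.
wide_match = re.fullmatch("[α-ω]", "λ") is not None

print(char_match, byte_match, wide_match)
```

With character semantics the \xhh range behaves like a code point range, while
with byte semantics the same pattern fails on any character whose UTF-8
encoding is longer than one byte. Literal endpoints sidestep the \xff ceiling
entirely.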



