From: Wolfgang Laun
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Sun, 7 Feb 2016 17:14:39 +0100
Generally, it sounds like the right thing to do is:
- in a UTF-8 locale, *always* deal with *characters* (Unicode codepoints), not bytes
- specifically, when encountering \xhh, compare it to the *Unicode codepoint* of the character at hand
Always dealing with characters makes sense to me, especially given that you can mix Unicode characters and \xhh escapes in a single bracket expression.
Thus, given that \xff is the maximum codepoint value that can currently be expressed, which doesn't allow matching the full range of Unicode characters, I suggest the following:
- At https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html#Bracket-Expressions:
- document this limitation
- recommend the workaround of using actual characters rather than codepoint escapes as the range endpoints.
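
To illustrate the distinction between comparing codepoints and comparing bytes, here is a minimal sketch in Python. It is not gawk code and does not reproduce gawk's matcher; the helper names in_range_by_codepoint and in_range_by_chars are hypothetical and exist only for this illustration. It shows that a \xhh-style endpoint (at most 0xff) cannot cover a character such as U+20AC, while using actual characters as range endpoints can.

```python
# Illustration only: plain Python, not gawk's implementation.
# Helper names are hypothetical and chosen for this sketch.

def in_range_by_codepoint(ch, lo, hi):
    """Compare the character's Unicode codepoint against numeric endpoints."""
    return lo <= ord(ch) <= hi

def in_range_by_chars(ch, lo_ch, hi_ch):
    """Workaround: use actual characters as the range endpoints."""
    return ord(lo_ch) <= ord(ch) <= ord(hi_ch)

e_acute = "\u00e9"  # U+00E9, encoded in UTF-8 as the two bytes C3 A9
euro = "\u20ac"     # U+20AC, encoded in UTF-8 as the three bytes E2 82 AC

# In UTF-8 the byte sequence differs from the codepoint value, so a
# byte-wise comparison against 0xE9 would not see a single matching byte.
utf8_of_e_acute = e_acute.encode("utf-8")

# Codepoint comparison: U+00E9 falls inside 0x80..0xFF, U+20AC does not.
e_acute_in_hex_range = in_range_by_codepoint(e_acute, 0x80, 0xFF)
euro_in_hex_range = in_range_by_codepoint(euro, 0x80, 0xFF)

# With literal characters as endpoints, the range can extend past 0xFF.
euro_in_char_range = in_range_by_chars(euro, "\u00e0", "\u20ac")
```

With two-digit hex escapes as the only numeric form, the upper endpoint tops out at 0xff, which is why literal characters are the practical way to express wider ranges.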