
Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug


From: Wolfgang Laun
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Sun, 7 Feb 2016 17:14:39 +0100

On 7 February 2016 at 16:54, Michael Klement <address@hidden> wrote:

Generally, it sounds like the right thing to do is:

  • in a UTF-8 locale, *always* deal with *characters* (Unicode codepoints), not bytes
  • specifically, when encountering \xhh, compare it to the *Unicode codepoint* of the character at hand
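
The difference between the two interpretations of \xhh can be sketched outside of awk. The following is only an illustration in Python, not gawk's actual behavior: a str pattern stands in for "character" semantics, a bytes pattern for "byte" semantics, and the sample word is an arbitrary choice.

```python
import re

# Sample text; U+00FC (u with diaeresis) is written as an escape so this
# source stays pure ASCII. In UTF-8 it is the two bytes 0xC3 0xBC.
text = "f\u00fcr"

# Character semantics: \xfc in the bracket expression is compared against
# the codepoint of each character, so it matches U+00FC.
char_hits = re.findall("[\xfc]", text)

# Byte semantics: \xfc is compared against each byte of the UTF-8 encoding.
# Neither 0xC3 nor 0xBC equals 0xFC, so nothing matches.
raw = text.encode("utf-8")
byte_hits = re.findall(b"[\xfc]", raw)

# Under character semantics an escape and another character can share one
# bracket expression (here U+00E4 via \xe4 and U+00FC via \u00fc).
mixed_hits = re.findall("[\xe4\u00fc]", "\u00e4\u00fc")

print([hex(ord(c)) for c in char_hits])   # ['0xfc']
print(byte_hits)                          # []
print([hex(ord(c)) for c in mixed_hits])  # ['0xe4', '0xfc']
```

Note that \xhh tops out at 0xFF in either reading, which is the limitation the next paragraphs refer to.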

Always dealing with characters makes sense to me, especially given that you can mix Unicode characters and \xhh escapes in a single bracket expression.

Thus, given that \xff is the max. codepoint value that can currently be expressed, which doesn't allow matching the full range of Unicode characters, I suggest the following:

So if an awk program file requires such (UTF-8 encoded) characters: is your editor capable of handling the characters you need, and are your keyboard (and your skill with it) sufficient for typing them? Also consider what happens if a not-so-capable editor handles such a file. The various escapes for typing a character may be the only way to guarantee portability without being stymied by the ubiquitous � - haven't we all seen it?
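
A small sketch of that portability argument, in Python rather than awk and purely as an illustration: the hypothetical `source_line` stands for a line of program text that spells a character as an escape, and the ASCII decoder stands in for a tool that does not understand UTF-8.

```python
# A line of program text that names U+00FC by escape only. The raw string
# keeps the backslash sequence literal, as it would sit in a source file.
source_line = r'pattern = "\u00fc"'

# The escaped form is pure ASCII, so any editor or transport can carry it.
print(source_line.isascii())  # True

# The same character typed literally is stored as two UTF-8 bytes.
literal_bytes = "\u00fc".encode("utf-8")
print(literal_bytes)  # b'\xc3\xbc'

# A tool that assumes ASCII cannot interpret those bytes; with replacement
# enabled, each one turns into U+FFFD, the replacement character.
mangled = literal_bytes.decode("ascii", errors="replace")
print([hex(ord(c)) for c in mangled])  # ['0xfffd', '0xfffd']
```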

Best
Wolfgang
