Re: [bug-gawk] Exposing wcwidth(3) as a built-in function


From: Eric Pruitt
Subject: Re: [bug-gawk] Exposing wcwidth(3) as a built-in function
Date: Tue, 28 Nov 2017 23:34:30 -0800
User-agent: NeoMutt/20170113 (1.7.2)

On Tue, Nov 28, 2017 at 04:03:59PM -0800, Eric Pruitt wrote:
> I generated a table consisting of character widths and two values
> representing the beginning & end of Unicode codepoint ranges, which I
> translated into an AWK function. At ~2,000 lines, the code is small
> enough that parsing it doesn't add any noticeable latency on my machine.
> Unfortunately the function is fairly useless because there isn't a way
> to efficiently get the numeric codepoint of a character. The code in
> https://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html
> ("Translating Between Characters and Numbers") uses an array as a lookup
> table. The documentation reads "Both functions are written very nicely
> in awk; there is no real reason to build them into the awk interpreter"
> but creating a lookup table that spans all of the Unicode codepoints
> takes a non-trivial amount of time.

I was able to create a working wcwidth function in pure AWK script
without converting characters to numeric codepoints by (ab)using lexical
comparisons. I have no clue how portable this is. I've attached the
generated file in the event that it might be useful to someone else.
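
For readers without the attachment, here is a minimal sketch of the
general idea, not the generated file itself. It assumes the input is
UTF-8 and that the awk in use compares strings byte by byte (gawk's
default outside --posix mode, as far as I know); under that assumption
the byte order of UTF-8 sequences matches codepoint order, so a range
check needs no numeric conversion. The function name width_of and the
three ranges are illustrative only. A real table has far more ranges,
and an awk that compares strings with strcoll(3) may order characters
differently.

```awk
# Hypothetical sketch: classify one UTF-8 character by display width using
# only string comparisons. Range bounds are written as octal byte escapes
# so the source stays plain ASCII.
function width_of(c) {
    # U+0300..U+036F, Combining Diacritical Marks: zero width.
    if (c >= "\314\200" && c <= "\315\257")
        return 0
    # U+1100..U+115F, Hangul Jamo initial consonants: double width.
    if (c >= "\341\204\200" && c <= "\341\205\237")
        return 2
    # U+4E00..U+9FFF, CJK Unified Ideographs: double width.
    if (c >= "\344\270\200" && c <= "\351\277\277")
        return 2
    # Everything else is treated as single width in this sketch.
    return 1
}

BEGIN {
    print width_of("a")             # ASCII letter
    print width_of("\344\270\255")  # U+4E2D, a CJK ideograph
    print width_of("\314\201")      # U+0301, combining acute accent
}
```

With byte-wise comparison the three calls should classify the characters
as widths 1, 2 and 0 respectively. Splitting a whole string into
characters is a separate problem, since substr() only returns whole
characters when the awk implementation is multibyte-aware.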

Eric

Attachment: wcwidth.awk
Description: Text document