Re: [bug-gawk] Exposing wcwidth(3) as a built-in function
From: Eric Pruitt
Subject: Re: [bug-gawk] Exposing wcwidth(3) as a built-in function
Date: Tue, 28 Nov 2017 23:34:30 -0800
User-agent: NeoMutt/20170113 (1.7.2)
On Tue, Nov 28, 2017 at 04:03:59PM -0800, Eric Pruitt wrote:
> I generated a table consisting of character widths and two values
> representing the beginning and end of each Unicode codepoint range, which I
> translated into an AWK function. At ~2,000 lines, the code is small
> enough that parsing it doesn't add any noticeable latency on my machine.
> Unfortunately the function is fairly useless because there isn't a way
> to efficiently get the numeric codepoint of a character. The code in
> https://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html
> ("Translating Between Characters and Numbers") uses an array as a lookup
> table. The documentation reads "Both functions are written very nicely
> in awk; there is no real reason to build them into the awk interpreter"
> but creating a lookup table that spans all of the Unicode codepoints
> takes a non-trivial amount of time.
I was able to create a working wcwidth function in pure AWK, without
converting characters to numeric codepoints, by (ab)using lexical
comparisons: each character is compared as a string against the
characters that bound each width range. I have no clue how portable this
is, since it depends on how a given AWK implementation orders strings.
I've attached the generated file in case it is useful to someone else.
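
For readers without the attachment, here is a minimal sketch of what a comparison-based lookup could look like. It is not the code in wcwidth.awk: the names width_of and char_width are hypothetical, only four illustrative ranges are covered, and everything else is assumed to have width 1. It also assumes the AWK implementation compares strings byte by byte, in which case UTF-8 strings sort in codepoint order.

```sh
# Hypothetical helper: print the assumed column width of the single
# UTF-8 character given as $1. This is a sketch, not the attached file.
width_of() {
    awk '
    # Range bounds are octal-escaped UTF-8 bytes. Plain string
    # comparison orders them by codepoint only when awk compares
    # byte by byte, which is an assumption here.
    function char_width(c) {
        if (c < " " || c == "\177")                     # C0 controls, DEL
            return -1
        if (c >= "\314\200" && c <= "\315\257")         # U+0300..U+036F combining marks
            return 0
        if (c >= "\344\270\200" && c <= "\351\277\277") # U+4E00..U+9FFF CJK ideographs
            return 2
        if (c >= "\352\260\200" && c <= "\355\236\243") # U+AC00..U+D7A3 Hangul syllables
            return 2
        return 1                                        # everything else, in this sketch
    }
    BEGIN { print char_width(ARGV[1]) }
    ' "$1"
}
```

Calling width_of with a single character prints its assumed width. A full table would need many more ranges, and an implementation that collates strings by locale rather than by byte may order the bounds differently.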
Eric
Attachment: wcwidth.awk
Description: Text document