
Re: [bug-gawk] Exposing wcwidth(3) as a built-in function


From: Eric Pruitt
Subject: Re: [bug-gawk] Exposing wcwidth(3) as a built-in function
Date: Tue, 28 Nov 2017 16:03:59 -0800
User-agent: NeoMutt/20170113 (1.7.2)

On Tue, Nov 28, 2017 at 02:33:20PM -0800, Eric Pruitt wrote:
> As far as I can tell, GAWK does not expose wcwidth(3) which surprises me
> since AWK specializes in processing text. I know I could implement this
> myself using an extension, but has exposing wcwidth(3) as a built-in
> function been considered in the past? If so, is there any particular
> reason it was rejected / not implemented?

I generated a table in which each row holds a character width and the
first and last codepoints of a Unicode range, then translated it into an
AWK function. At ~2,000 lines, the code is small enough that parsing it
doesn't add any noticeable latency on my machine.
Unfortunately the function is fairly useless because there isn't a way
to efficiently get the numeric codepoint of a character. The code in
https://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html
("Translating Between Characters and Numbers") uses an array as a lookup
table. The documentation reads "Both functions are written very nicely
in awk; there is no real reason to build them into the awk interpreter"
but creating a lookup table that spans all of the Unicode codepoints
takes a non-trivial amount of time.
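
For illustration only, here is a minimal sketch of what a range-table
lookup of that kind might look like. It is not the generated ~2,000-line
function: the three ranges are a hand-picked sample, the names
(cp_width, add_range, init_table) are hypothetical, and the binary
search is an assumption about how such a table could be consulted. It
takes a decimal codepoint as input, which is exactly the value that is
hard to obtain efficiently from a character in AWK.

```sh
#!/bin/sh
# Hypothetical helper: print the display width of one decimal codepoint.
# The table below is an illustrative sample, not a complete width table.
cp_width() {
    awk -v cp="$1" '
    # Append one row: first codepoint, last codepoint, display width.
    function add_range(first, last, width) {
        n++
        lo[n] = first; hi[n] = last; w[n] = width
    }
    # Rows must stay sorted and non-overlapping for the search to work.
    function init_table() {
        n = 0
        add_range(768, 879, 0)       # U+0300..U+036F combining diacritical marks
        add_range(4352, 4447, 2)     # U+1100..U+115F Hangul Jamo initial consonants
        add_range(19968, 40959, 2)   # U+4E00..U+9FFF CJK Unified Ideographs
    }
    # Binary search over the ranges; anything not listed is assumed width 1.
    function lookup(c,    l, r, m) {
        l = 1; r = n
        while (l <= r) {
            m = int((l + r) / 2)
            if (c < lo[m])      r = m - 1
            else if (c > hi[m]) l = m + 1
            else                return w[m]
        }
        return 1
    }
    BEGIN { init_table(); print lookup(cp + 0) }
    '
}
```

With these sample rows, "cp_width 20013" (U+4E2D) prints 2 and
"cp_width 769" (U+0301) prints 0. The lookup itself is cheap; the
missing piece remains turning a character into the number passed in.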

Eric


