Re: [bug-gawk] Exposing wcwidth(3) as a built-in function

From:	Eric Pruitt
Subject:	Re: [bug-gawk] Exposing wcwidth(3) as a built-in function
Date:	Fri, 8 Dec 2017 15:25:34 -0800
User-agent:	NeoMutt/20170113 (1.7.2)

On Fri, Dec 01, 2017 at 09:41:14PM +0200, Eli Zaretskii wrote:
> > I decided to make a less hacky, portable version of the wcwidth
> > function. The rewritten script is much smaller, and it doesn't kill the
> > MAWK parser like its predecessor. Lookups are done using a binary search
> > on a table that is lazy-loaded at runtime. I've attached the updated
> > script to this email, but the canonical repository is
> > https://github.com/ericpruitt/wcwidth.awk .
>
> Thanks, but doesn't this still assume UTF-8 encoding of characters?
> If so, it's not portable to non-UTF-8 locales, right?

I realized I may have misinterpreted your question, so let me clarify and
add a question of my own. Only the code for interpreters that are not
multi-byte safe falls back to manual UTF-8 parsing. In GAWK, the lookup
table is searched with lexical comparisons, on the assumption that the
locale is multi-byte safe. In MAWK, however, the lookup table was* indexed
by numeric code points. Are there some multi-byte locales where I could
not count on sprintf("%c", 23485) being "宽" in GNU Awk? From running
"fgrep -ir iconv --include '*.h' --include '*.c'", it doesn't look like
GAWK uses iconv. Perhaps a more accurate question is, will GAWK work on
platforms that do not have **any** Unicode support (be it UTF-8, UTF-16,
etc.)?

* I have since rewritten the code for multi-byte unsafe interpreters so
  the lookup table is indexed by UTF-8 byte strings instead of numeric
  code points for performance reasons.
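
To make the byte-string indexing concrete, here is a minimal sketch in
Python rather than AWK. It is not the code from wcwidth.awk: the table
contents, the RANGES name and the width_of helper are all hypothetical
and only illustrate the idea. It relies on the fact that comparing valid
UTF-8 sequences byte by byte gives the same order as comparing their
code points, so a binary search over byte-string range bounds needs no
decoding step.

```python
# Hypothetical sketch, not the actual wcwidth.awk table or code.
# Each entry is (first, last, width); the range bounds are stored as
# UTF-8 byte strings and the table is sorted by those bytes.
RANGES = [
    ("\u0300".encode("utf-8"), "\u036f".encode("utf-8"), 0),  # combining marks
    ("\u4e00".encode("utf-8"), "\u9fff".encode("utf-8"), 2),  # CJK ideographs
    ("\uff01".encode("utf-8"), "\uff60".encode("utf-8"), 2),  # fullwidth forms
]


def width_of(char, default=1):
    """Binary-search RANGES using byte-wise comparison of UTF-8 keys."""
    key = char.encode("utf-8")
    lo, hi = 0, len(RANGES) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, last, width = RANGES[mid]
        if key < first:
            hi = mid - 1
        elif key > last:
            lo = mid + 1
        else:
            return width
    # Characters outside every listed range get the assumed default width.
    return default


# Code point 23485 is U+5BBD, the character from the sprintf example.
print(width_of(chr(23485)), width_of("a"), width_of("\u0301"))
```

A real table would hold far more ranges; the three above only exercise
the search.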

Eric
