Re: wcwidth replacement problems
From: Bruno Haible
Subject: Re: wcwidth replacement problems
Date: Tue, 26 Aug 2008 09:32:32 +0200
User-agent: KMail/1.5.4
Alexander V. Lukyanov wrote:
> > (Giving the BULLET a width of 2 is a bit strange, but not really wrong.)
>
> Well, it does not seem to match current xterm behavior, and thus leads to
> strange visual results. I don't know, maybe it is an xterm problem, but the
> easiest way was to substitute wcwidth.
The Solaris wcwidth was probably made to match some Japanese terminal
emulators rather than xterm. In such terminal emulators, many characters
that have width 1 in xterm are displayed with width 2.
U+2022 (BULLET) is designated as "ambiguous width" in Unicode 5.0.0
(ftp.unicode.org ArchiveVersions/5.0.0/ucd/extracted/DerivedEastAsianWidth.txt),
so I don't want to consider Solaris wrong here. You have to understand
that wcwidth is only an approximation, because different terminal emulators
behave differently.
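To make the ambiguity concrete, here is a minimal sketch of how one code
point can legitimately get two different widths depending on the terminal
being targeted. Everything in it is hypothetical: ambiguous_width_sketch and
its one-entry table are made up for illustration and are not taken from
Solaris, gnulib, or xterm. A real table would be generated from
DerivedEastAsianWidth.txt.

```c
#include <stddef.h>

/* Hypothetical one-entry excerpt. U+2022 (BULLET) is the only code point
   this message names as "ambiguous width"; a real table would list all of
   them, generated from DerivedEastAsianWidth.txt. */
static const unsigned int ambiguous_excerpt[] = { 0x2022 };

/* Returns 1 if uc is listed in the excerpt above, 0 otherwise. */
static int in_ambiguous_excerpt(unsigned int uc)
{
    size_t n = sizeof ambiguous_excerpt / sizeof ambiguous_excerpt[0];
    for (size_t i = 0; i < n; i++)
        if (ambiguous_excerpt[i] == uc)
            return 1;
    return 0;
}

/* Hypothetical helper, not part of any libc or of gnulib.
   treat_ambiguous_as_wide != 0 models a terminal that draws ambiguous-width
   characters in two columns; 0 models a terminal that draws them in one.
   Returns -1 for code points outside the excerpt, because this sketch makes
   no claim about them. */
int ambiguous_width_sketch(unsigned int uc, int treat_ambiguous_as_wide)
{
    if (!in_ambiguous_excerpt(uc))
        return -1;
    return treat_ambiguous_as_wide ? 2 : 1;
}
```

With this sketch, ambiguous_width_sketch(0x2022, 1) gives the width 2 that
Solaris reports, and ambiguous_width_sketch(0x2022, 0) gives the width 1
that matches xterm. Neither answer is wrong by itself; they assume different
terminals.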
> > > BTW, why not use this one: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c ?
> > > It's public domain.
> >
> > It has also its bugs [1]. Additionally, it's slower because it uses binary
> > search rather than immediate table accesses.
>
> Let's measure it.
>
> $ time ./wcwidth-solaris
> wcwidth(0x2022)=2
>
> real 0m2.205s
> user 0m2.200s
> sys 0m0.000s
>
> $ time ./wcwidth-rpl
> wcwidth(0x2022)=1
>
> real 0m55.477s
> user 0m55.350s
> sys 0m0.000s
>
> $ time ./wcwidth-mk
> wcwidth(0x2022)=1
>
> real 0m1.944s
> user 0m1.940s
> sys 0m0.010s
This is not a fair comparison: wcwidth-mk works only in UTF-8 locales,
whereas wcwidth() from the system and from gnulib return the right result
in all locales. The test of whether the locale encoding is UTF-8 is precisely
what takes up most of the time in the gnulib replacement.
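For illustration, a locale test of this kind could look roughly like the
sketch below. This is not the gnulib code. It assumes a POSIX system with
nl_langinfo(CODESET), and it assumes the codeset name is spelled exactly
"UTF-8", which not every system guarantees. The function names
codeset_is_utf8 and locale_is_utf8 are hypothetical.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the codeset name is exactly "UTF-8",
   0 otherwise (including for a NULL pointer). Exact matching is an
   assumption; some systems use other spellings of the name. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}

/* Hypothetical helper: asks the C library for the codeset of the current
   locale and checks it. A width function that has to be correct in all
   locales needs an answer to this question on every call, which is where
   the extra time goes. */
int locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A function that skips this check, as wcwidth-mk does, saves the lookup and
the string comparison on each call, but its results are only meaningful when
the locale encoding really is UTF-8.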
Bruno