c32width gives incorrect return values in C locale
From: Gavin Smith
Subject: c32width gives incorrect return values in C locale
Date: Sat, 11 Nov 2023 18:53:21 +0000
On Fri, Nov 10, 2023 at 07:39:43PM +0000, Gavin Smith wrote:
> Is the expected output
>
> å å (å) Å Å (Å) æ æ (æ) œ œ (œ) Æ Æ (Æ) Œ Œ (Œ) ø ø (ø) Ø Ø (Ø) ß ß (ß)
>
> (width 74) or
>
> @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø
> (Ø) @ss ß (ß)
>
> (width 90)?
>
> I guess you will need to look at the Unicode characters that you pass to
> c32width, and whether you get return values < 1 for some of them.
It is locale-dependent!
It looks like c32width (from the gnulib module of the same name) is simply
being redirected to wcwidth, which then returns -1 for non-ASCII characters
with LC_ALL=C.
I don't know if there is an easy way to make a self-contained example
to show the difference, because it needs all the gnulib Makefile machinery,
but the difference shows up for any non-ASCII character. If I add a line
like
fprintf (stderr, "width of [%4.0lx] is %d (remaining %s)\n",
         (unsigned long) wc, width, q);
in the right place in the code, where width is the result of c32width,
then the output looks like
width of [ 40] is 1 (remaining @)
width of [ 4f] is 1 (remaining OE )
width of [ 45] is 1 (remaining E )
width of [ 152] is -1 (remaining Œ)
width of [ 28] is 1 (remaining (Œ)
for LC_ALL=C, but
width of [ 40] is 1 (remaining @)
width of [ 4f] is 1 (remaining OE )
width of [ 45] is 1 (remaining E )
width of [ 152] is 1 (remaining Œ)
width of [ 28] is 1 (remaining (Œ)
otherwise (LC_ALL=en_GB.UTF-8).
Should this be reported as a bug to bug-gnulib or bug-libunistring?
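The underlying wcwidth behaviour can probably be probed without the gnulib machinery. The following is only a minimal sketch, not the gnulib code: width_in_locale is a hypothetical helper name, and it assumes that c32width really does fall through to the C library's wcwidth as suggested above. What it returns for U+0152 depends on the C library and on which locales are installed, so no particular output is claimed here.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* glibc only declares wcwidth when an X/Open feature-test macro is
   defined; declaring it here keeps the sketch independent of build
   flags.  */
int wcwidth (wchar_t wc);

/* Hypothetical helper (not part of gnulib or Texinfo): switch LC_CTYPE
   to LOCALE_NAME and return wcwidth (WC) as reported in that locale.
   If the locale cannot be selected, set *AVAILABLE to 0 and return -2,
   a value wcwidth itself never returns.  */
int
width_in_locale (const char *locale_name, wchar_t wc, int *available)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    {
      *available = 0;
      return -2;
    }
  *available = 1;
  return wcwidth (wc);
}
```

Calling width_in_locale ("C", 0x152, &ok) and then width_in_locale ("en_GB.UTF-8", 0x152, &ok) from a small main function should show whether the two locales disagree on a given system.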
For context, here are the contents of a simplified test file, "test.texi",
based on the input from the test:
@@aa @aa{} (å)
@@AA @AA{} (Å)
@@ae @ae{} (æ)
@@oe @oe{} (œ)
@@AE @AE{} (Æ)
@@OE @OE{} (Œ)
@@o @o{} (ø)
@@O @O{} (Ø)
@@ss @ss{} (ß)
@@l @l{} (ł)
@@L @L{} (Ł)
@@DH @DH{} (Ð)
@@TH @TH{} (Þ)
@@dh @dh{} (ð)
@@th @th{} (þ)
Then, in a UTF-8 locale:
$ ../tp/texi2any.pl test.texi && cat test.info
test.texi: warning: document without nodes
This is test.info, produced by texi2any version 7.1dev+dev from
test.texi.
@aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O
Ø (Ø) @ss ß (ß) @l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ
(þ)
Tag Table:
End Tag Table
Local Variables:
coding: utf-8
End:
However:
$ LC_ALL=C ../tp/texi2any.pl test.texi && cat test.info
test.texi: warning: document without nodes
This is test.info, produced by texi2any version 7.1dev+dev from
test.texi.
@aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø (Ø)
@ss ß (ß) @l
ł(ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ (þ)
Tag Table:
End Tag Table
Local Variables:
coding: utf-8
End:
In the latter case, the first line is much longer.
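One way a -1 return value could lead to an overlong line is sketched below. This is an illustration only, assuming the paragraph formatter simply adds each return value to a running column count; the message does not confirm that this is what the Texinfo code does, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical: add up per-character widths exactly as returned, so a
   -1 (unknown or non-printable) makes the running column count go
   down instead of up.  */
int
naive_columns (const int *widths, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += widths[i];
  return total;
}

/* Hypothetical alternative: count a negative width as one column.
   This fallback is an assumption, not something taken from gnulib or
   Texinfo.  */
int
clamped_columns (const int *widths, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += widths[i] < 0 ? 1 : widths[i];
  return total;
}
```

With the five widths from the LC_ALL=C debug output above (1, 1, 1, -1, 1), the naive sum is 3 columns while the clamped sum is 5. Under this assumption, every non-ASCII character makes the formatter think the line is two columns shorter than it really is.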