Re: [bug-gawk] Possible printf %c width multi-byte bug
From: Aharon Robbins
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Fri, 10 May 2013 11:30:32 +0300
User-agent: Heirloom mailx 12.5 6/20/10
Hi. I'm not sure why, but I received three copies of your note.
> From: Nethox <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] Possible printf %c width multi-byte bug
>
> I am not sure if the following is a bug or intended behaviour. But I
> find gawk's printf %c and %s inconsistent when width is specified and
> multi-byte encoding (UTF-8) is used.
>
>
> Test program:
> BEGIN { printf "%2c\n", "??" }
> ....
The short answer is "this whole business is a mess".
I did not find the POSIX standard to be super clear on this point. OTOH,
it would probably not hurt to spend some time digging and language
lawyering with the standard to try to figure things out a little more.
Things are complicated because all input and output use multibyte
encodings whereas wide characters are simply large numerical values.
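
To make the byte-versus-character distinction concrete, here is a minimal
sketch of how a field width of 2 gives different output depending on which
unit is counted. It is written in Python rather than awk so that the result
does not depend on the gawk version or locale, and it does not claim to show
what gawk does. The helper names pad_by_chars and pad_by_bytes are
hypothetical, and the sample character is an assumption, since the one in
the original test program did not survive.

```python
# Illustration only: Python stands in for awk here, and pad_by_chars /
# pad_by_bytes are hypothetical helper names, not gawk functions.

def pad_by_chars(text: str, width: int) -> bytes:
    """Right-justify so that the result holds `width` characters, then encode."""
    return text.rjust(width).encode("utf-8")


def pad_by_bytes(text: str, width: int) -> bytes:
    """Encode first, then right-justify so that the result holds `width` bytes."""
    return text.encode("utf-8").rjust(width)


ch = "\u00e9"  # e with acute accent: one character, two bytes in UTF-8

print(len(ch), len(ch.encode("utf-8")))  # 1 2
print(pad_by_chars(ch, 2))  # b' \xc3\xa9'  (one space of padding added)
print(pad_by_bytes(ch, 2))  # b'\xc3\xa9'   (already two bytes, no padding)
```

Counting characters adds a space and counting bytes does not, so two
conversions that count different units will look inconsistent in the way the
report describes.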
In any case, the upcoming 4.1 release is in code freeze. After it's
released I will try to spend some time reading the standard and also
stepping through the various cases with a debugger.
I suspect that no matter what I do, it will be wrong for some corner case.
Thanks,
Arnold