Re: [bug-gawk] Small printf issue


From: arnold
Subject: Re: [bug-gawk] Small printf issue
Date: Fri, 16 Jun 2017 00:46:14 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

I see this under Linux. That's the good news. Andy made some changes
in this area a while back, which might be the reason; the 3-byte case
doesn't happen in gawk-4.1-stable.

I'll try to look at it. Andy, if you can look as well, it'd help.

Thanks,

Arnold

Hermann Peifer <address@hidden> wrote:

> Hi,
>
>
> I noticed a small printf issue in a case where $1 was a Unicode bullet
> character (U+2022), UTF-8-encoded as a 3-byte sequence: 0xE2 0x80 0xA2.
> The first character of $2 somehow gets repeated. A small example is
> below. I'm using gawk/master on macOS, and my locale is en_US.UTF-8.
> The issue disappears when using: LC_ALL=C gawk '...'
>
>
> Hermann
>
>
> # Umlauts (2 bytes only) seem to be OK
>
> $ printf "\xC3\x96 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> Ö    ABC
>
>
> # 1 repeated character where $1 is a 3-byte bullet character (U+2022)
>
> $ printf "\xE2\x80\xA2 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> • A  ABC
>
>
> # 2 repeated characters where $1 is a 4-byte air symbol (U+1F701)
>
> $ printf "\xF0\x9F\x9C\x81 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> 🜁 AB ABC
>
>
>
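
For reference, the byte and character counts of the three $1 values in
the report can be checked without gawk. The sketch below is not from the
thread and does not diagnose the bug; the helper names byte_len and
char_len are made up for illustration, and the inputs are written with
octal escapes so the script does not rely on \x support in printf. It
assumes the input is valid UTF-8.

```shell
#!/bin/sh
# Count the bytes in a string, independent of the current locale.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Count UTF-8 characters by deleting continuation bytes (0x80-0xBF)
# and counting the bytes that remain. Assumes valid UTF-8 input.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# The three first fields from the report: U+00D6, U+2022, U+1F701.
for s in "$(printf '\303\226')" \
         "$(printf '\342\200\242')" \
         "$(printf '\360\237\234\201')"
do
    echo "bytes=$(byte_len "$s") chars=$(char_len "$s")"
done
```

Each field is a single character, so in a UTF-8 locale "%-5s" should
leave four spaces of padding before "ABC" in all three cases, as it
does in the 2-byte example.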
