bug-gawk

[bug-gawk] Small printf issue


From: Hermann Peifer
Subject: [bug-gawk] Small printf issue
Date: Thu, 15 Jun 2017 20:27:06 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

Hi,


I noticed a small printf issue in a case where $1 is a Unicode bullet
character (U+2022), UTF-8-encoded as a 3-byte sequence: 0xE2 0x80 0xA2.
With a left-justified field width ("%-5s"), the first character of $2
shows up inside the padding that follows $1. A small example is below.
I'm using gawk/master on macOS, and my locale is en_US.UTF-8. The issue
disappears when using: LC_ALL=C gawk '...'


Hermann


# Umlauts (2 bytes only) seem to be OK

$ printf "\xC3\x96 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
Ö    ABC


# 1 repeated character where $1 is a 3-byte bullet character (U+2022)

$ printf "\xE2\x80\xA2 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
• A  ABC


# 2 repeated characters where $1 is a 4-byte air symbol (U+1F701)

$ printf "\xF0\x9F\x9C\x81 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
🜁 AB ABC
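
For reference, the byte lengths of the three $1 values above can be checked
independently of gawk. The sketch below is not part of the original report:
byte_len is a hypothetical helper, and the strings are written with POSIX
octal escapes (instead of the \x escapes used above) so that it should run in
any POSIX shell. In the examples, the number of visibly repeated characters
grows with the byte length of $1, which may point to bytes and characters
being mixed up somewhere in the field-width handling, but that is only a guess.

```shell
# Hypothetical helper (an assumption for illustration, not from the report):
# print the number of bytes in a string given with POSIX octal escapes.
# The arithmetic expansion strips the leading blanks some wc versions emit.
byte_len() {
    echo "$(( $(printf "$1" | wc -c) ))"
}

byte_len '\303\226'          # U+00D6  (0xC3 0x96)
byte_len '\342\200\242'      # U+2022  (0xE2 0x80 0xA2)
byte_len '\360\237\234\201'  # U+1F701 (0xF0 0x9F 0x9C 0x81)
```

This should print 2, 3 and 4, one per line, matching the 2-, 3- and 4-byte
cases in the examples.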




