[bug-gawk] Small printf issue
From: Hermann Peifer
Subject: [bug-gawk] Small printf issue
Date: Thu, 15 Jun 2017 20:27:06 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
Hi,
I noticed a small printf issue in a case where $1 was a Unicode bullet
character (U+2022), UTF-8-encoded as a 3-byte sequence: 0xE2 0x80 0xA2.
The observation was that the first character of $2 gets repeated,
somehow. Below is a small example. I'm using gawk/master on macOS; my
locale is en_US.UTF-8. The issue disappears when using: LC_ALL=C gawk '...'
Hermann
# Umlauts (2 bytes only) seem to be OK
$ printf "\xC3\x96 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
Ö ABC
# 1 repeated character where $1 is a 3-byte bullet character (U+2022)
$ printf "\xE2\x80\xA2 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
• A ABC
# 2 repeated characters where $1 is a 4-byte air symbol (U+1F701)
$ printf "\xF0\x9F\x9C\x81 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
🜁 AB ABC
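
The byte lengths quoted in the comments above can be checked independently of gawk. The following is only a sketch: `byte_len` is a hypothetical helper name, and it assumes a POSIX shell whose printf accepts octal escapes (the same bytes as the \x sequences above, written in octal for portability). It confirms the encoded lengths and does not diagnose the printf behaviour itself.

```shell
# byte_len is a hypothetical helper: print the number of bytes produced
# by a printf format string made of octal escapes.
byte_len() {
    # tr strips the leading spaces that some wc implementations emit
    printf "$1" | wc -c | tr -d ' '
}

byte_len '\303\226'          # U+00D6 (umlaut), same bytes as \xC3\x96
byte_len '\342\200\242'      # U+2022 (bullet), same bytes as \xE2\x80\xA2
byte_len '\360\237\234\201'  # U+1F701 (air symbol), same bytes as \xF0\x9F\x9C\x81
```

The three calls print 2, 3 and 4. In the examples above, the number of repeated characters (0, 1, 2) grows in step with these byte lengths.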