bug-gawk

Re: [bug-gawk] Small printf issue


From: arnold
Subject: Re: [bug-gawk] Small printf issue
Date: Fri, 16 Jun 2017 03:57:37 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

Thanks for the report and test case.  The patch is below.  I will
push it shortly, once I've added the test case to the test suite.

Thanks,

Arnold

Hermann Peifer <address@hidden> wrote:

> Hi,
>
>
> I noticed a small printf issue in a case where $1 was a Unicode bullet
> character (U+2022), UTF-8-encoded as the 3-byte sequence 0xE2 0x80 0xA2.
> The first character of $2 gets repeated in the output for some reason.
> Below is a small example. I'm using gawk/master on macOS, and my locale
> is en_US.UTF-8. The issue disappears when using: LC_ALL=C gawk '...'
>
>
> Hermann
>
>
> # Umlauts (2 bytes only) seem to be OK
>
> $ printf "\xC3\x96 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> Ö    ABC
>
>
> # 1 repeated character where $1 is a 3-byte bullet character (U+2022)
>
> $ printf "\xE2\x80\xA2 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> • A  ABC
>
>
> # 2 repeated characters where $1 is a 4-byte air symbol (U+1F701)
>
> $ printf "\xF0\x9F\x9C\x81 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> 🜁 AB ABC
>
>
>
-----------------------------------------------
diff --git a/builtin.c b/builtin.c
index 87d9dcb..724be05 100644
--- a/builtin.c
+++ b/builtin.c
@@ -4152,12 +4152,13 @@ mbc_char_count(const char *ptr, size_t numbytes)
        if (mb_len <= 0)
                return numbytes;        /* no valid m.b. char */
 
-       for (; numbytes > 0; numbytes--) {
+       while (numbytes > 0) {
                mb_len = mbrlen(ptr, numbytes, &cur_state);
                if (mb_len <= 0)
                        break;
                sum++;
                ptr += mb_len;
+               numbytes -= mb_len;
        }
 
        return sum;


