Re: [bug-gawk] Small printf issue


From: arnold
Subject: Re: [bug-gawk] Small printf issue
Date: Fri, 16 Jun 2017 01:31:06 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi.

So, it wasn't what I thought, but in any case, git bisect says this is
the first bad commit:

| commit c86137f472fdf876c2c223c8d99f673f477b7554
| Author: Andrew J. Schorr <address@hidden>
| Date:   Fri Jul 8 15:26:00 2016 -0400
|
|     Optimization: support unterminated field strings inside gawk, but make
|     terminated copies for the API.

Andy, please take a look.

I will also take a poke at it with a debugger. This will probably take
a few days before I can really work on it.

Thanks,

Arnold


address@hidden wrote:

> Hi.
>
> I see this under Linux. That's the good news. Andy made some changes
> in this area a while back, which might be the reason; the 3 byte case
> doesn't happen in gawk-4.1-stable.
>
> I'll try to look at it. Andy, if you can take a look as well, it would help.
>
> Thanks,
>
> Arnold
>
> Hermann Peifer <address@hidden> wrote:
>
> > Hi,
> >
> >
> > I noted a small printf issue in a case where $1 was a Unicode bullet
> > character (U+2022), UTF8-encoded as a 3-byte sequence: 0xE2 0x80 0xA2.
> > The observation was that the first character of $2 gets repeated,
> > somehow. Below a small example. I'm using gawk/master on macOS, my
> > locale is en_US.UTF-8. The issue disappears when using: LC_ALL=C gawk '...'
> >
> >
> > Hermann
> >
> >
> > # Umlauts (2 bytes only) seem to be OK
> >
> > $ printf "\xC3\x96 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> > Ö    ABC
> >
> >
> > # 1 repeated character where $1 is a 3-byte bullet character (U+2022)
> >
> > $ printf "\xE2\x80\xA2 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> > • A  ABC
> >
> >
> > # 2 repeated characters where $1 is a 4-byte air symbol (U+1F701)
> >
> > $ printf "\xF0\x9F\x9C\x81 ABC\n" | gawk '{printf "%-5s%s\n", $1, $2}'
> > 🜁 AB ABC
> >
> >
> >


