Re: [bug-gawk] Possible printf %c width multi-byte bug
From: Aharon Robbins
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Fri, 28 Jun 2013 13:26:08 +0300
User-agent: Heirloom mailx 12.5 6/20/10
Hi. I finally popped my stack back far enough to look at this.
> Date: Fri, 10 May 2013 04:06:37 +0200
> From: Nethox <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] Possible printf %c width multi-byte bug
>
> I am not sure if the following is a bug or intended behaviour. But I
> find gawk's printf %c and %s inconsistent when width is specified and
> multi-byte encoding (UTF-8) is used.
>
> Test program:
> BEGIN { printf "%2c\n", "??" }
> Versions:
> GNU Awk 4.0.75, API: 0.0
> GNU Awk 4.0.1
> mawk 1.3.3 Nov 1996
>
> Command line |Output
> --------------------------------------------------------+------
> LC_ALL=C.UTF-8 gawk 'BEGIN { printf "%2c\n", "??" }' |?? <-- ???
>
> ....
>
> The problem I see is in the first command with %c, where I expected:
> LC_ALL=C.UTF-8 gawk 'BEGIN { printf "%2c\n", "??" }' | ??
>
> Which would be consistent with the padding behaviour of gawk's printf
> %2s, length() and other functions which count chars and not bytes when
> the user locale is UTF-8, and also with the man page:
I believe you are correct. I think that the patch below fixes things.
Please try it out and see if it breaks anything. I don't think it does.
Thanks,
Arnold
---------------------------------------------
diff --git a/builtin.c b/builtin.c
index ba1d8dc..5e98c81 100644
--- a/builtin.c
+++ b/builtin.c
@@ -1113,6 +1113,9 @@ out0:
|| count == (size_t)-2)
goto out2;
prec = count;
+ /* may need to increase fw so that padding happens, see pr_tail code */
+ if (fw > 0)
+ fw += count - 1;
goto pr_tail;
}
out2:
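
The reasoning behind the patch can be sketched in standalone C. A width
that is applied in bytes pads too little when the character occupies
several bytes, so the field width is widened by the character's extra
bytes. This is not gawk code: utf8_char_len and format_char are
hypothetical names, the sketch assumes UTF-8 input, and it judges the
character length from the lead byte alone instead of consulting the
locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts at
   s[0], judged from the lead byte only (no validation of trailing bytes). */
static size_t utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char) s[0];

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;   /* invalid lead byte: treat it as a single byte */
}

/* Hypothetical stand-in for "%<fw>c": write the first character of s
   into out, right-justified in fw columns. The width passed to snprintf
   is applied in bytes, so it is widened by (count - 1), mirroring the
   adjustment in the patch. Returns what snprintf returns. */
static int format_char(char *out, size_t outsz, int fw, const char *s)
{
    assert(out != NULL && s != NULL);

    size_t count = utf8_char_len(s);
    int prec = (int) count;        /* emit exactly one character's bytes */

    if (fw > 0)
        fw += (int) count - 1;     /* compensate for the extra bytes */

    return snprintf(out, outsz, "%*.*s", fw, prec, s);
}
```

With these definitions, format_char(buf, sizeof buf, 2, s) on a two-byte
character writes one space followed by the character (three bytes in
all), whereas an unadjusted byte width of 2 would leave no room for
padding.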