[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [bug-gawk] Possible printf %c width multi-byte bug
From: |
Aharon Robbins |
Subject: |
Re: [bug-gawk] Possible printf %c width multi-byte bug |
Date: |
Tue, 02 Jul 2013 22:30:50 +0300 |
User-agent: |
Heirloom mailx 12.5 6/20/10 |
Hi.
> Date: Sun, 30 Jun 2013 17:32:57 +0200
> From: Nethox <address@hidden>
> To: Aharon Robbins <address@hidden>
> CC: address@hidden
> Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
>
> Aharon Robbins, 2013-06-28 12:26:
> > I believe you are correct. I think that the patch below fixes things.
> >
> > Please try it out and see if it breaks anything. I don't think it does.
> >
> > Thanks,
> >
> > Arnold
> > ---------------------------------------------
> > ..............
>
> The printf %c padding seems fixed, and I have not detected regressions.
> All tests from "cd test; make check" pass (MPFR not supported on this
> system).
> Patch applied to GAWK 4.1.0, API 1.0, from gawk-4.1.0.tar.gz .
>
> However, I have found another related bug of printf and sprintf %c which
> also affects the unpatched version. Triggering conditions:
> - The argument string has several characters.
> - The first one is multi-byte (it does not matter whether the following
> ones are ASCII or also multi-byte).
> - Width is passed (even width=1, which should not change the output at all).
>
> What happens is that 2 chars from the input string are cut, instead of
> just the 1st one. The potential padding is affected by the patch, so
> here is the equivalent behaviour table:
> | Pre-patch | Post-patch
> ---------+-------------------+-------------------
> Cutting | substr(s, 1, 2) | substr(s, 1, 2)
> Padding | substr(s, 1, 2) | substr(s, 1, 1)
>
> The printf %s behaviour remains as the correct reference for all cases.
> A more complete unit test is attached, following the test/README
> conventions: AWK script, input, correct ouput, actual 4.1.0 output, and
> actual 4.1.0-patch output. Alter it as needed, maybe it is too
> human-readable.
Thanks for the report and the test files. With the patch below, it passes.
I assume it covers the original report as well?
> I tried changing ENVIRON["LC_ALL"] from the script itself, but the
> printf function was unaffected, so the script must be manually called by
> prepending LC_ALL="C.UTF-8", or I guess with a new specific make target.
That is correct. Changing ENVIRON within the awk program does not affect
the locale stuff; it's set at startup.
Here is the new patch, relative to the gawk-4.1-stable branch in git.
It includes a modified version of the first patch.
Once you approve, I'll push this.
Thanks,
Arnold
-------------------------------------------------------
diff --git a/builtin.c b/builtin.c
index ba1d8dc..b8e24cb 100644
--- a/builtin.c
+++ b/builtin.c
@@ -1097,6 +1097,7 @@ out0:
* used to work? 6/2003.)
*/
cp = arg->stptr;
+ prec = 1;
#if MBS_SUPPORT
/*
* First character can be multiple bytes if
@@ -1108,17 +1109,14 @@ out0:
memset(& state, 0, sizeof(state));
count = mbrlen(cp, arg->stlen, & state);
- if (count == 0
- || count == (size_t)-1
- || count == (size_t)-2)
- goto out2;
- prec = count;
- goto pr_tail;
+ if (count > 0) {
+ prec = count;
+ /* may need to increase fw so that
padding happens, see pr_tail code */
+ if (fw > 0)
+ fw += count - 1;
+ }
}
-out2:
- ;
#endif
- prec = 1;
goto pr_tail;
case 's':
need_format = false;
@@ -1421,9 +1419,14 @@ mpf1:
copy_count = prec;
if (fw == 0 && ! have_prec)
;
- else if (gawk_mb_cur_max > 1 && (cs1 == 's' || cs1 ==
'c')) {
- assert(cp == arg->stptr || cp == cpbuf);
- copy_count = mbc_byte_count(arg->stptr, prec);
+ else if (gawk_mb_cur_max > 1) {
+ if (cs1 == 's') {
+ assert(cp == arg->stptr || cp == cpbuf);
+ copy_count = mbc_byte_count(arg->stptr,
prec);
+ }
+ /* prec was set by code for %c */
+ /* else
+ copy_count = prec; */
}
bchunk(cp, copy_count);
while (fw > prec) {
- Re: [bug-gawk] Possible printf %c width multi-byte bug,
Aharon Robbins <=