bug-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-gawk] Possible printf %c width multi-byte bug


From: Aharon Robbins
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Tue, 02 Jul 2013 22:30:50 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

> Date: Sun, 30 Jun 2013 17:32:57 +0200
> From: Nethox <address@hidden>
> To: Aharon Robbins <address@hidden>
> CC: address@hidden
> Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
>
> Aharon Robbins,  2013-06-28 12:26:
> > I believe you are correct. I think that the patch below fixes things.
> > 
> > Please try it out and see if it breaks anything. I don't think it does.
> > 
> > Thanks,
> > 
> > Arnold
> > ---------------------------------------------
> > ..............
>
> The printf %c padding seems fixed, and I have not detected regressions.
> All tests from "cd test; make check" pass (MPFR not supported on this
> system).
> Patch applied to GAWK 4.1.0, API 1.0, from gawk-4.1.0.tar.gz .
>
> However, I have found another related bug of printf and sprintf %c which
> also affects the unpatched version. Triggering conditions:
> - The argument string has several characters.
> - The first one is multi-byte (it does not matter whether the following
> ones are ASCII or also multi-byte).
> - Width is passed (even width=1, which should not change the output at all).
>
> What happens is that 2 chars from the input string are cut, instead of
> just the 1st one. The potential padding is affected by the patch, so
> here is the equivalent behaviour table:
>          |  Pre-patch        |  Post-patch
> ---------+-------------------+-------------------
> Cutting  |  substr(s, 1, 2)  |  substr(s, 1, 2)
> Padding  |  substr(s, 1, 2)  |  substr(s, 1, 1)
>
> The printf %s behaviour remains as the correct reference for all cases.
> A more complete unit test is attached, following the test/README
> conventions: AWK script, input, correct ouput, actual 4.1.0 output, and
> actual 4.1.0-patch output. Alter it as needed, maybe it is too
> human-readable.

Thanks for the report and the test files. With the patch below, it passes.
I assume it covers the original report as well?

> I tried changing ENVIRON["LC_ALL"] from the script itself, but the
> printf function was unaffected, so the script must be manually called by
> prepending LC_ALL="C.UTF-8", or I guess with a new specific make target.

That is correct. Changing ENVIRON within the awk program does not affect
the locale stuff; it's set at startup.

Here is the new patch, relative to the gawk-4.1-stable branch in git.
It includes a modified version of the first patch.

Once you approve, I'll push this.

Thanks,

Arnold
-------------------------------------------------------
diff --git a/builtin.c b/builtin.c
index ba1d8dc..b8e24cb 100644
--- a/builtin.c
+++ b/builtin.c
@@ -1097,6 +1097,7 @@ out0:
                         * used to work? 6/2003.)
                         */
                        cp = arg->stptr;
+                       prec = 1;
 #if MBS_SUPPORT
                        /*
                         * First character can be multiple bytes if
@@ -1108,17 +1109,14 @@ out0:
 
                                memset(& state, 0, sizeof(state));
                                count = mbrlen(cp, arg->stlen, & state);
-                               if (count == 0
-                                   || count == (size_t)-1
-                                   || count == (size_t)-2)
-                                       goto out2;
-                               prec = count;
-                               goto pr_tail;
+                               if (count > 0) {
+                                       prec = count;
+                                       /* may need to increase fw so that 
padding happens, see pr_tail code */
+                                       if (fw > 0)
+                                               fw += count - 1;
+                               }
                        }
-out2:
-                       ;
 #endif
-                       prec = 1;
                        goto pr_tail;
                case 's':
                        need_format = false;
@@ -1421,9 +1419,14 @@ mpf1:
                        copy_count = prec;
                        if (fw == 0 && ! have_prec)
                                ;
-                       else if (gawk_mb_cur_max > 1 && (cs1 == 's' || cs1 == 
'c')) {
-                               assert(cp == arg->stptr || cp == cpbuf);
-                               copy_count = mbc_byte_count(arg->stptr, prec);
+                       else if (gawk_mb_cur_max > 1) {
+                               if (cs1 == 's') {
+                                       assert(cp == arg->stptr || cp == cpbuf);
+                                       copy_count = mbc_byte_count(arg->stptr, 
prec);
+                               }
+                               /* prec was set by code for %c */
+                               /* else
+                                       copy_count = prec; */
                        }
                        bchunk(cp, copy_count);
                        while (fw > prec) {



reply via email to

[Prev in Thread] Current Thread [Next in Thread]