
Re: [bug-gawk] Possible printf %c width multi-byte bug


From: Nethox
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Sun, 30 Jun 2013 17:32:57 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130519 Icedove/17.0.5

Aharon Robbins,  2013-06-28 12:26:
> I believe you are correct. I think that the patch below fixes things.
> 
> Please try it out and see if it breaks anything. I don't think it does.
> 
> Thanks,
> 
> Arnold
> ---------------------------------------------
> diff --git a/builtin.c b/builtin.c
> index ba1d8dc..5e98c81 100644
> --- a/builtin.c
> +++ b/builtin.c
> @@ -1113,6 +1113,9 @@ out0:
>                                   || count == (size_t)-2)
>                                       goto out2;
>                               prec = count;
> +                             /* may need to increase fw so that padding happens, see pr_tail code */
> +                             if (fw > 0)
> +                                     fw += count - 1;
>                               goto pr_tail;
>                       }
>  out2:
> 

The printf %c padding seems fixed, and I have not detected regressions.
All tests from "cd test; make check" pass (MPFR not supported on this
system).
The patch was applied to GAWK 4.1.0 (API 1.0), built from gawk-4.1.0.tar.gz.
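
To spell out the arithmetic of the patch: as far as I can tell from the patch and its comment, the %c branch stores the byte length of the character in prec, and the tail code emits fill characters while fw is greater than prec. The following is only a minimal sketch of that arithmetic. adjusted_fw() and fill_count() are hypothetical helper names for illustration, not functions from builtin.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patched lines:
 *     if (fw > 0)
 *             fw += count - 1;
 * `count` is the byte length of the first (multi-byte) character. */
static long adjusted_fw(long fw, size_t count)
{
    assert(count >= 1);
    if (fw > 0)
        fw += (long)count - 1;
    return fw;
}

/* Hypothetical helper: number of fill characters emitted when padding
 * is computed as "fw minus prec" (assumed behaviour of the tail code). */
static long fill_count(long fw, size_t prec)
{
    return fw > (long)prec ? fw - (long)prec : 0;
}
```

For "ú" (2 bytes in UTF-8) and %3c, fill_count(3, 2) gives 1 fill character without the patch, while fill_count(adjusted_fw(3, 2), 2) gives the expected 2.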

However, I have found another related bug in printf and sprintf %c, which
also affects the unpatched version. Triggering conditions:
- The argument string has several characters.
- The first one is multi-byte (it does not matter whether the following
ones are ASCII or also multi-byte).
- Width is passed (even width=1, which should not change the output at all).

What happens is that the first 2 characters of the input string are output,
instead of just the 1st one. The patch changes how the padding is computed,
so here is a table of the equivalent behaviour before and after it:
         |  Pre-patch        |  Post-patch
---------+-------------------+-------------------
Cutting  |  substr(s, 1, 2)  |  substr(s, 1, 2)
Padding  |  substr(s, 1, 2)  |  substr(s, 1, 1)

Example 1:
printf "|%3c|\n", "último"
|  úl|
instead of the correct |  ú|

Example 2:
printf "|%1c|\n", "último"
|úl|
instead of the correct |ú|
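
For reference, here is a minimal sketch in C of the behaviour I would expect from %c with a width: only the first character is output, right-justified, with one position per character. It assumes valid UTF-8 input and decodes it by hand so that it does not depend on the locale. utf8_len(), bytes_for_chars() and format_c() are hypothetical names for illustration and have nothing to do with the gawk sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence that starts with lead byte c.
 * Assumes valid UTF-8; anything else is treated as a 1-byte character. */
static size_t utf8_len(unsigned char c)
{
    if (c >= 0xF0) return 4;
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;
}

/* Number of bytes covered by the first nchars characters of s. */
static size_t bytes_for_chars(const char *s, size_t nchars)
{
    size_t n = 0;

    assert(s != NULL);
    while (nchars-- > 0 && s[n] != '\0') {
        size_t step = utf8_len((unsigned char)s[n]);
        size_t left = strlen(s + n);
        n += step <= left ? step : left;   /* never run past the end */
    }
    return n;
}

/* Model of "%<width>c": first character of s, right-justified in
 * `width` positions. Writes a NUL-terminated result into out. */
static char *format_c(char *out, size_t outsize, size_t width, const char *s)
{
    size_t len = bytes_for_chars(s, 1);    /* bytes of the 1st character */
    size_t nchars = len > 0 ? 1 : 0;
    size_t pad = width > nchars ? width - nchars : 0;

    assert(out != NULL && pad + len < outsize);
    memset(out, ' ', pad);
    memcpy(out + pad, s, len);
    out[pad + len] = '\0';
    return out;
}
```

With the input "último" (written as "\xc3\xba" "ltimo" to stay independent of the source encoding), format_c() with width 3 produces two spaces followed by "ú", and with width 1 just "ú". The observed output is what you would get if the byte length of the first character (2) were used as a character count: bytes_for_chars() with nchars=2 covers 3 bytes, that is "úl". That is only my guess at the mechanism.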

The printf %s behaviour remains as the correct reference for all cases.
A more complete unit test is attached, following the test/README
conventions: AWK script, input, correct output, actual 4.1.0 output, and
actual 4.1.0-patch output. Alter it as needed; it may be too
human-readable.
I tried changing ENVIRON["LC_ALL"] from the script itself, but the
printf function was unaffected, so the script must be manually called by
prepending LC_ALL="C.UTF-8", or I guess with a new specific make target.

Regards.

Attachment: _mbprintf4_4.1.0
Description: Text document

Attachment: _mbprintf4_4.1.0-patched
Description: Text document

Attachment: mbprintf4.awk
Description: application/awk

Attachment: mbprintf4.in
Description: Text document

Attachment: mbprintf4.ok
Description: Text document

