From: Nethox
Subject: Re: [bug-gawk] Possible printf %c width multi-byte bug
Date: Sun, 30 Jun 2013 17:32:57 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130519 Icedove/17.0.5
Aharon Robbins, 2013-06-28 12:26:
> I believe you are correct. I think that the patch below fixes things.
>
> Please try it out and see if it breaks anything. I don't think it does.
>
> Thanks,
>
> Arnold
> ---------------------------------------------
> diff --git a/builtin.c b/builtin.c
> index ba1d8dc..5e98c81 100644
> --- a/builtin.c
> +++ b/builtin.c
> @@ -1113,6 +1113,9 @@ out0:
>                      || count == (size_t)-2)
>                          goto out2;
>                      prec = count;
> +                    /* may need to increase fw so that padding happens, see pr_tail code */
> +                    if (fw > 0)
> +                        fw += count - 1;
>                      goto pr_tail;
>                  }
>  out2:
>

The printf %c padding seems fixed, and I have not detected regressions. All tests from "cd test; make check" pass (MPFR is not supported on this system). The patch was applied to GAWK 4.1.0, API 1.0, from gawk-4.1.0.tar.gz.

However, I have found another related bug in printf and sprintf %c, which also affects the unpatched version.

Triggering conditions:
- The argument string has several characters.
- The first one is multi-byte (it does not matter whether the following ones are ASCII or also multi-byte).
- A width is passed (even width=1, which should not change the output at all).

What happens is that 2 characters are cut from the input string, instead of just the 1st one. The padding is affected by the patch, so here is the equivalent behaviour table:

         | Pre-patch         | Post-patch
---------+-------------------+-------------------
Cutting  | substr(s, 1, 2)   | substr(s, 1, 2)
Padding  | substr(s, 1, 2)   | substr(s, 1, 1)

Example 1:
printf "|%3c|\n", "último"
| úl|
instead of the correct
|  ú|

Example 2:
printf "|%1c|\n", "último"
|úl|
instead of the correct
|ú|

The printf %s behaviour remains the correct reference for all cases.

A more complete unit test is attached, following the test/README conventions: AWK script, input, correct output, actual 4.1.0 output, and actual 4.1.0-patch output. Alter it as needed; maybe it is too human-readable.
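For reference, here is a minimal standalone sketch of the behaviour I would expect from %c with a string argument: exactly one character is taken, and the width is counted in characters, not bytes. This is not gawk's code. The names utf8_first_len and format_c are hypothetical, and the sketch assumes UTF-8 input that it decodes by hand instead of going through the locale functions used in builtin.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from gawk): byte length of the UTF-8
 * character that starts at s. Returns 0 for an empty string and
 * 1 for an invalid or truncated sequence. */
static size_t utf8_first_len(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    size_t len, i;

    if (c == '\0')
        return 0;
    if (c < 0x80)
        len = 1;
    else if ((c & 0xE0) == 0xC0)
        len = 2;
    else if ((c & 0xF0) == 0xE0)
        len = 3;
    else if ((c & 0xF8) == 0xF0)
        len = 4;
    else
        return 1;   /* stray continuation byte or invalid lead byte */

    /* Every following byte must look like 10xxxxxx; this also stops
     * at the terminating NUL, so we never read past the string. */
    for (i = 1; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return 1;
    return len;
}

/* Hypothetical model of "%<fw>c" applied to a string: copy only the
 * first character of s into out, right-justified in a field of fw
 * characters. Returns 0 on success, -1 if out is too small. */
static int format_c(char *out, size_t outsz, int fw, const char *s)
{
    size_t nbytes = utf8_first_len(s);    /* bytes to copy */
    size_t nchars = nbytes > 0 ? 1 : 0;   /* characters they display as */
    size_t pad = (fw > 0 && (size_t)fw > nchars) ? (size_t)fw - nchars : 0;

    if (pad + nbytes + 1 > outsz)
        return -1;
    memset(out, ' ', pad);                /* padding counted in characters */
    memcpy(out + pad, s, nbytes);         /* exactly one character */
    out[pad + nbytes] = '\0';
    return 0;
}
```

Under these assumptions, format_c(buf, sizeof buf, 3, "último") yields two spaces followed by "ú", and a width of 1 yields just "ú", which matches the correct outputs in Examples 1 and 2.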
I tried changing ENVIRON["LC_ALL"] from within the script itself, but the printf function was unaffected, so the script must be called manually with LC_ALL="C.UTF-8" prepended, or, I guess, through a new specific make target.

Regards.
_mbprintf4_4.1.0
Description: Text document
_mbprintf4_4.1.0-patched
Description: Text document
mbprintf4.awk
Description: application/awk
mbprintf4.in
Description: Text document
mbprintf4.ok
Description: Text document