[bug-gawk] GAWK for Windows does not work properly with UTF-8


From: Marc de Bourget
Subject: [bug-gawk] GAWK for Windows does not work properly with UTF-8
Date: Thu, 11 Feb 2016 12:46:21 +0100

I use this version:
http://sourceforge.net/projects/ezwinports/files/gawk-4.1.3-w32-bin.zip/download

Problem: This GAWK for Windows version counts bytes instead of characters.
"Céline" has 6 characters but 7 bytes, because "é" is a multibyte character in UTF-8.

The length function should return 6 for the string "Céline", but it returns 7.
With UTF-8 text, gawk for Windows gives wrong results for at least length, substr, index, match, split("Céline", CHARS, ""), printf and sprintf.

Setting the environment variable LC_ALL in a DOS batch file does not help:
celine.bat:
SET LC_ALL=en_US.UTF-8
gawk -f celine.awk
Content of celine.awk:
BEGIN {
 test = "Céline"
 print length(test)
 print substr(test,2,1)
 print "|" sprintf("%-12.12s", test) "|"
}