
[bug-gawk] characters-as-bytes switch


From: SP
Subject: [bug-gawk] characters-as-bytes switch
Date: Sun, 17 Jun 2012 00:55:16 +0200

Hello,

Sorry for my approximate English; I'm French ;-)

I've just installed the latest Cygwin binaries under Windows 7 in order to
have a gawk with the "--characters-as-bytes" switch. Unfortunately, this
switch doesn't seem to work correctly within regexp patterns. Here is a full
log demonstrating the problem. Note that \xE2\x80\x93 is a valid UTF-8
character, whereas \xE2\x80\x42 is not, and note the period in the gensub
pattern.

==========

C:\>ver
Microsoft Windows [Version 6.1.7601]

C:\>gawk.exe --version
GNU Awk 4.0.1
...
blah blah

C:\>gawk.exe 'BEGIN { print "\xE2\x80\x93"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1

0000000 342 200 223  \n
         e2  80  93  0a
0000004

C:\>gawk.exe 'BEGIN { print "\xE2\x80\x42"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1

0000000   Z   Z   Z  \n
         5a  5a  5a  0a
0000004

==========

If I feed in a real UTF-8 character, /\xE2\x80./ doesn't match despite
--characters-as-bytes, and the three bytes pass through unchanged. If I feed
in an invalid UTF-8 sequence, /\xE2\x80./ does match.
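
For comparison, here is a minimal sketch of the byte-wise behaviour expected
from the switch, written for a POSIX shell rather than cmd.exe. It assumes
that forcing the C locale with LC_ALL=C makes awk treat each byte as one
character, so the period matches any single byte. The helper name
replace_bytes is hypothetical, the octal escapes \342\200\223 and
\342\200\102 are the same bytes as \xE2\x80\x93 and \xE2\x80\x42, and none of
this has been checked against the Cygwin build shown in the log.

```shell
# Hypothetical helper: replace \xE2\x80 followed by any single byte with ZZZ.
# Assumption: in the C locale, awk regexps operate on bytes, not characters.
replace_bytes() {
    LC_ALL=C awk '{ gsub(/\342\200./, "ZZZ"); print }'
}

# Valid UTF-8 sequence (E2 80 93) and invalid one (E2 80 42):
# with byte-wise matching, both lines should be rewritten the same way.
printf '\342\200\223\n' | replace_bytes
printf '\342\200\102\n' | replace_bytes
```

If both inputs are rewritten in the C locale but only the invalid sequence is
rewritten with --characters-as-bytes, that would suggest the switch is not
reaching the regexp matcher.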

Thanks in advance for your help with a workaround and/or a fix for this
problem!

Stéphane





