Re: [bug-gawk] gawk stops reading input at SUB character
From: Andrew J. Schorr
Subject: Re: [bug-gawk] gawk stops reading input at SUB character
Date: Tue, 12 Sep 2017 12:58:56 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
On Tue, Sep 12, 2017 at 05:58:51PM +0300, Paavo Tamminen wrote:
> I have successfully used gawk with mixed text-and-binary content.
>
> However, I ran into a problem: gawk stops reading the input file if there
> is a <SUB> character in the file. <SUB> is the control character
> 'substitute', 0x1A in hex.
>
> The input file (test.txt) has three lines, with <SUB> on line two:
> line 1 aA
> line 2 b<SUB>B
> line 3 cC
>
>
> At the Windows cmd prompt, the following shows output only up to the
> character b, so <SUB> seems to be treated as an end of file.
>
> gawk.exe "{print $0}" test.txt
> line 1 aA
> line 2 b
>
> gawk.exe --version
> GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.0-p8, GNU MP 5.0.2)
>
> My gawk (gawk-4.1.4-w32-bin.zip) was downloaded from
> https://sourceforge.net/projects/ezwinports/
>
I guess this is probably a Windows issue, since 0x1A (Ctrl-Z) traditionally
marks EOF in DOS text files, if I recall correctly from the dark days of my
youth. I tested on Linux and on Cygwin, and it works correctly on both:
bash-4.2$ gawk -l ordchr 'BEGIN {printf "aA\nb%sB\ncC\n", chr(0x1a)}' > test.txt
bash-4.2$ od -c -tx1 test.txt
0000000   a   A  \n   b 032   B  \n   c   C  \n
         61  41  0a  62  1a  42  0a  63  43  0a
0000012
bash-4.2$ gawk '{print}' test.txt | od -c -tx1
0000000   a   A  \n   b 032   B  \n   c   C  \n
         61  41  0a  62  1a  42  0a  63  43  0a
0000012
I have attached the input file test.txt. Can you please confirm that this
is the input that shows the problem?
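For anyone who wants to recreate or check the file without the ordchr extension, here is a minimal sketch using only standard POSIX tools (printf, wc, tr). It is not part of the test above. It assumes the same three-line content and the file name test.txt, and the variable names are just for illustration.

```shell
# Recreate the three-line test file; \032 (octal) is the SUB byte, 0x1A.
printf 'aA\nb\032B\ncC\n' > test.txt

# Total size in bytes, number of newlines, and number of SUB bytes in the file.
bytes=$(wc -c < test.txt | tr -d ' ')
lines=$(wc -l < test.txt | tr -d ' ')
subs=$(tr -cd '\032' < test.txt | wc -c | tr -d ' ')

echo "bytes=$bytes lines=$lines subs=$subs"
```

If the byte count matches the od offset above (0000012 octal, i.e. 10 bytes) and exactly one SUB byte is reported, the file on disk should be the same as the one I tested with.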
Regards,
Andy
Attachment: test.txt
Description: Text document