Re: [bug-gawk] escaped pipe char in FS mistreated
From: Davide Brini
Subject: Re: [bug-gawk] escaped pipe char in FS mistreated
Date: Wed, 13 Mar 2013 16:01:36 +0100

On Wed, 13 Mar 2013 06:14:25 -0700, Nat Brown <address@hidden> wrote:
> issue: if my data is instead pipe-separated, such as
>
> 12345 | | | | | 12 | Data Street| Command Deck | Enterprise| Space| 17094
>
> using FS="|" works to split fields around the pipe character, but
> including the pipe in a regexp FS results in silent failure by AWK, non
> sensible warning "warning: escape sequence `\|' treated as plain `|'" and
> failure by GAWK:
>
> BEGIN { FS="[ \t]*\|[ \t]*"; }
> {
> for (i=1; i <= NF; i++) {
> printf "%2d '%s'\n", i, $i;
> }
> }
>
> yields:
>
> 1 '12345'
> 2 '|'
> 3 '|'
> 4 '|'
> 5 '|'
> 6 '|'
> 7 '12'
> 8 '|'
> 9 'Data'
> 10 'Street|'
> 11 'Command'
> 12 'Deck'
> 13 '|'
> 14 'Enterprise|'
> 15 'Space|'
> 16 '17094'
>
> expected behavior would be to treat '\|' as the character '|', identically
> to ',' or other characters, rather than stripping the escape and
> incorporating it into the FS regexp.
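
The warning you get is sensible, and tells you exactly what gawk is doing.
Gawk processes escape sequences in string literals (that's why you can do
var = "abc\ndef\tfoo"), so after it does that, the string which ends up
in FS (or any variable you'd assign that string to, for that matter) is

    "[ \t]*|[ \t]*"

In that regexp the bare | is the alternation operator, not a literal pipe,
so FS only ever matches runs of blanks. That is consistent with your output
being split on whitespace, with the pipes showing up inside the fields.

So to do what you're trying to do you have to use

    FS="[ \t]*\\|[ \t]*"

After escape processing, this becomes

    "[ \t]*\|[ \t]*"

which, when used as a regexp for FS, means what you want.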
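
As a rough sketch of the fix, run from a POSIX shell against the sample
record from the report (joined onto one line here, which is an assumption
about the original data). The shell variable names are only for
illustration. The second command uses a bracket expression, [|], which is
an alternative not discussed above: it also matches a literal pipe and
avoids the backslashes altogether.

```shell
# Sample record from the report, assumed to be a single input line.
line='12345 | | | | | 12 | Data Street| Command Deck | Enterprise| Space| 17094'

# Double backslash: the string literal "\\|" becomes the regexp \| once
# awk has processed the escapes, so the pipe is matched literally.
escaped=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "[ \t]*\\|[ \t]*" } { print NF ": " $7 }')

# Bracket expression: [|] matches a literal pipe with no escaping needed.
bracket=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "[ \t]*[|][ \t]*" } { print NF ": " $7 }')

echo "$escaped"
echo "$bracket"
```

Both commands should print "11: Data Street": eleven fields, with the
blanks around each pipe absorbed into the separator.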

When you used FS="|", it worked because when FS is a single character
(other than a single space) it's special-cased and treated as a literal
character, not as a regular expression.

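
For comparison, a small sketch of the single-character case on the same
sample record (again assumed to be one input line; the variable names are
hypothetical). The pipe is taken literally, but the blanks around it are
not part of the separator, so they stay inside the fields.

```shell
# Sample record from the report, assumed to be a single input line.
line='12345 | | | | | 12 | Data Street| Command Deck | Enterprise| Space| 17094'

# Single-character FS: "|" is used as a literal separator, not as a regexp.
# The field is wrapped in brackets to make leading blanks visible.
single=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "|" } { print NF ": [" $7 "]" }')

echo "$single"
```

This should print "11: [ Data Street]": the same eleven fields, but with
the leading blank still attached to the field.
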
More information:
http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps
--
D.