Re: Specifying multiple separators via FS or the -F command line flag -
From: cga2000
Subject: Re: Specifying multiple separators via FS or the -F command line flag - addendum
Date: Tue, 04 Dec 2007 20:04:29 -0500
User-agent: Mutt/1.5.13 (2006-08-11)
On Mon, Dec 03, 2007 at 11:32:02PM EST, Bob Proulx wrote:
> cga2000 wrote:
> > Here's a sample of how the multiple separators feature behaves:
> >
> > [15:52:17][gavron@turki:~]$ echo " one: two:three :four five" | awk -F "[: ]" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6; print "7 "$7; print "8 "$8}'
>
> Thanks for the small example. (I just read your last posting and will
> probably respond to it but this one was much easier.)
>
> > 1
> > 2 one
> > 3
> > 4 two
> > 5 three
> > 6
> > 7 four
> > 8 five
> >
> > Doesn't seem very logical to me.
Maybe I meant "intuitive" .. except that this overloaded term has become
such a private joke where I'm concerned that I tend to instinctively
avoid it .. :-)
Intuition suggests that in the very common situation where you're parsing
text, and since the default FS is <space>, it would seem rather
"natural" to default to a behavior where two or three or even four
spaces, for example, only count as one separator.
??
> Each field separator is splitting a field. So for example -F_ on
> "___" would delimit four fields. But before we go down this path, I
> know what you want and we are going to do it differently to get there.
>
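As a quick check of the -F_ claim above, here is a minimal sketch, assuming any POSIX awk. The helper name count_fields is made up for this illustration; it just prints NF for a given single-character separator and input line.

```shell
# count_fields SEP LINE: print the number of fields awk sees in LINE
# when SEP (a single character other than space) is the field separator.
# (count_fields is a hypothetical helper, not part of awk.)
count_fields() {
    printf '%s\n' "$2" | awk -F"$1" '{ print NF }'
}

count_fields _ "___"    # three separators delimit four (empty) fields
count_fields _ "a_b"    # one separator delimits two fields
```

Every single occurrence of the separator starts a new field, even when the fields on either side are empty.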
> > When awk successfully tests for space or colon, the following characters
> > are assumed NOT to be separators even if they have been defined as such
> > via the -F flag -- eg. the <space> that follows "one:" is mapped to the
> > $3 variable.
> >
> > Is this the way it's supposed to work?
>
> The way it is supposed to work is defined here:
>
> http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
>
> Search for the section "Regular Expressions" where the FS ERE is
> discussed.
>
> An extended regular expression can be used to separate fields by using
> the -F ERE option or by assigning a string containing the expression
> to the built-in variable FS. The default value of the FS variable
> shall be a single <space>. The following describes FS behavior:
>
> 1. If FS is a null string, the behavior is unspecified.
> 2. If FS is a single character:
>    a. If FS is <space>, skip leading and trailing <blank>s;
>       fields shall be delimited by sets of one or more <blank>s.
>    b. Otherwise, if FS is any other character c, fields shall be
>       delimited by each single occurrence of c.
> 3. Otherwise, the string value of FS shall be considered to be an
>    extended regular expression. Each occurrence of a sequence
>    matching the extended regular expression shall delimit fields.
>
> As you can see the default splitting behavior on a single space is
> done as a one-off special. The space is different than any other
> field separator.
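To see rules 2a, 2b and 3 from the quoted list side by side, here is a small sketch, assuming a POSIX awk. The helper nf and the sample line are invented for this illustration; nf only reports how many fields awk finds for a given FS.

```shell
# nf FS LINE: print how many fields awk finds in LINE with the given FS.
# (nf is a helper made up for this sketch.)
nf() {
    printf '%s\n' "$2" | awk -v FS="$1" '{ print NF }'
}

# Two leading blanks, two blanks between a and b, a colon, two trailing blanks.
line='  a  b:c  '

nf ' '    "$line"   # rule 2a: blanks are trimmed and runs of blanks collapse
nf ':'    "$line"   # rule 2b: each single ":" delimits
nf '[: ]' "$line"   # rule 3: ERE, each single matching character delimits
```

On this sample the first two calls report 2 fields and the third reports 8, because under rule 3 every individual blank or colon delimits a field, including the leading and trailing ones.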
Quite "logical".
I am not a programmer and have very little time to dedicate to the *nix
playground. So when I have to, I grab the first online tutorial that
makes sense and try to make the language work for me.
Otherwise with maybe 6-8 hours a week devoted to computing in general, I
would get nowhere.
> What you probably want is option 3 above where the field separator is
> an extended regular expression. Try this:
>
> echo " one: two:three :four five" | awk -F "[: ]+" '{print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6; print "7 "$7; print "8 "$8}'
> 1
> 2 one
> 3 two
> 4 three
> 5 four
> 6 five
> 7
> 8
> The -F"[: ]+" has a "+" now and will match one or more occurrences of
> either character.
I like that.
> But there is still a difference because leading field separators are
> not trimmed.
But this doesn't make sense ..
I mean .. "-F [: ]+" tells awk that " " eg. is a separator .. so
something like " : : " should be one big separator & should become
part of the implicit "beginning of line" separator, no ..??
As a result something like:
: :: f1 f2 f3
.. should have strings "f1" "f2" "f3" map to $1 $2 $3.
??
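For reference, this is what awk actually does with that line; a minimal sketch, assuming a POSIX awk. The helper name fields is made up here, and it prints every field in brackets so that empty ones are visible.

```shell
# fields LINE: split LINE on runs of colons and spaces and print each
# field as N=[value]. (fields is a hypothetical helper for this sketch.)
fields() {
    printf '%s\n' "$1" |
        awk -F'[: ]+' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

fields ": :: f1 f2 f3"
```

The leading run ": :: " is matched as one separator, but it still delimits an empty first field, so "f1" "f2" "f3" land in $2 $3 $4 and NF is 4.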
> There are a couple of ways of
> dealing with that but neither is particularly elegant.
>
> echo " one: two:three :four five" | awk -F "[: ]+" '{sub(FS,"",$0); print "1 "$1; print "2 "$2; print "3 "$3; print "4 "$4; print "5 "$5; print "6 "$6; print "7 "$7; print "8 "$8}'
> 1 one
> 2 two
> 3 three
> 4 four
> 5 five
> 6
> 7
> 8
>
> This replaces the first match of the FS expression in the line with
> the empty string, which here removes the leading separators. That is
> the same as sub(/[: ]+/,"",$0); here but using FS ties it to -F
> nicely. The $0 can be omitted in this but I like to be explicit.
> Hope this helps,
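One caveat with the unanchored sub(FS,"",$0): sub replaces the leftmost match wherever it occurs, so on a line with no leading separators it would delete the first separator between two fields. A possible variation, not from Bob's message, is to anchor the pattern to the start (and end) of the line. This is only a sketch, assuming a POSIX awk and an FS that is a regular expression rather than the default single space; strip_split is a made-up helper name.

```shell
# strip_split LINE: drop leading and trailing separator runs, then print
# each remaining field as N=[value].
# (strip_split is a hypothetical helper for this sketch.)
strip_split() {
    printf '%s\n' "$1" | awk -F'[: ]+' '{
        sub("^(" FS ")", "")    # remove separators only at the start
        sub("(" FS ")$", "")    # remove separators only at the end
        # Changing $0 makes awk split the record again with the same FS.
        for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i
    }'
}

strip_split " one: two:three :four five"
strip_split "one: two"
```

The parentheses around FS keep the anchors attached to the whole expression in case FS contains an alternation. With the anchors, the second call still yields two fields, "one" and "two", instead of gluing them together.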
So little time .. too much stuff ..