From: arnold
Subject: Re: [bug-gawk] third argument to split() always behaves like FS, and the docs are not clear about it
Date: Tue, 27 Nov 2018 08:32:35 -0700
User-agent: Heirloom mailx 12.4 7/29/08
Hi.
R <address@hidden> wrote:
> If the third argument to split() is made up of a single character, it
> will NOT be treated as a regular expression:
>
> [ ... ]
>
> This is (for me) completely expected, and it's the way it works in all
> awk implementations I know of.
>
> But neither the man page nor the info page of gawk is clear about it --
> they both talk about the 3rd argument as a 'regular expression'.
> The info page goes into some detail about the case where it's a single
> space, but does not say anything about the case where it's a special
> character like '.' or '('.
I have updated the documentation (patch below). This is already pushed
to git.
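For anyone who wants to check the behavior described in the new text, here
is a minimal sketch. It assumes a POSIX shell with some awk on the PATH;
split_count is a made-up helper name for this illustration, not part of gawk.

```shell
# split_count is a hypothetical helper: it prints how many pieces
# split() produces when STRING is split on SEPARATOR.
split_count() {
    awk -v s="$1" -v sep="$2" 'BEGIN { n = split(s, parts, sep); print n }'
}

# A single-character separator is used literally, even when it is a
# regexp metacharacter.
split_count "a.b.c" "."    # prints 3
split_count "a(b(c" "("    # prints 3
```

If the separator were treated as a regexp, "." would match every character
of the string, and "(" on its own would not even be a valid regexp.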
Thanks,
Arnold
------------------------------
diff --git a/doc/gawk.1 b/doc/gawk.1
index a2448695..7bcef9d6 100644
--- a/doc/gawk.1
+++ b/doc/gawk.1
@@ -13,7 +13,7 @@
. if \w'\(rq' .ds rq "\(rq
. \}
.\}
-.TH GAWK 1 "Apr 08 2018" "Free Software Foundation" "Utility Commands"
+.TH GAWK 1 "Nov 26 2018" "Free Software Foundation" "Utility Commands"
.SH NAME
gawk \- pattern scanning and processing language
.SH SYNOPSIS
@@ -3031,7 +3031,7 @@ is the possibly null separator that appeared after
The value of
.B seps[0]
is the possibly null leading separator.
-\&\fRIf
+If
.I r
is omitted,
.B FPAT
@@ -3071,7 +3071,7 @@ between
.BI a[ i ]
and
.BI a[ i +1]\fR.
-\&\fRIf
+If
.I r
is a single space, then leading whitespace in
.I s
@@ -3084,6 +3084,10 @@ where
is the return value of
.BI split( s ", " a ", " r ", " seps )\fR.
Splitting behaves identically to field splitting, described above.
+In particular, if
+.I r
+is a single-character string, that string acts as the separator,
+even if it happens to be a regular expression metacharacter.
.TP
.BI sprintf( fmt , " expr-list" )
Print
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index dddcf673..51d9afeb 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -7704,7 +7704,6 @@ FPAT = "([^,]*)|(\"[^\"]+\")"
Finally, the @code{patsplit()} function makes the same functionality
available for splitting regular strings (@pxref{String Functions}).
-
@node Testing field creation
@section Checking How @command{gawk} Is Splitting Records
@@ -17678,8 +17677,8 @@ whitespace goes into @code{@var{seps}[@var{n}]}, where @var{n} is the
return value of
@code{split()} (i.e., the number of elements in @var{array}).
-The @code{split()} function splits strings into pieces in a
-manner similar to the way input lines are split into fields. For example:
+The @code{split()} function splits strings into pieces in the same way
+that input lines are split into fields. For example:
@example
split("cul-de-sac", a, "-", seps)
@@ -17715,6 +17714,8 @@ are separated by runs of whitespace.
Also, as with input field splitting, if @var{fieldsep} is the null string, each
individual character in the string is split into its own array element.
@value{COMMONEXT}
+Additionally, if @var{fieldsep} is a single-character string, that string acts
+as the separator, even if its value is a regular expression metacharacter.
Note, however, that @code{RS} has no effect on the way @code{split()}
works. Even though @samp{RS = ""} causes the newline character to also be an input