[SCM] gawk branch, gawk-5.2-stable, updated. gawk-4.1.0-5002-gbd3a8ae0
From: Arnold Robbins
Subject: [SCM] gawk branch, gawk-5.2-stable, updated. gawk-4.1.0-5002-gbd3a8ae0
Date: Sun, 26 Feb 2023 14:24:16 -0500 (EST)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".
The branch, gawk-5.2-stable has been updated
via bd3a8ae05c40b6e44a7be92bcaddd2dfbd9cdbaf (commit)
from 153aa457a9a06a7573fd42b650682309fa1b7f8a (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=bd3a8ae05c40b6e44a7be92bcaddd2dfbd9cdbaf
commit bd3a8ae05c40b6e44a7be92bcaddd2dfbd9cdbaf
Author: Arnold D. Robbins <arnold@skeeve.com>
Date: Sun Feb 26 21:23:53 2023 +0200
Remove trailing whitespace from a bunch of files.
diff --git a/ChangeLog b/ChangeLog
index 488e7a49..5885061a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2023-02-26 Arnold D. Robbins <arnold@skeeve.com>
+
+ * Multiple files: Remove trailing whitespace.
+
2023-02-24 Arnold D. Robbins <arnold@skeeve.com>
* awk.h (boolval): Handle Node_var. Thanks to
diff --git a/awkgram.c b/awkgram.c
index 1830b597..48fd6357 100644
--- a/awkgram.c
+++ b/awkgram.c
@@ -2481,7 +2481,7 @@ yyreduce:
merge_comments(yyvsp[0], NULL);
ip = list_create(instruction(Op_no_op));
- yyval = list_append(ip, yyvsp[0]);
+ yyval = list_append(ip, yyvsp[0]);
} else
yyval = NULL;
}
diff --git a/awkgram.y b/awkgram.y
index 7bd19e86..77c10372 100644
--- a/awkgram.y
+++ b/awkgram.y
@@ -641,7 +641,7 @@ statement
merge_comments($2, NULL);
ip = list_create(instruction(Op_no_op));
- $$ = list_append(ip, $2);
+ $$ = list_append(ip, $2);
} else
$$ = NULL;
}
diff --git a/awklib/eg/lib/inplace.awk b/awklib/eg/lib/inplace.awk
index 0d40d16e..2c051cd6 100644
--- a/awklib/eg/lib/inplace.awk
+++ b/awklib/eg/lib/inplace.awk
@@ -1,20 +1,20 @@
# inplace --- load and invoke the inplace extension.
-#
+#
# Copyright (C) 2013, 2017, 2019 the Free Software Foundation, Inc.
-#
+#
# This file is part of GAWK, the GNU implementation of the
# AWK Programming Language.
-#
+#
# GAWK is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
-#
+#
# GAWK is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
-#
+#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
USA
diff --git a/awklib/eg/prog/id.awk b/awklib/eg/prog/id.awk
index 2ced41bd..157f6005 100644
--- a/awklib/eg/prog/id.awk
+++ b/awklib/eg/prog/id.awk
@@ -148,7 +148,7 @@ function print_first_field(str)
first = get_first_field(str)
printf("(%s)", first)
}
-function fill_info_for_user(user,
+function fill_info_for_user(user,
pwent, fields, groupnames, grent, groups, i)
{
pwent = getpwnam(user)
diff --git a/builtin.c b/builtin.c
index 4dd22e27..ddf713a7 100644
--- a/builtin.c
+++ b/builtin.c
@@ -2776,7 +2776,7 @@ do_match(int nargs)
dest = POP_PARAM();
if (dest->type != Node_var_array)
fatal(_("match: third argument is not an array"));
- check_symtab_functab(dest, "match",
+ check_symtab_functab(dest, "match",
_("%s: cannot use %s as third argument"));
assoc_clear(dest);
}
diff --git a/command.y b/command.y
index 18980d38..8a9194b7 100644
--- a/command.y
+++ b/command.y
@@ -22,7 +22,7 @@
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335,
- * USA
+ * USA
*/
%{
diff --git a/doc/ChangeLog b/doc/ChangeLog
index cc2b35da..ed573c16 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2023-02-26 Arnold D. Robbins <arnold@skeeve.com>
+
+ * Multiple files: Remove trailing whitespace.
+
2023-02-25 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in (Input Parsers): Clarify and improve some of the prose.
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 15c343f0..98afef70 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -50,7 +50,7 @@
@ifnottex
@set TIMES *
@end ifnottex
-
+
@c Let texinfo.tex give us full section titles
@xrefautomaticsectiontitle on
@@ -5370,7 +5370,7 @@ non-option argument, even if it begins with @samp{-}.
@itemize @value{MINUS}
@item
However, when an option itself requires an argument, and the option is
separated
-from that argument on the command line by at least one space, the space
+from that argument on the command line by at least one space, the space
is ignored, and the argument is considered to be related to the option. Thus,
in
the invocation, @samp{gawk -F x}, the @samp{x} is treated as belonging to the
@option{-F} option, not as a separate non-option argument.
@@ -6371,10 +6371,10 @@ Subject: Re: [bug-gawk] Does gawk character classes
follow this?
> From: arnold@skeeve.com
> Date: Fri, 15 Feb 2019 03:01:34 -0700
> Cc: pengyu.ut@gmail.com, bug-gawk@gnu.org
->
+>
> I get the feeling that there's something really bothering you, but
> I don't understand what.
->
+>
> Can you clarify, please?
I thought I already did: we cannot be expected to provide a definitive
@@ -9119,7 +9119,7 @@ processing on the next record @emph{right now}. For
example:
@{
while ((start = index($0, "/*")) != 0) @{
out = substr($0, 1, start - 1) # leading part of the string
- rest = substr($0, start + 2) # ... */ ...
+ rest = substr($0, start + 2) # ... */ ...
while ((end = index(rest, "*/")) == 0) @{ # is */ in trailing part?
# get more text
if (getline <= 0) @{
@@ -9745,7 +9745,7 @@ on a per-command or per-connection basis.
the attempt to read from the underlying device may
succeed in a later attempt. This is a limitation, and it also
means that you cannot use this to multiplex input from
-two or more sources. @xref{Retrying Input} for a way to enable
+two or more sources. @xref{Retrying Input} for a way to enable
later I/O attempts to succeed.
Assigning a timeout value prevents read operations from being
@@ -11737,7 +11737,7 @@ intact, as part of the string:
@example
$ @kbd{nawk 'BEGIN @{ print "hello, \}
> @kbd{world" @}'}
-@print{} hello,
+@print{} hello,
@print{} world
@end example
@@ -23765,7 +23765,7 @@ $ cat @kbd{test.awk}
@print{} rewound = 1
@print{} rewind()
@print{} @}
-@print{}
+@print{}
@print{} @{ print FILENAME, FNR, $0 @}
$ @kbd{gawk -f rewind.awk -f test.awk data }
@@ -26480,7 +26480,7 @@ exist:
@example
@c file eg/prog/id.awk
-function fill_info_for_user(user,
+function fill_info_for_user(user,
pwent, fields, groupnames, grent, groups, i)
@{
pwent = getpwnam(user)
@@ -30490,20 +30490,20 @@ using ptys can help deal with buffering deadlocks.
Suppose @command{gawk} were unable to add numbers.
You could use a coprocess to do it. Here's an exceedingly
-simple program written for that purpose:
+simple program written for that purpose:
@example
$ @kbd{cat add.c}
-#include <stdio.h>
-
-int
-main(void)
-@{
- int x, y;
- while (scanf("%d %d", & x, & y) == 2)
- printf("%d\n", x + y);
- return 0;
-@}
+#include <stdio.h>
+
+int
+main(void)
+@{
+ int x, y;
+ while (scanf("%d %d", & x, & y) == 2)
+ printf("%d\n", x + y);
+ return 0;
+@}
$ @kbd{cc -O add.c -o add} @ii{Compile the program}
@end example
@@ -30516,15 +30516,15 @@ $ @kbd{echo 1 2 |}
@end example
And it would deadlock, because @file{add.c} fails to call
-@samp{setlinebuf(stdout)}. The @command{add} program freezes.
+@samp{setlinebuf(stdout)}. The @command{add} program freezes.
-Now try instead:
+Now try instead:
@example
$ @kbd{echo 1 2 |}
> @kbd{gawk -v cmd=add 'BEGIN @{ PROCINFO[cmd, "pty"] = 1 @}}
> @kbd{ @{ print |& cmd; cmd |& getline x; print x @}'}
-@print{} 3
+@print{} 3
@end example
By using a pty, @command{gawk} fools the standard I/O library into
@@ -31115,7 +31115,7 @@ Terence Kelly, the author of the persistent memory
allocator
@command{gawk} uses, provides the following advice about the backing file:
@quotation
-Regarding backing file size, I recommend making it far larger
+Regarding backing file size, I recommend making it far larger
than all of the data that will ever reside in it, assuming
that the file system supports sparse files. The ``pay only
for what you use'' aspect of sparse files ensures that the
@@ -31203,8 +31203,8 @@ ACM @cite{Queue} magazine, Vol. 20 No. 2 (March/April
2022),
@uref{https://dl.acm.org/doi/pdf/10.1145/3534855, PDF},
@uref{https://queue.acm.org/detail.cfm?id=3534855, HTML}.
This paper explains the design of the PMA
-allocator used in persistent @command{gawk}.
-
+allocator used in persistent @command{gawk}.
+
@item @cite{Persistent Scripting}
Zi Fan Tan, Jianan Li, Haris Volos, and Terence Kelly,
Non-Volatile Memory Workshop (NVMW) 2022,
@@ -31216,7 +31216,7 @@ non-volatile memory; note that the interface differs
slightly.
@item @cite{Persistent Memory Programming on Conventional Hardware}
Terence Kelly,
ACM @cite{Queue} magazine Vol. 17 No. 4 (July/Aug 2019),
-@uref{https://dl.acm.org/doi/pdf/10.1145/3358955.3358957, PDF},
+@uref{https://dl.acm.org/doi/pdf/10.1145/3358955.3358957, PDF},
@uref{https://queue.acm.org/detail.cfm?id=3358957, HTML}.
This paper describes simple techniques for persistent memory for C/C++
code on conventional computers that lack non-volatile memory hardware.
@@ -31226,8 +31226,8 @@ Terence Kelly,
ACM @cite{Queue} magazine Vol. 18 No. 2 (March/April 2020),
@uref{https://dl.acm.org/doi/pdf/10.1145/3400899.3400902, PDF},
@uref{https://queue.acm.org/detail.cfm?id=3400902, HTML}.
-This paper describes a simple and robust testbed for testing software
-against real power failures.
+This paper describes a simple and robust testbed for testing software
+against real power failures.
@item @cite{Crashproofing the Original NoSQL Key/Value Store}
Terence Kelly,
@@ -35497,7 +35497,7 @@ It's Euler's modification to Newton's method for
calculating pi.
Take a look at lines (23) - (25) here:
http://mathworld.wolfram.com/PiFormulas.htm
-The algorithm I wrote simply expands the multiply by 2 and works from the
innermost expression outwards. I used this to program HP calculators because
it's quite easy to modify for tiny memory devices with smallish word sizes.
+The algorithm I wrote simply expands the multiply by 2 and works from the
innermost expression outwards. I used this to program HP calculators because
it's quite easy to modify for tiny memory devices with smallish word sizes.
http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/articles.cgi?read=899
@@ -37144,7 +37144,7 @@ Set this to @code{awk_true} if the field lengths are
specified in terms
of potentially multi-byte characters, and set it to @code{awk_false} if
the lengths are in terms of bytes.
Performance will be better if the values are supplied in
-terms of bytes.
+terms of bytes.
@item size_t nf;
Set this to the number of fields in the input record, i.e. @code{NF}.
@@ -37159,7 +37159,7 @@ for @code{$1}, and so on through the
@code{fields[nf-1]} element containing the
@end table
A convenience macro @code{awk_fieldwidth_info_size(numfields)} is provided to
-calculate the appropriate size of a variable-length
+calculate the appropriate size of a variable-length
@code{awk_fieldwidth_info_t} structure containing @code{numfields} fields.
This can
be used as an argument to @code{malloc()} or in a union to allocate space
statically. Please refer to the @code{readdir_test} sample extension for an
@@ -38526,7 +38526,7 @@ The following function allows extensions to access and
manipulate redirections.
Look up file @code{name} in @command{gawk}'s internal redirection table.
If @code{name} is @code{NULL} or @code{name_len} is zero, return
data for the currently open input file corresponding to @code{FILENAME}.
-(This does not access the @code{filetype} argument, so that may be undefined).
+(This does not access the @code{filetype} argument, so that may be undefined).
If the file is not already open, attempt to open it.
The @code{filetype} argument must be zero-terminated and should be one of:
@@ -39893,22 +39893,22 @@ all the variables and functions in the @code{inplace}
namespace
@c endfile
@ignore
@c file eg/lib/inplace.awk
-#
+#
# Copyright (C) 2013, 2017, 2019 the Free Software Foundation, Inc.
-#
+#
# This file is part of GAWK, the GNU implementation of the
# AWK Programming Language.
-#
+#
# GAWK is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
-#
+#
# GAWK is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
-#
+#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
USA
@@ -44480,7 +44480,7 @@ This is an @command{awk} interpreter written in the
@uref{https://golang.org/, Go programming language}.
It implements POSIX @command{awk}, with a few minor extensions.
Source code is available from @uref{https://github.com/benhoyt/goawk}.
-The author wrote a nice
+The author wrote a nice
@uref{https://benhoyt.com/writings/goawk/, article}
describing the implementation.
diff --git a/doc/gawkinet.info b/doc/gawkinet.info
index 1f22414e..7877e084 100644
--- a/doc/gawkinet.info
+++ b/doc/gawkinet.info
@@ -1,7 +1,7 @@
-This is gawkinet.info, produced by makeinfo version 6.8 from
+This is gawkinet.info, produced by makeinfo version 7.0.1 from
gawkinet.texi.
-This is Edition 1.6 of 'TCP/IP Internetworking with 'gawk'', for the
+This is Edition 1.6 of ‘TCP/IP Internetworking with ‘gawk’’, for the
5.2.0 (or later) version of the GNU implementation of AWK.
@@ -12,19 +12,19 @@ This is Edition 1.6 of 'TCP/IP Internetworking with
'gawk'', for the
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
-Invariant Sections being "GNU General Public License", the Front-Cover
+Invariant Sections being “GNU General Public License”, the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
-"GNU Free Documentation License".
+“GNU Free Documentation License”.
- a. "A GNU Manual"
+ a. “A GNU Manual”
- b. "You have the freedom to copy and modify this GNU manual. Buying
+ b. “You have the freedom to copy and modify this GNU manual. Buying
copies from the FSF supports it in developing GNU and promoting
- software freedom."
+ software freedom.”
INFO-DIR-SECTION Network applications
START-INFO-DIR-ENTRY
-* awkinet: (gawkinet). TCP/IP Internetworking With 'gawk'.
+* awkinet: (gawkinet). TCP/IP Internetworking With ‘gawk’.
END-INFO-DIR-ENTRY
@@ -33,10 +33,10 @@ File: gawkinet.info, Node: Top, Next: Preface, Prev:
(dir), Up: (dir)
General Introduction
********************
-This file documents the networking features in GNU Awk ('gawk') version
+This file documents the networking features in GNU Awk (‘gawk’) version
4.0 and later.
- This is Edition 1.6 of 'TCP/IP Internetworking with 'gawk'', for the
+ This is Edition 1.6 of ‘TCP/IP Internetworking with ‘gawk’’, for the
5.2.0 (or later) version of the GNU implementation of AWK.
@@ -47,16 +47,16 @@ This file documents the networking features in GNU Awk
('gawk') version
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
-Invariant Sections being "GNU General Public License", the Front-Cover
+Invariant Sections being “GNU General Public License”, the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
-"GNU Free Documentation License".
+“GNU Free Documentation License”.
- a. "A GNU Manual"
+ a. “A GNU Manual”
- b. "You have the freedom to copy and modify this GNU manual. Buying
+ b. “You have the freedom to copy and modify this GNU manual. Buying
copies from the FSF supports it in developing GNU and promoting
- software freedom."
+ software freedom.”
* Menu:
@@ -75,7 +75,7 @@ texts being (a) (see below), and with the Back-Cover Texts
being (b)
* Basic Protocols:: The basic protocols.
* Ports:: The idea behind ports.
* Making Connections:: Making TCP/IP connections.
-* Gawk Special Files:: How to do 'gawk' networking.
+* Gawk Special Files:: How to do ‘gawk’ networking.
* Special File Fields:: The fields in the special file name.
* Comparing Protocols:: Differences between the protocols.
* File /inet/tcp:: The TCP special file.
@@ -110,28 +110,28 @@ Preface
*******
In May of 1997, Jürgen Kahrs felt the need for network access from
-'awk', and, with a little help from me, set about adding features to do
-this for 'gawk'. At that time, he wrote the bulk of this Info file.
+‘awk’, and, with a little help from me, set about adding features to do
+this for ‘gawk’. At that time, he wrote the bulk of this Info file.
- The code and documentation were added to the 'gawk' 3.1 development
+ The code and documentation were added to the ‘gawk’ 3.1 development
tree, and languished somewhat until I could finally get down to some
-serious work on that version of 'gawk'. This finally happened in the
+serious work on that version of ‘gawk’. This finally happened in the
middle of 2000.
Meantime, Jürgen wrote an article about the Internet special files
-and '|&' operator for 'Linux Journal', and made a networking patch for
-the production versions of 'gawk' available from his home page. In
-August of 2000 (for 'gawk' 3.0.6), this patch also made it to the main
-GNU 'ftp' distribution site.
+and ‘|&’ operator for ‘Linux Journal’, and made a networking patch for
+the production versions of ‘gawk’ available from his home page. In
+August of 2000 (for ‘gawk’ 3.0.6), this patch also made it to the main
+GNU ‘ftp’ distribution site.
- For release with 'gawk', I edited Jürgen's prose for English grammar
+ For release with ‘gawk’, I edited Jürgen’s prose for English grammar
and style, as he is not a native English speaker. I also rearranged the
material somewhat for what I felt was a better order of presentation,
and (re)wrote some of the introductory material.
The majority of this document and the code are his work, and the high
quality and interesting ideas speak for themselves. It is my hope that
-these features will be of significant value to the 'awk' community.
+these features will be of significant value to the ‘awk’ community.
Arnold Robbins
@@ -145,7 +145,7 @@ File: gawkinet.info, Node: Introduction, Next: Using
Networking, Prev: Prefac
*********************
This major node provides a (necessarily) brief introduction to computer
-networking concepts. For many applications of 'gawk' to TCP/IP
+networking concepts. For many applications of ‘gawk’ to TCP/IP
networking, we hope that this is enough. For more advanced tasks, you
will need deeper background, and it may be necessary to switch to
lower-level programming in C or C++.
@@ -180,7 +180,7 @@ When you make a phone call, the following steps occur:
network, refuses to answer the call.
4. Assuming the other party answers, the connection between you is now
- a "duplex" (two-way), "reliable" (no data lost), sequenced (data
+ a “duplex” (two-way), “reliable” (no data lost), sequenced (data
comes out in the order sent) data stream.
5. You and your friend may now talk freely, with the phone system
@@ -190,7 +190,7 @@ When you make a phone call, the following steps occur:
The same steps occur in a duplex reliable computer networking
connection. There is considerably more overhead in setting up the
-communications, but once it's done, data moves in both directions,
+communications, but once it’s done, data moves in both directions,
reliably, in sequence.
@@ -215,21 +215,21 @@ following.
5. One or more may get lost in the mail. (Although, fortunately, this
does not occur very often.)
- 6. In a computer network, one or more "packets" may also arrive
- multiple times. (This doesn't happen with the postal system!)
+ 6. In a computer network, one or more “packets” may also arrive
+ multiple times. (This doesn’t happen with the postal system!)
The important characteristics of datagram communications, like those
of the postal system are thus:
- * Delivery is "best effort;" the data may never get there.
+ ⢠Delivery is âbest effort;â the data may never get there.
- * Each message is self-contained, including the source and
+ ⢠Each message is self-contained, including the source and
destination addresses.
- * Delivery is _not_ sequenced; packets may arrive out of order,
+ ⢠Delivery is _not_ sequenced; packets may arrive out of order,
and/or multiple times.
- * Unlike the phone system, overhead is considerably lower. It is not
+ ⢠Unlike the phone system, overhead is considerably lower. It is not
necessary to set up the call first.
The price the user pays for the lower overhead of datagram
@@ -245,7 +245,7 @@ File: gawkinet.info, Node: The TCP/IP Protocols, Next:
Making Connections, Pr
The Internet Protocol Suite (usually referred to as just TCP/IP)(1)
consists of a number of different protocols at different levels or
-"layers." For our purposes, three protocols provide the fundamental
+“layers.” For our purposes, three protocols provide the fundamental
communications mechanisms. All other defined protocols are referred to
as user-level protocols (e.g., HTTP, used later in this Info file).
@@ -269,8 +269,8 @@ File: gawkinet.info, Node: Basic Protocols, Next: Ports,
Prev: The TCP/IP Pro
IP
The Internet Protocol. This protocol is almost never used directly
by applications. It provides the basic packet delivery and routing
- infrastructure of the Internet. Much like the phone company's
- switching centers or the Post Office's trucks, it is not of much
+ infrastructure of the Internet. Much like the phone company’s
+ switching centers or the Post Office’s trucks, it is not of much
day-to-day interest to the regular user (or programmer). It
happens to be a best effort datagram protocol. In the early
twenty-first century, there are two versions of this protocol in
@@ -281,11 +281,11 @@ IP
addresses, on which most of the current Internet is based.
IPv6
- The "next generation" of the Internet Protocol, with 128-bit
+ The “next generation” of the Internet Protocol, with 128-bit
addresses. This protocol is in wide use in certain parts of
the world, but has not yet replaced IPv4.(1)
- Versions of the other protocols that sit "atop" IP exist for both
+ Versions of the other protocols that sit “atop” IP exist for both
IPv4 and IPv6. However, as the IPv6 versions are fundamentally the
same as the original IPv4 versions, we will not distinguish further
between them.
@@ -293,14 +293,14 @@ IP
UDP
The User Datagram Protocol. This is a best effort datagram
protocol. It provides a small amount of extra reliability over IP,
- and adds the notion of "ports", described in *note TCP and UDP
+ and adds the notion of “ports”, described in *note TCP and UDP
Ports: Ports.
TCP
The Transmission Control Protocol. This is a duplex, reliable,
sequenced byte-stream protocol, again layered on top of IP, and
also providing the notion of ports. This is the protocol that you
- will most likely use when using 'gawk' for network programming.
+ will most likely use when using ‘gawk’ for network programming.
All other user-level protocols use either TCP or UDP to do their
basic communications. Examples are SMTP (Simple Mail Transfer
@@ -309,7 +309,7 @@ Protocol).
---------- Footnotes ----------
- (1) There isn't an IPv5.
+ (1) There isn’t an IPv5.
File: gawkinet.info, Node: Ports, Prev: Basic Protocols, Up: The TCP/IP
Protocols
@@ -323,20 +323,20 @@ than one person at the location; thus you have to further
quantify the
recipient by putting a person or company name on the envelope.
In the phone system, one phone number may represent an entire
-company, in which case you need a person's extension number in order to
+company, in which case you need a person’s extension number in order to
reach that individual directly. Or, when you call a home, you have to
-say, "May I please speak to ..." before talking to the person directly.
+say, “May I please speak to ...” before talking to the person directly.
IP networking provides the concept of addressing. An IP address
represents a particular computer, but no more. In order to reach the
mail service on a system, or the FTP or WWW service on a system, you
must have some way to further specify which service you want. In the
-Internet Protocol suite, this is done with "port numbers", which
+Internet Protocol suite, this is done with “port numbers”, which
represent the services, much like an extension number used with a phone
number.
Port numbers are 16-bit integers. Unix and Unix-like systems reserve
-ports below 1024 for "well known" services, such as SMTP, FTP, and HTTP.
+ports below 1024 for “well known” services, such as SMTP, FTP, and HTTP.
Numbers 1024 and above may be used by any application, although there is
no promise made that a particular port number is always available.
@@ -346,22 +346,22 @@ File: gawkinet.info, Node: Making Connections, Prev:
The TCP/IP Protocols, Up
1.4 Making TCP/IP Connections (And Some Terminology)
====================================================
-Two terms come up repeatedly when discussing networking: "client" and
-"server". For now, we'll discuss these terms at the "connection level",
+Two terms come up repeatedly when discussing networking: “client” and
+“server”. For now, we’ll discuss these terms at the “connection level”,
when first establishing connections between two processes on different
systems over a network. (Once the connection is established, the higher
-level, or "application level" protocols, such as HTTP or FTP, determine
+level, or “application level” protocols, such as HTTP or FTP, determine
who is the client and who is the server. Often, it turns out that the
client and server are the same in both roles.)
- The "server" is the system providing the service, such as the web
-server or email server. It is the "host" (system) which is _connected
+ The “server” is the system providing the service, such as the web
+server or email server. It is the “host” (system) which is _connected
to_ in a transaction. For this to work though, the server must be
expecting connections. Much as there has to be someone at the office
building to answer the phone,(1) the server process (usually) has to be
started first and be waiting for a connection.
- The "client" is the system requesting the service. It is the system
+ The “client” is the system requesting the service. It is the system
_initiating the connection_ in a transaction. (Just as when you pick up
the phone to call an office or store.)
@@ -373,19 +373,19 @@ can a new one be built up on the same port. This is
contrary to the
usual behavior of fully developed web servers which have to avoid
situations in which they are not reachable. We have to pay this price
in order to enjoy the benefits of a simple communication paradigm in
-'gawk'.)
+‘gawk’.)
Furthermore, once the connection is established, communications are
-"synchronous".(2) I.e., each end waits on the other to finish
+“synchronous”.(2) I.e., each end waits on the other to finish
transmitting, before replying. This is much like two people in a phone
conversation. While both could talk simultaneously, doing so usually
-doesn't work too well.
+doesn’t work too well.
In the case of TCP, the synchronicity is enforced by the protocol
-when sending data. Data writes "block" until the data have been
+when sending data. Data writes “block” until the data have been
received on the other end. For both TCP and UDP, data reads block until
there is incoming data waiting to be read. This is summarized in the
-following table, where an "x" indicates that the given action blocks.
+following table, where an “x” indicates that the given action blocks.
TCP x x
UDP x
@@ -394,33 +394,33 @@ UDP x
(1) In the days before voice mail systems!
- (2) For the technically savvy, data reads block--if there's no
+ (2) For the technically savvy, data reads block—if there’s no
incoming data, the program is made to wait until there is, instead of
-receiving a "there's no data" error return.
+receiving a “there’s no data” error return.
File: gawkinet.info, Node: Using Networking, Next: Some Applications and
Techniques, Prev: Introduction, Up: Top
-2 Networking With 'gawk'
+2 Networking With ‘gawk’
************************
-The 'awk' programming language was originally developed as a
+The ‘awk’ programming language was originally developed as a
pattern-matching language for writing short programs to perform data
-manipulation tasks. 'awk''s strength is the manipulation of textual
+manipulation tasks. ‘awk’’s strength is the manipulation of textual
data that is stored in files. It was never meant to be used for
networking purposes. To exploit its features in a networking context,
-it's necessary to use an access mode for network connections that
+it’s necessary to use an access mode for network connections that
resembles the access of files as closely as possible.
- 'awk' is also meant to be a prototyping language. It is used to
+ ‘awk’ is also meant to be a prototyping language. It is used to
demonstrate feasibility and to play with features and user interfaces.
-This can be done with file-like handling of network connections. 'gawk'
+This can be done with file-like handling of network connections. ‘gawk’
trades the lack of many of the advanced features of the TCP/IP family of
protocols for the convenience of simple connection handling. The
advanced features are available when programming in C or Perl. In fact,
the network programming in this major node is very similar to what is
-described in books such as 'Internet Programming with Python', 'Advanced
-Perl Programming', or 'Web Client Programming with Perl'.
+described in books such as ‘Internet Programming with Python’, ‘Advanced
+Perl Programming’, or ‘Web Client Programming with Perl’.
However, you can do the programming here without first having to
learn object-oriented ideology; underlying languages such as Tcl/Tk,
@@ -432,7 +432,7 @@ protocol is much less important for most users.
* Menu:
-* Gawk Special Files:: How to do 'gawk' networking.
+* Gawk Special Files:: How to do ‘gawk’ networking.
* TCP Connecting:: Making a TCP connection.
* Troubleshooting:: Troubleshooting TCP/IP connections.
* Interacting:: Interacting with a service.
@@ -448,30 +448,30 @@ protocol is much less important for most users.
File: gawkinet.info, Node: Gawk Special Files, Next: TCP Connecting, Prev:
Using Networking, Up: Using Networking
-2.1 'gawk''s Networking Mechanisms
+2.1 ‘gawk’’s Networking Mechanisms
==================================
-The '|&' operator for use in communicating with a "coprocess" is
+The ‘|&’ operator for use in communicating with a “coprocess” is
described in *note Two-way Communications With Another Process:
(gawk)Two-way I/O. It shows how to do two-way I/O to a separate process,
-sending it data with 'print' or 'printf' and reading data with
-'getline'. If you haven't read it already, you should detour there to
+sending it data with ‘print’ or ‘printf’ and reading data with
+‘getline’. If you haven’t read it already, you should detour there to
do so.
- 'gawk' transparently extends the two-way I/O mechanism to simple
-networking through the use of special file names. When a "coprocess"
+ ‘gawk’ transparently extends the two-way I/O mechanism to simple
+networking through the use of special file names. When a “coprocess”
that matches the special files we are about to describe is started,
-'gawk' creates the appropriate network connection, and then two-way I/O
+‘gawk’ creates the appropriate network connection, and then two-way I/O
proceeds as usual.
At the C, C++, and Perl level, networking is accomplished via
-"sockets", an Application Programming Interface (API) originally
+“sockets”, an Application Programming Interface (API) originally
developed at the University of California at Berkeley that is now used
almost universally for TCP/IP networking. Socket level programming,
while fairly straightforward, requires paying attention to a number of
details, as well as using binary data. It is not well-suited for use
-from a high-level language like 'awk'. The special files provided in
-'gawk' hide the details from the programmer, making things much simpler
+from a high-level language like ‘awk’. The special files provided in
+‘gawk’ hide the details from the programmer, making things much simpler
and easier to use.
The special file name for network access is made up of several
@@ -495,39 +495,39 @@ File: gawkinet.info, Node: Special File Fields, Next: Comparing Protocols, Pr
This node explains the meaning of all of the fields, as well as the
range of values and the defaults. All of the fields are mandatory. To
-let the system pick a value, or if the field doesn't apply to the
-protocol, specify it as '0' (zero):
+let the system pick a value, or if the field doesn’t apply to the
+protocol, specify it as ‘0’ (zero):
NET-TYPE
- This is one of 'inet4' for IPv4, 'inet6' for IPv6, or 'inet' to use
+ This is one of ‘inet4’ for IPv4, ‘inet6’ for IPv6, or ‘inet’ to use
the system default (which is likely to be IPv4). For the rest of
- this document, we will use the generic '/inet' in our descriptions
- of how 'gawk''s networking works.
+ this document, we will use the generic ‘/inet’ in our descriptions
+ of how ‘gawk’’s networking works.
PROTOCOL
Determines which member of the TCP/IP family of protocols is
selected to transport the data across the network. There are two
- possible values (always written in lowercase): 'tcp' and 'udp'.
+ possible values (always written in lowercase): ‘tcp’ and ‘udp’.
The exact meaning of each is explained later in this node.
LOCALPORT
Determines which port on the local machine is used to communicate
- across the network. Application-level clients usually use '0' to
- indicate they do not care which local port is used--instead they
+ across the network. Application-level clients usually use ‘0’ to
+ indicate they do not care which local port is usedâinstead they
specify a remote port to connect to.
It is vital for application-level servers to use a number different
- from '0' here because their service has to be available at a
+ from ‘0’ here because their service has to be available at a
specific publicly known port number. It is possible to use a name
- from '/etc/services' here.
+ from ‘/etc/services’ here.
HOSTNAME
Determines which remote host is to be at the other end of the
connection. Application-level clients must enter a name different
- from '0'. The name can be either symbolic (e.g.,
- 'jpl-devvax.jpl.nasa.gov') or numeric (e.g., '128.149.1.143').
+ from ‘0’. The name can be either symbolic (e.g.,
+ ‘jpl-devvax.jpl.nasa.gov’) or numeric (e.g., ‘128.149.1.143’).
- Application-level servers must fill this field with a '0' to
+ Application-level servers must fill this field with a ‘0’ to
indicate their being open for all other hosts to connect to them
and enforce connection level server behavior this way. It is not
possible for an application-level server to restrict its
@@ -535,19 +535,19 @@ HOSTNAME
REMOTEPORT
Determines which port on the remote machine is used to communicate
- across the network. For '/inet/tcp' and '/inet/udp',
- application-level clients _must_ use a number other than '0' to
+ across the network. For ‘/inet/tcp’ and ‘/inet/udp’,
+ application-level clients _must_ use a number other than ‘0’ to
indicate to which port on the remote machine they want to connect.
- Application-level servers must not fill this field with a '0'.
+ Application-level servers must not fill this field with a ‘0’.
Instead they specify a local port to which clients connect. It is
- possible to use a name from '/etc/services' here.
+ possible to use a name from ‘/etc/services’ here.
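
To make the five fields concrete, here is a minimal gawk sketch that splits a special file name of the form /NET-TYPE/PROTOCOL/LOCALPORT/HOSTNAME/REMOTEPORT into its fields and guesses whether it describes a client or a server. The functions parse_inet() and inet_role() are hypothetical helpers invented for this illustration; they are not part of gawk. The server rule (a nonzero local port with HOSTNAME and REMOTEPORT both 0) is an assumption taken from the server examples later in this manual, and the check is deliberately simplified.

```awk
# parse_inet(): hypothetical helper, not part of gawk.
# Splits NAME into FIELDS; returns 1 on success, 0 if NAME is malformed.
function parse_inet(name, fields,    n, parts) {
    n = split(name, parts, "/")
    # The leading "/" produces an empty first piece, so expect six pieces.
    if (n != 6 || parts[1] != "")
        return 0
    fields["net-type"]   = parts[2]
    fields["protocol"]   = parts[3]
    fields["localport"]  = parts[4]
    fields["hostname"]   = parts[5]
    fields["remoteport"] = parts[6]
    return 1
}

# inet_role(): hypothetical helper that classifies an already parsed name.
function inet_role(fields) {
    if (fields["protocol"] != "tcp" && fields["protocol"] != "udp")
        return "invalid"
    if (fields["localport"] != "0" && fields["hostname"] == "0" &&
        fields["remoteport"] == "0")
        return "server"
    if (fields["hostname"] != "0" && fields["remoteport"] != "0")
        return "client"
    return "invalid"
}

BEGIN {
    if (parse_inet("/inet/tcp/0/localhost/8888", f))
        print inet_role(f)      # a client name
    if (parse_inet("/inet/tcp/8888/0/0", f))
        print inet_role(f)      # a server name
}
```

The sketch never opens a connection; it only inspects the string, so it can be run on a machine without any network access.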
Experts in network programming will notice that the usual
client/server asymmetry found at the level of the socket API is not
visible here. This is for the sake of simplicity of the high-level
concept. If this asymmetry is necessary for your application, use
-another language. For 'gawk', it is more important to enable users to
+another language. For ‘gawk’, it is more important to enable users to
write a client program with a minimum of code. What happens when first
accessing a network connection is seen in the following pseudocode:
@@ -567,7 +567,7 @@ accessing a network connection is seen in the following pseudocode:
fields of the special file name. When in doubt, *note Table 2.1:
table-inet-components. gives you the combinations of values and their
meaning. If this table is too complicated, focus on the three lines
-printed in *bold*. All the examples in *note Networking With 'gawk':
+printed in *bold*. All the examples in *note Networking With ‘gawk’:
Using Networking, use only the patterns printed in bold letters.
@@ -590,7 +590,7 @@ tcp, udp x 0 x Invalid
tcp, udp 0 0 0 Invalid
tcp, udp 0 x 0 Invalid
-Table 2.1: '/inet' Special File Components
+Table 2.1: ‘/inet’ Special File Components
In general, TCP is the preferred mechanism to use. It is the
simplest protocol to understand and to use. Use UDP only if
@@ -615,7 +615,7 @@ available and demonstrate the differences between them.
File: gawkinet.info, Node: File /inet/tcp, Next: File /inet/udp, Prev: Comparing Protocols, Up: Comparing Protocols
-2.1.2.1 '/inet/tcp'
+2.1.2.1 ‘/inet/tcp’
...................
Once again, always use TCP. (Use UDP when low overhead is a necessity.)
@@ -646,7 +646,7 @@ started first, and it waits for the receiver to read a line.
File: gawkinet.info, Node: File /inet/udp, Prev: File /inet/tcp, Up: Comparing Protocols
-2.1.2.2 '/inet/udp'
+2.1.2.2 ‘/inet/udp’
...................
The server and client programs that use UDP are almost identical to
@@ -671,13 +671,13 @@ started first:
close("/inet/udp/0/localhost/8888")
}
- In the case of UDP, the initial 'print' command is the one that
-actually sends data so that there is a connection. UDP and "connection"
+ In the case of UDP, the initial ‘print’ command is the one that
+actually sends data so that there is a connection. UDP and “connection”
sounds strange to anyone who has learned that UDP is a connectionless
-protocol. Here, "connection" means that the 'connect()' system call has
-completed its work and completed the "association" between a certain
+protocol. Here, “connection” means that the ‘connect()’ system call has
+completed its work and completed the “association” between a certain
socket and an IP address. Thus there are subtle differences between
-'connect()' for TCP and UDP; see the man page for details.(1)
+‘connect()’ for TCP and UDP; see the man page for details.(1)
UDP cannot guarantee that the datagrams at the receiving end will
arrive in exactly the same order they were sent. Some datagrams could
@@ -689,7 +689,7 @@ stateless services like the original versions of NFS.
---------- Footnotes ----------
(1) This subtlety is just one of many details that are hidden in the
-socket API, invisible and intractable for the 'gawk' user. The
+socket API, invisible and intractable for the ‘gawk’ user. The
developers are currently considering how to rework the network
facilities to make them easier to understand and use.
@@ -699,9 +699,9 @@ File: gawkinet.info, Node: TCP Connecting, Next: Troubleshooting, Prev: Gawk
2.2 Establishing a TCP Connection
=================================
-Let's observe a network connection at work. Type in the following
+Let’s observe a network connection at work. Type in the following
program and watch the output. Within a second, it connects via TCP
-('/inet/tcp') to a remote server and asks the service 'daytime' on the
+(‘/inet/tcp’) to a remote server and asks the service ‘daytime’ on the
machine what time it is:
BEGIN {
@@ -714,43 +714,43 @@ machine what time it is:
close(daytime_connection)
}
- Even experienced 'awk' users will find the fourth and sixth line
+ Even experienced ‘awk’ users will find the fourth and sixth line
strange in two respects:
- * A string containing the name of a special file is used as a shell
- command that pipes its output into 'getline'. One would rather
+ • A string containing the name of a special file is used as a shell
+ command that pipes its output into ‘getline’. One would rather
expect to see the special file being read like any other file
- ('getline < "/inet/tcp/0/time-a-g.nist.gov/daytime"').
+ (‘getline < "/inet/tcp/0/time-a-g.nist.gov/daytime"’).
- * The operator '|&' has not been part of any 'awk' implementation
- (until now). It is actually the only extension of the 'awk'
+ • The operator ‘|&’ has not been part of any ‘awk’ implementation
+ (until now). It is actually the only extension of the ‘awk’
language needed (apart from the special files) to introduce network
access.
- The '|&' operator was introduced in 'gawk' 3.1 in order to overcome
-the crucial restriction that access to files and pipes in 'awk' is
+ The ‘|&’ operator was introduced in ‘gawk’ 3.1 in order to overcome
+the crucial restriction that access to files and pipes in ‘awk’ is
always unidirectional. It was formerly impossible to use both access
modes on the same file or pipe. Instead of changing the whole concept
-of file access, the '|&' operator behaves exactly like the usual pipe
+of file access, the ‘|&’ operator behaves exactly like the usual pipe
operator except for two additions:
- * Normal shell commands connected to their 'gawk' program with a '|&'
- pipe can be accessed bidirectionally. The '|&' turns out to be a
- quite general, useful, and natural extension of 'awk'.
+ • Normal shell commands connected to their ‘gawk’ program with a ‘|&’
+ pipe can be accessed bidirectionally. The ‘|&’ turns out to be a
+ quite general, useful, and natural extension of ‘awk’.
- * Pipes that consist of a special file name for network connections
+ • Pipes that consist of a special file name for network connections
are not executed as shell commands. Instead, they can be read and
written to, just like a full-duplex network connection.
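
The first of these two additions can be tried without any network at all. The following is a minimal sketch, assuming a POSIX ‘sort’ command is available on the search path. It writes a few values to ‘sort’ through a ‘|&’ pipe, closes only the writing end so that ‘sort’ sees end of input, and then reads the sorted result back over the same pipe. The function name coprocess_sort() is a hypothetical helper made up for this example.

```awk
# coprocess_sort(): hypothetical helper, not part of gawk.
# Sends data[1..n] through sort(1) as a coprocess and stores the sorted
# lines in result[1..count]; returns count.
function coprocess_sort(data, n, result,    cmd, i, count, line) {
    cmd = "LC_ALL=C sort"
    for (i = 1; i <= n; i++)
        print data[i] |& cmd          # write to the coprocess
    close(cmd, "to")                  # close only the writing end
    count = 0
    while ((cmd |& getline line) > 0) # read from the same coprocess
        result[++count] = line
    close(cmd)                        # now close the connection completely
    return count
}

BEGIN {
    n = split("pear apple fig", fruit, " ")
    m = coprocess_sort(fruit, n, sorted)
    for (i = 1; i <= m; i++)
        print sorted[i]
}
```

Replacing the command string with a network special file name would give the second case: the same ‘print ... |&’ and ‘|& getline’ calls, but talking to a socket instead of a shell command.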
- In the earlier example, the '|&' operator tells 'getline' to read a
-line from the special file '/inet/tcp/0/time-a-g.nist.gov/daytime'. We
+ In the earlier example, the ‘|&’ operator tells ‘getline’ to read a
+line from the special file ‘/inet/tcp/0/time-a-g.nist.gov/daytime’. We
could also have printed a line into the special file. But instead we
just consumed an empty leading line, printed it, then read a line with
the time, printed that, and closed the connection. (While we could just
-let 'gawk' close the connection by finishing the program, in this Info
+let ‘gawk’ close the connection by finishing the program, in this Info
file we are pedantic and always explicitly close the connections.)
- Network services like 'daytime' are not really useful because there
+ Network services like ‘daytime’ are not really useful because there
are so many better ways to print the current time. In the early days of
TCP networking, such a service may have looked like a good idea for
testing purposes. Later, simple TCP services like these have been used
@@ -760,7 +760,7 @@ services. The list of servers (https://tf.nist.gov/tf-cgi/servers.cgi)
that still support the legacy service daytime
(https://en.wikipedia.org/wiki/Daytime_Protocol) can be found at
Wikipedia. We hesitated to use this service in this manual because it
-is hard to find servers that still support services like 'daytime'
+is hard to find servers that still support services like ‘daytime’
openly to the Internet. Later on we will see that some of these
nostalgic protocols have turned into security risks.
@@ -778,14 +778,14 @@ network programming.
For the rest of this major node, we will assume you work on a
POSIX-style system that supports TCP/IP. If the previous example program
does not run on your machine, it may help to replace the value assigned
-to the variable 'daytime_server' with the name (or the IP address) of
+to the variable ‘daytime_server’ with the name (or the IP address) of
another server from the list mentioned above. Now you should see the
date and time being printed by the program, otherwise you may have run
-out of servers that support the 'daytime' service.
+out of servers that support the ‘daytime’ service.
- Try changing the service to 'chargen' or 'ftp'. This way, the
+ Try changing the service to ‘chargen’ or ‘ftp’. This way, the
program connects to other services that should give you some response.
-If you are curious, you should have a look at your '/etc/services' file.
+If you are curious, you should have a look at your ‘/etc/services’ file.
It could look like this:
# /etc/services:
@@ -821,27 +821,27 @@ It could look like this:
usually support. If your GNU/Linux machine does not do so, it may be
that these services are switched off in some startup script. Systems
running some flavor of Microsoft Windows usually do _not_ support these
-services. Nevertheless, it _is_ possible to do networking with 'gawk'
+services. Nevertheless, it _is_ possible to do networking with ‘gawk’
on Microsoft Windows.(1) The first column of the file gives the name of
the service, and the second column gives a unique number and the
protocol that one can use to connect to this service. The rest of the
-line is treated as a comment. You see that some services ('echo')
+line is treated as a comment. You see that some services (‘echo’)
support TCP as well as UDP.
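
The column layout just described is easy to take apart in ‘awk’ itself. The following is a minimal sketch, assuming the usual layout of a service name followed by PORT/PROTOCOL; parse_service() is a hypothetical helper made up for this example, and it ignores any aliases that follow the second column.

```awk
# parse_service(): hypothetical helper, not part of gawk.
# Fills svc["name"], svc["port"], and svc["protocol"] from one line of a
# services file; returns 1 on success, 0 for blank or comment-only lines.
function parse_service(line, svc,    n, fields, pp) {
    sub(/#.*/, "", line)               # strip the trailing comment, if any
    n = split(line, fields)            # split on runs of blanks and tabs
    if (n < 2 || split(fields[2], pp, "/") != 2)
        return 0
    svc["name"]     = fields[1]
    svc["port"]     = pp[1] + 0        # force a numeric value
    svc["protocol"] = pp[2]
    return 1
}

BEGIN {
    if (parse_service("echo            7/tcp", s))
        print s["name"], s["port"], s["protocol"]
}
```

Calling the function from a rule such as ‘parse_service($0, s) { ... }’ and running the program with ‘/etc/services’ as its input file would process every entry of the real file.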
---------- Footnotes ----------
(1) Microsoft preferred to ignore the TCP/IP family of protocols
until 1995. Then came the rise of the Netscape browser as a landmark
-"killer application." Microsoft added TCP/IP support and their own
+“killer application.” Microsoft added TCP/IP support and their own
browser to Microsoft Windows 95 at the last minute. They even
back-ported their TCP/IP implementation to Microsoft Windows for
Workgroups 3.11, but it was a rather rudimentary and half-hearted
-implementation. Nevertheless, the equivalent of '/etc/services' resides
-under 'C:\WINNT\system32\drivers\etc\services' on Microsoft Windows 2000
+implementation. Nevertheless, the equivalent of ‘/etc/services’ resides
+under ‘C:\WINNT\system32\drivers\etc\services’ on Microsoft Windows 2000
and Microsoft Windows XP. On Microsoft Windows 7, 8 and 10 there is a
-directory '%WinDir%\System32\Drivers\Etc' that holds the 'hosts' file
+directory ‘%WinDir%\System32\Drivers\Etc’ that holds the ‘hosts’ file
(https://support.microsoft.com/en-us/help/972034/how-to-reset-the-hosts-file-back-to-the-default)
-and probably also a 'services' file
+and probably also a ‘services’ file
(https://www.ibm.com/support/knowledgecenter/SSRNYG_7.2.1/com.ibm.rational.synergy.install.win.doc/topics/sg_r_igw_services_file.html).
@@ -852,8 +852,8 @@ File: gawkinet.info, Node: Interacting, Next: Setting Up, Prev: Troubleshooti
The next program begins really interacting with a network service by
printing something into the special file. It asks the so-called
-'finger' service if a user of the machine is logged in. When testing
-this program, try to change the variable 'finger_server' to some other
+‘finger’ service if a user of the machine is logged in. When testing
+this program, try to change the variable ‘finger_server’ to some other
machine name in your local network:
BEGIN {
@@ -869,15 +869,15 @@ machine name in your local network:
program repeatedly reads lines that come as a reply. When no more lines
are available (because the service has closed the connection), the
program also closes the connection. If you tried to replace
-'finger_server' with some other server name, the script probably
+‘finger_server’ with some other server name, the script probably
reported being unable to open the connection, because most servers today
no longer support this service. Try replacing the login name of
-Professor Nace ('wnace') with another login name (like 'help'). You
+Professor Nace (‘wnace’) with another login name (like ‘help’). You
will receive a list of login names similar to the one you asked for. In
the 1980s you could get a list of all users currently logged in by
-asking for an empty string ('""').
+asking for an empty string (‘""’).
- The final 'close()' call could be safely deleted from the above
+ The final ‘close()’ call could be safely deleted from the above
script, because the operating system closes any open connection by
default when a script reaches the end of execution. But, in order to
avoid portability problems, it is best to always close connections
@@ -885,9 +885,9 @@ explicitly. With the Linux kernel, for example, proper closing results
in flushing of buffers. Letting the close happen by default may result
in discarding buffers.
- When looking at '/etc/services' you may have noticed that the
-'daytime' service is also available with 'udp'. In the earlier
-examples, change 'tcp' to 'udp' and try if the 'finger' and 'daytime'
+ When looking at ‘/etc/services’ you may have noticed that the
+‘daytime’ service is also available with ‘udp’. In the earlier
+examples, change ‘tcp’ to ‘udp’ and try if the ‘finger’ and ‘daytime’
clients still work as expected. They probably will not respond because
a wise administrator switched off these services. But if they do, you
may see the expected day and time message. The program then hangs,
@@ -897,8 +897,8 @@ and UDP. When using UDP, neither party is automatically informed about
the other closing the connection. Continuing to experiment this way
reveals many other subtle differences between TCP and UDP. To avoid such
trouble, you should always remember the advice Douglas E. Comer and
-David Stevens give in Volume III of their series 'Internetworking With
-TCP' (page 14):
+David Stevens give in Volume III of their series ‘Internetworking With
+TCP’ (page 14):
When designing client-server applications, beginners are strongly
advised to use TCP because it provides reliable,
@@ -910,19 +910,19 @@ TCP' (page 14):
This advice is actually quite dated and we hesitated to repeat it
here. But we left it in because we are still observing beginners
running into this pitfall. While this advice has aged quite well, some
-other ideas from the 1980s have not. The 'finger' service may still be
+other ideas from the 1980s have not. The ‘finger’ service may still be
available in Microsoft Windows Server 2019
(https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/finger),
but it turned out to be a never-ending cause of trouble. First of all,
it is now obvious that a server should never reveal personal data about
its users to anonymous client software that connects over the wild wild
-Internet. So every server on the Internet should reject 'finger'
+Internet. So every server on the Internet should reject ‘finger’
requests (by disabling the port and by disabling the software serving
this port). But things got even worse in 2020 when it turned out that
-even the client software (the 'finger' command documented in the link
+even the client software (the ‘finger’ command documented in the link
above) is a security problem. A tool called DarkFinger
(https://seclists.org/fulldisclosure/2020/Sep/30) allows to leverage the
-Microsoft Windows 'finger.exe' as a file downloader and help evade
+Microsoft Windows ‘finger.exe’ as a file downloader and help evade
network security devices.
@@ -933,16 +933,16 @@ File: gawkinet.info, Node: Setting Up, Next: Email, Prev: Interacting, Up: U
The preceding programs behaved as clients that connect to a server
somewhere on the Internet and request a particular service. Now we set
-up such a service to mimic the behavior of the 'daytime' service. Such
+up such a service to mimic the behavior of the ‘daytime’ service. Such
a server does not know in advance who is going to connect to it over the
network. Therefore, we cannot insert a name for the host to connect to
in our special file name.
Start the following program in one window. Notice that the service
-does not have the name 'daytime', but the number '8888'. From looking
-at '/etc/services', you know that names like 'daytime' are just
+does not have the name ‘daytime’, but the number ‘8888’. From looking
+at ‘/etc/services’, you know that names like ‘daytime’ are just
mnemonics for predetermined 16-bit integers. Only the system
-administrator ('root') could enter our new service into '/etc/services'
+administrator (‘root’) could enter our new service into ‘/etc/services’
with an appropriate name. Also notice that the service name has to be
entered into a different field of the special file name because we are
setting up a server, not a client:
@@ -955,34 +955,34 @@ setting up a server, not a client:
Now open another window on the same machine. Copy the client program
given as the first example (*note Establishing a TCP Connection: TCP
Connecting.) to a new file and edit it, changing the variable
-'daytime_server' to 'localhost' and the port name 'daytime' to '8888'.
+‘daytime_server’ to ‘localhost’ and the port name ‘daytime’ to ‘8888’.
Then start the modified client. You should get a reply like this:
$ gawk -f awklib/eg/network/daytimeclient.awk
- -| Sun Dec 27 17:33:57 CET 2020
- -| Sun Dec 27 17:33:57 CET 2020
+ ⊣ Sun Dec 27 17:33:57 CET 2020
+ ⊣ Sun Dec 27 17:33:57 CET 2020
Both programs explicitly close the connection.
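
The difference between the two special file names used here is only a matter of which field carries the port. The following minimal sketch builds both names from the same port number; tcp_server_name() and tcp_client_name() are hypothetical helpers made up for this illustration, and port 8888 is simply the number used in the text above.

```awk
# Hypothetical helpers, not part of gawk.
# A server puts its port into the LOCALPORT field and leaves the rest 0.
function tcp_server_name(port) {
    return "/inet/tcp/" port "/0/0"
}

# A client leaves LOCALPORT as 0 and names the remote host and port.
function tcp_client_name(host, port) {
    return "/inet/tcp/0/" host "/" port
}

BEGIN {
    print tcp_server_name(8888)
    print tcp_client_name("localhost", 8888)
}
```

A server could then store the name in a variable and use it on both sides of the conversation, for example ‘srv = tcp_server_name(8888); print strftime() |& srv; close(srv)’, while the client reads from the name returned by tcp_client_name() with ‘|& getline’.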
Now we will intentionally make a mistake to see what happens when the
-name '8888' (the port) is already used by another service. Start the
+name ‘8888’ (the port) is already used by another service. Start the
server program in both windows. The first one works, but the second one
complains that it could not open the connection. Each port on a single
machine can only be used by one server program at a time. Now terminate
-the server program and change the name '8888' to 'echo'. After
+the server program and change the name ‘8888’ to ‘echo’. After
restarting it, the server program does not run any more, and you know
-why: there is already an 'echo' service running on your machine. But
-even if this isn't true, you would not get your own 'echo' server
+why: there is already an ‘echo’ service running on your machine. But
+even if this isn’t true, you would not get your own ‘echo’ server
running on a Unix machine, because the ports with numbers smaller than
-1024 ('echo' is at port 7) are reserved for 'root'. On machines running
+1024 (‘echo’ is at port 7) are reserved for ‘root’. On machines running
some flavor of Microsoft Windows, there is no restriction that reserves
-ports 1 to 1024 for a privileged user; hence, you can start an 'echo'
+ports 1 to 1024 for a privileged user; hence, you can start an ‘echo’
server there. Even in later version of Microsoft Windows, this
-restriction of the Unix world seems to have never been adopted 'Does
-windows(10/server-2016) have privileged ports?'
+restriction of the Unix world seems to have never been adopted ‘Does
+windows(10/server-2016) have privileged ports?’
(https://social.technet.microsoft.com/Forums/windowsserver/en-US/334f0770-eda9-475a-a27f-46b80ab7e872/does-windows10server2016-have-privileged-ports-?forum=ws2016).
In Microsoft Windows it is the level of the firewall that handles port
-access restrictions, not the level of the operating system's kernel.
+access restrictions, not the level of the operating system’s kernel.
Turning this short server program into something really useful is
simple. Imagine a server that first reads a file name from the client
@@ -1004,8 +1004,8 @@ contents of the named file across the net. The server-side processing
could also be the execution of a command that is transmitted across the
network. From this example, you can see how simple it is to open up a
security hole on your machine. If you allow clients to connect to your
-machine and execute arbitrary commands, anyone would be free to do 'rm
--rf *'.
+machine and execute arbitrary commands, anyone would be free to do ‘rm
+-rf *’.
The client side connects to port number 8888 on the server side and
sends the name of the desired file to be sent across the same TCP
@@ -1058,16 +1058,16 @@ the first email the server has in store:
}
- We redefine the record separators 'RS' and 'ORS' because the protocol
+ We redefine the record separators ‘RS’ and ‘ORS’ because the protocol
(POP) requires CR-LF to separate lines. After identifying yourself to
-the email service, the command 'retr 1' instructs the service to send
+the email service, the command ‘retr 1’ instructs the service to send
the first of all your email messages in line. If the service replies
-with something other than '+OK', the program exits; maybe there is no
+with something other than ‘+OK’, the program exits; maybe there is no
email. Otherwise, the program first announces that it intends to finish
-reading email, and then redefines 'RS' in order to read the entire email
+reading email, and then redefines ‘RS’ in order to read the entire email
as multiline input in one record. From the POP RFC, we know that the
body of the email always ends with a single line containing a single
-dot. The program looks for this using 'RS = "\r\n\\.\r\n"'. When it
+dot. The program looks for this using ‘RS = "\r\n\\.\r\n"’. When it
finds this sequence in the mail message, it quits. You can invoke this
program as often as you like; it does not delete the message it reads,
but instead leaves it on the server.
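
The effect of that record separator can be observed without a mail server. The following is a minimal sketch that uses the ‘cat’ command as a coprocess standing in for the POP connection (an assumption made only so that the example runs offline). It writes a small dot-terminated message, then reads it back as a single record. The function read_until_dot() is a hypothetical helper made up for this example.

```awk
# read_until_dot(): hypothetical helper, not part of gawk.
# Reads one record from coprocess CMD, where the record ends at a line
# holding a single dot, and returns it without that terminator.
function read_until_dot(cmd,    save_rs, rec) {
    save_rs = RS
    RS = "\r\n\\.\r\n"            # CR-LF, a literal dot, CR-LF
    if ((cmd |& getline rec) <= 0)
        rec = ""
    RS = save_rs                  # restore the caller's separator
    return rec
}

BEGIN {
    cmd = "cat"                   # stand-in for a POP connection
    printf "Subject: hi\r\n\r\nfirst line\r\nsecond line\r\n.\r\n" |& cmd
    close(cmd, "to")              # let cat see end of input
    body = read_until_dot(cmd)
    close(cmd)
    n = split(body, lines, "\r\n")
    print n " lines; last is: " lines[n]
}
```

Because the separator is a regular expression, the dot has to be escaped twice: once for the string constant and once so that the regexp matches a literal dot rather than any character.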
@@ -1078,14 +1078,14 @@ but instead leaves it on the server.
simple when email was young in the 20th century. These days,
unencrypted plaintext authentication is usually disallowed on non-secure
connections. Since encryption of network connections is not supported
+in ‘gawk’, you should not use ‘gawk’ to write such scripts. We left
+in âgawkâ, you should not use âgawkâ to write such scripts. We left
this node as it is because it demonstrates how application level
protocols work in principle (a command being issued by the client
followed by a reply coming back). Unfortunately, modern application
level protocols are much more flexible in the sequence of actions. For
example, modern POP3 servers may introduce themselves with an unprompted
initial line that arrives before the initial command. Dealing with such
-variance is not worth the effort in 'gawk'.
+variance is not worth the effort in ‘gawk’.
File: gawkinet.info, Node: Web page, Next: Primitive Service, Prev: Email, Up: Using Networking
@@ -1105,7 +1105,7 @@ retrieving a web page. It uses the prehistoric syntax of HTTP 0.9,
which almost all web servers still support. The most noticeable thing
about it is that the program directs the request to the local proxy
server whose name you insert in the special file name (which in turn
-calls 'www.yahoo.com'):
+calls ‘www.yahoo.com’):
BEGIN {
RS = ORS = "\r\n"
@@ -1116,14 +1116,14 @@ calls 'www.yahoo.com'):
close(HttpService)
}
- Again, lines are separated by a redefined 'RS' and 'ORS'. The 'GET'
+ Again, lines are separated by a redefined ‘RS’ and ‘ORS’. The ‘GET’
request that we send to the server is the only kind of HTTP request that
existed when the web was created in the early 1990s. HTTP calls this
-'GET' request a "method," which tells the service to transmit a web page
+‘GET’ request a “method,” which tells the service to transmit a web page
(here the home page of the Yahoo! search engine). Version 1.0 added
-the request methods 'HEAD' and 'POST'. The current version of HTTP is
-1.1,(1)(2) and knows the additional request methods 'OPTIONS', 'PUT',
-'DELETE', and 'TRACE'. You can fill in any valid web address, and the
+the request methods ‘HEAD’ and ‘POST’. The current version of HTTP is
+1.1,(1)(2) and knows the additional request methods ‘OPTIONS’, ‘PUT’,
+‘DELETE’, and ‘TRACE’. You can fill in any valid web address, and the
program prints the HTML code of that page to your screen.
Notice the similarity between the responses of the POP and HTTP
@@ -1132,7 +1132,7 @@ and then you get the body of the page in HTML. The lines of the headers
also have the same form as in POP. There is the name of a parameter,
then a colon, and finally the value of that parameter.
- Images ('.png' or '.gif' files) can also be retrieved this way, but
+ Images (‘.png’ or ‘.gif’ files) can also be retrieved this way, but
then you get binary data that should be redirected into a file. Another
application is calling a CGI (Common Gateway Interface) script on some
server. CGI scripts are used when the contents of a web page are not
@@ -1154,11 +1154,11 @@ obsolete by RFC 2616, an update without any substantial changes.
(2) Version 2.0 of HTTP (https://en.wikipedia.org/wiki/HTTP/2) was
defined in RFC7540 (https://tools.ietf.org/html/rfc7540) and was derived
-from Google's SPDY (https://en.wikipedia.org/wiki/SPDY) protocol. It is
+from Google’s SPDY (https://en.wikipedia.org/wiki/SPDY) protocol. It is
said to be widely supported. As of 2020 the most popular web sites
still identify themselves as supporting HTTP/1.1. Version 3.0 of HTTP
(https://en.wikipedia.org/wiki/HTTP/3) is still a draft and was derived
-from Google's QUIC (https://en.wikipedia.org/wiki/QUIC) protocol.
+from Google’s QUIC (https://en.wikipedia.org/wiki/QUIC) protocol.
File: gawkinet.info, Node: Primitive Service, Next: Interacting Service, Prev: Web page, Up: Using Networking
@@ -1167,12 +1167,12 @@ File: gawkinet.info, Node: Primitive Service, Next: Interacting Service, Prev
===========================
Now we know enough about HTTP to set up a primitive web service that
-just says '"Hello, world"' when someone connects to it with a browser.
+just says ‘"Hello, world"’ when someone connects to it with a browser.
Compared to the situation in the preceding node, our program changes the
role. It tries to behave just like the server we have observed. Since
we are setting up a server here, we have to insert the port number in
-the 'localport' field of the special file name. The other two fields
-(HOSTNAME and REMOTEPORT) have to contain a '0' because we do not know
+the ‘localport’ field of the special file name. The other two fields
+(HOSTNAME and REMOTEPORT) have to contain a ‘0’ because we do not know
in advance which host will connect to our service.
In the early 1990s, all a server had to do was send an HTML document
@@ -1191,7 +1191,7 @@ The steps are as follows:
bytes will be sent. The header is terminated as usual with an
empty line.
- 3. Send the '"Hello, world"' body in HTML. The useless 'while' loop
+ 3. Send the ‘"Hello, world"’ body in HTML. The useless ‘while’ loop
swallows the request of the browser. We could actually omit the
loop, and on most machines the program would still work. First,
start the following program:
@@ -1215,7 +1215,7 @@ The steps are as follows:
point to <http://localhost:8080> (the browser needs to know on which
port our server is listening for requests). If this does not work, the
browser probably tries to connect to a proxy server that does not know
-your machine. If so, change the browser's configuration so that the
+your machine. If so, change the browser’s configuration so that the
browser does not try to use a proxy to connect to your machine.
@@ -1233,13 +1233,13 @@ Applications and Techniques::.
* CGI Lib:: A simple CGI library.
Setting up a web service that allows user interaction is more
-difficult and shows us the limits of network access in 'gawk'. In this
-node, we develop a main program (a 'BEGIN' pattern and its action) that
+difficult and shows us the limits of network access in ‘gawk’. In this
+node, we develop a main program (a ‘BEGIN’ pattern and its action) that
will become the core of event-driven execution controlled by a graphical
user interface (GUI). Each HTTP event that the user triggers by some
action within the browser is received in this central procedure.
Parameters and menu choices are extracted from this request, and an
-appropriate measure is taken according to the user's choice:
+appropriate measure is taken according to the user’s choice:
BEGIN {
if (MyHost == "") {
@@ -1289,7 +1289,7 @@ appropriate measure is taken according to the user's choice:
This web server presents menu choices in the form of HTML links.
Therefore, it has to tell the browser the name of the host it is
residing on. When starting the server, the user may supply the name of
-the host from the command line with 'gawk -v MyHost="Rumpelstilzchen"'.
+the host from the command line with ‘gawk -v MyHost="Rumpelstilzchen"’.
If the user does not do this, the server looks up the name of the host
it is running on for later use as a web address in HTML documents. The
same applies to the port number. These values are inserted later into
@@ -1297,7 +1297,7 @@ the HTML content of the web pages to refer to the home system.
Each server that is built around this core has to initialize some
application-dependent variables (such as the default home page) in a
-function 'SetUpServer()', which is called immediately before entering
+function ‘SetUpServer()’, which is called immediately before entering
the infinite loop of the server. For now, we will write an instance
that initiates a trivial interaction. With this home page, the client
user can click on two possible choices, and receive the current date
@@ -1316,13 +1316,13 @@ either in human-readable format or in seconds since 1970:
On the first run through the main loop, the default line terminators
are set and the default home page is copied to the actual home page.
-Since this is the first run, 'GETARG["Method"]' is not initialized yet,
+Since this is the first run, ‘GETARG["Method"]’ is not initialized yet,
hence the case selection over the method does nothing. Now that the
home page is initialized, the server can start communicating to a client
browser.
It does so by printing the HTTP header into the network connection
-('print ... |& HttpService'). This command blocks execution of the
+(‘print ... |& HttpService’). This command blocks execution of the
server script until a client connects.
If you compare this server script with the primitive one we wrote
@@ -1336,15 +1336,15 @@ always displaying the same time of day although time advances each
second.
Having supplied the initial home page to the browser with a valid
-document stored in the parameter 'Prompt', it closes the connection and
+document stored in the parameter ‘Prompt’, it closes the connection and
waits for the next request. When the request comes, a log line is
printed that allows us to see which request the server receives. The
-final step in the loop is to call the function 'CGI_setup()', which
+final step in the loop is to call the function ‘CGI_setup()’, which
reads all the lines of the request (coming from the browser), processes
-them, and stores the transmitted parameters in the array 'PARAM'. The
+them, and stores the transmitted parameters in the array ‘PARAM’. The
complete text of these application-independent functions can be found in
*note A Simple CGI Library: CGI Lib. For now, we use a simplified
-version of 'CGI_setup()':
+version of ‘CGI_setup()’:
function CGI_setup( method, uri, version, i) {
delete GETARG; delete MENU; delete PARAM
@@ -1370,26 +1370,26 @@ version of 'CGI_setup()':
of request parameters. The rest of the function serves the purpose of
filling the global parameters with the extracted new values. To
accomplish this, the name of the requested resource is split into parts
-and stored for later evaluation. If the request contains a '?', then
+and stored for later evaluation. If the request contains a ‘?’, then
the request has CGI variables seamlessly appended to the web address.
-Everything in front of the '?' is split up into menu items, and
-everything behind the '?' is a list of 'VARIABLE=VALUE' pairs (separated
-by '&') that also need splitting. This way, CGI variables are isolated
+Everything in front of the ‘?’ is split up into menu items, and
+everything behind the ‘?’ is a list of ‘VARIABLE=VALUE’ pairs (separated
+by ‘&’) that also need splitting. This way, CGI variables are isolated
and stored. This procedure lacks recognition of special characters that
are transmitted in coded form(1). Here, any optional request header and
body parts are ignored. We do not need header parameters and the
request body. However, when refining our approach or working with the
-'POST' and 'PUT' methods, reading the header and body becomes
+‘POST’ and ‘PUT’ methods, reading the header and body becomes
inevitable. Header parameters should then be stored in a global array
as well as the body.
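
A minimal sketch of the splitting described above, written as a shell function around POSIX awk so it can be tried without a network connection. The name `split_uri`, the output format, and the sample URI are assumptions made for illustration. This is not the manual's `CGI_setup()`, and like the simplified version it does not decode special characters.

```shell
# split_uri: hypothetical helper that splits a request URI at "?" into
# menu items (separated by "/") and VARIABLE=VALUE pairs (separated by "&").
split_uri() {
    awk -v uri="$1" '
    BEGIN {
        n = 0
        q = index(uri, "?")
        if (q > 0) {
            # Everything behind the "?" is a list of pairs
            n = split(substr(uri, q + 1), PARAM, "&")
            uri = substr(uri, 1, q - 1)
        }
        # Everything in front of the "?" is split into menu items
        m = split(uri, MENU, "/")
        for (i = 1; i <= m; i++)
            printf "MENU[%d]=%s\n", i, MENU[i]
        for (i = 1; i <= n; i++) {
            eq = index(PARAM[i], "=")
            if (eq > 0) {
                name = substr(PARAM[i], 1, eq - 1)
                GETARG[name] = substr(PARAM[i], eq + 1)
                printf "GETARG[%s]=%s\n", name, GETARG[name]
            }
        }
    }'
}

split_uri "/menu/date?fmt=human&tz=utc"
```

Because the sample URI starts with a slash, the first menu item is empty, which matches the empty `MENU` entries visible in the manual's own test output.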
On each subsequent run through the main loop, one request from a
-browser is received, evaluated, and answered according to the user's
+browser is received, evaluated, and answered according to the user’s
choice. This can be done by letting the value of the HTTP method guide
-the main loop into execution of the procedure 'HandleGET()', which
-evaluates the user's choice. In this case, we have only one
+the main loop into execution of the procedure ‘HandleGET()’, which
+evaluates the user’s choice. In this case, we have only one
hierarchical level of menus, but in the general case, menus are nested.
-The menu choices at each level are separated by '/', just as in file
+The menu choices at each level are separated by ‘/’, just as in file
names. Notice how simple it is to construct menus of arbitrary depth:
function HandleGET() {
@@ -1402,18 +1402,18 @@ names. Notice how simple it is to construct menus of arbitrary depth:
The disadvantage of this approach is that our server is slow and can
handle only one request at a time. Its main advantage, however, is that
-the server consists of just one 'gawk' program. No need for installing
-an 'httpd', and no need for static separate HTML files, CGI scripts, or
-'root' privileges. This is rapid prototyping. This program can be
+the server consists of just one ‘gawk’ program. No need for installing
+an ‘httpd’, and no need for static separate HTML files, CGI scripts, or
+‘root’ privileges. This is rapid prototyping. This program can be
started on the same host that runs your browser. Then let your browser
point to <http://localhost:8080>.
It is also possible to include images into the HTML pages. Most
-browsers support the not very well-known '.xbm' format, which may
+browsers support the not very well-known ‘.xbm’ format, which may
contain only monochrome pictures but is an ASCII format. Binary images
are possible but not so easy to handle. Another way of including images
is to generate them with a tool such as GNUPlot, by calling the tool
-with the 'system()' function or through a pipe.
+with the ‘system()’ function or through a pipe.
---------- Footnotes ----------
@@ -1426,27 +1426,27 @@ File: gawkinet.info, Node: CGI Lib, Prev: Interacting Service, Up: Interactin
--------------------------
HTTP is like being married: you have to be able to handle whatever
- you're given, while being very careful what you send back.
- -- _Phil Smith III,
+ you’re given, while being very careful what you send back.
+ — _Phil Smith III,
<http://www.netfunny.com/rhf/jokes/99/Mar/http.html>_
In *note A Web Service with Interaction: Interacting Service, we saw
-the function 'CGI_setup()' as part of the web server "core logic"
+the function ‘CGI_setup()’ as part of the web server “core logic”
framework. The code presented there handles almost everything necessary
-for CGI requests. One thing it doesn't do is handle encoded characters
-in the requests. For example, an '&' is encoded as a percent sign
-followed by the hexadecimal value: '%26'. These encoded values should
+for CGI requests. One thing it doesn’t do is handle encoded characters
+in the requests. For example, an ‘&’ is encoded as a percent sign
+followed by the hexadecimal value: ‘%26’. These encoded values should
be decoded. Following is a simple library to perform these tasks. This
code is used for all web server examples throughout the rest of this
Info file. If you want to use it for your own web server, store the
-source code into a file named 'inetlib.awk'. Then you can include these
+source code into a file named ‘inetlib.awk’. Then you can include these
functions into your code by placing the following statement into your
program (on the first line of your script):
@include inetlib.awk
But beware, this mechanism is only possible if you invoke your web
-server script with 'igawk' instead of the usual 'awk' or 'gawk'. Here
+server script with ‘igawk’ instead of the usual ‘awk’ or ‘gawk’. Here
is the code:
# CGI Library and core of a web server
@@ -1531,10 +1531,10 @@ is the code:
MENU[i] = _CGI_decode(MENU[i])
}
- This isolates details in a single function, 'CGI_setup()'. Decoding
+ This isolates details in a single function, ‘CGI_setup()’. Decoding
of encoded characters is pushed off to a helper function,
-'_CGI_decode()'. The use of the leading underscore ('_') in the
-function name is intended to indicate that it is an "internal" function,
+‘_CGI_decode()’. The use of the leading underscore (‘_’) in the
+function name is intended to indicate that it is an “internal” function,
although there is nothing to enforce this:
function _CGI_decode(str, hexdigs, i, pre, code1, code2,
@@ -1567,10 +1567,10 @@ although there is nothing to enforce this:
This works by splitting the string apart around an encoded character.
The two digits are converted to lowercase characters and looked up in a
-string of hex digits. Note that '0' is not in the string on purpose;
-'index()' returns zero when it's not found, automatically giving the
+string of hex digits. Note that ‘0’ is not in the string on purpose;
+‘index()’ returns zero when it’s not found, automatically giving the
correct value! Once the hexadecimal value is converted from characters
-in a string into a numerical value, 'sprintf()' converts the value back
+in a string into a numerical value, ‘sprintf()’ converts the value back
into a real character. The following is a simple test harness for the
above functions:
@@ -1590,21 +1590,21 @@ above functions:
And this is the result when we run it:
$ gawk -f testserv.awk
- -| MENU["4"] = www.gnu.org
- -| MENU["5"] = cgi-bin
- -| MENU["6"] = foo
- -| MENU["1"] = http
- -| MENU["2"] =
- -| MENU["3"] =
- -| PARAM["1"] = p1=stuff
- -| PARAM["2"] = p2=stuff&junk
- -| PARAM["3"] = percent=a % sign
- -| GETARG["p1"] = stuff
- -| GETARG["percent"] = a % sign
- -| GETARG["p2"] = stuff&junk
- -| GETARG["Method"] = GET
- -| GETARG["Version"] = 1.0
- -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
+ ⣠MENU["4"] = www.gnu.org
+ ⣠MENU["5"] = cgi-bin
+ ⣠MENU["6"] = foo
+ ⣠MENU["1"] = http
+ ⣠MENU["2"] =
+ ⣠MENU["3"] =
+ ⣠PARAM["1"] = p1=stuff
+ ⣠PARAM["2"] = p2=stuff&junk
+ ⣠PARAM["3"] = percent=a % sign
+ ⣠GETARG["p1"] = stuff
+ ⣠GETARG["percent"] = a % sign
+ ⣠GETARG["p2"] = stuff&junk
+ ⣠GETARG["Method"] = GET
+ ⣠GETARG["Version"] = 1.0
+ ⣠GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
p2=stuff%26junk&percent=a %25 sign
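
The decoding step that turns `stuff%26junk` into `stuff&junk` in the output above can be sketched as follows. This is an illustrative reconstruction of the technique the text describes (a hex-digit string without `0`, looked up with `index()`), not the manual's `_CGI_decode()`. The function name `cgi_decode` and the shell wrapper are assumptions, and the sketch does not validate that the two characters after `%` are hex digits.

```shell
# cgi_decode: hypothetical helper that prints its argument with %XX escapes decoded.
cgi_decode() {
    awk -v str="$1" '
    BEGIN {
        # "0" is left out on purpose: index() returns 0 when a digit is not found
        hexdigs = "123456789abcdef"
        out = ""
        while ((i = index(str, "%")) > 0 && length(str) >= i + 2) {
            c1 = index(hexdigs, tolower(substr(str, i + 1, 1)))
            c2 = index(hexdigs, tolower(substr(str, i + 2, 1)))
            # Keep the text before the escape, then append the decoded character
            out = out substr(str, 1, i - 1) sprintf("%c", c1 * 16 + c2)
            # Continue after the escape so a decoded "%" is never decoded twice
            str = substr(str, i + 3)
        }
        print out str
    }'
}

cgi_decode 'p2=stuff%26junk'      # prints p2=stuff&junk
cgi_decode 'percent=a %25 sign'   # prints percent=a % sign
```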
@@ -1615,7 +1615,7 @@ File: gawkinet.info, Node: Simple Server, Next: Caveats, Prev: Interacting Se
In the preceding node, we built the core logic for event-driven GUIs.
In this node, we finally extend the core to a real application. No one
-would actually write a commercial web server in 'gawk', but it is
+would actually write a commercial web server in ‘gawk’, but it is
instructive to see that it is feasible in principle.
The application is ELIZA, the famous program by Joseph Weizenbaum
@@ -1646,19 +1646,19 @@ and append the following code:
TopFooter = "</BODY></HTML>"
}
- 'SetUpServer()' is similar to the previous example, except for
-calling another function, 'SetUpEliza()'. This approach can be used to
+ ‘SetUpServer()’ is similar to the previous example, except for
+calling another function, ‘SetUpEliza()’. This approach can be used to
implement other kinds of servers. The only changes needed to do so are
-hidden in the functions 'SetUpServer()' and 'HandleGET()'. Perhaps it
-might be necessary to implement other HTTP methods. The 'igawk' program
-that comes with 'gawk' may be useful for this process.
+hidden in the functions ‘SetUpServer()’ and ‘HandleGET()’. Perhaps it
+might be necessary to implement other HTTP methods. The ‘igawk’ program
+that comes with ‘gawk’ may be useful for this process.
When extending this example to a complete application, the first
-thing to do is to implement the function 'SetUpServer()' to initialize
+thing to do is to implement the function ‘SetUpServer()’ to initialize
the HTML pages and some variables. These initializations determine the
way your HTML pages look (colors, titles, menu items, etc.).
- The function 'HandleGET()' is a nested case selection that decides
+ The function ‘HandleGET()’ is a nested case selection that decides
which page the user wants to see next. Each nesting level refers to a
menu level of the GUI. Each case implements a certain action of the
menu. At the deepest level of case selection, the handler essentially
@@ -1699,7 +1699,7 @@ Initially the user does not say anything; then ELIZA resets its money
counter and asks the user to tell what comes to mind open-heartedly.
The subsequent answers are converted to uppercase characters and stored
for later comparison. ELIZA presents the bill when being confronted
-with a sentence that contains the phrase "shut up." Otherwise, it looks
+with a sentence that contains the phrase “shut up.” Otherwise, it looks
for keywords in the sentence, conjugates the rest of the sentence,
remembers the keyword for later use, and finally selects an answer from
the set of possible answers:
@@ -1747,9 +1747,9 @@ the set of possible answers:
return answer
}
- In the long but simple function 'SetUpEliza()', you can see tables
-for conjugation, keywords, and answers.(1) The associative array 'k'
-contains indices into the array of answers 'r'. To choose an answer,
+ In the long but simple function ‘SetUpEliza()’, you can see tables
+for conjugation, keywords, and answers.(1) The associative array ‘k’
+contains indices into the array of answers ‘r’. To choose an answer,
ELIZA just picks an index randomly:
function SetUpEliza() {
@@ -1786,8 +1786,8 @@ ELIZA just picks an index randomly:
}
Some interesting remarks and details (including the original source
-code of ELIZA) are found on Mark Humphrys's home page 'How my program
-passed the Turing Test' (https://computing.dcu.ie/~humphrys/eliza.html).
+code of ELIZA) are found on Mark Humphrys’s home page ‘How my program
+passed the Turing Test’ (https://computing.dcu.ie/~humphrys/eliza.html).
Wikipedia provides much background information about ELIZA
(https://en.wikipedia.org/wiki/ELIZA), including the original design of
the software and its early implementations.
@@ -1795,7 +1795,7 @@ the software and its early implementations.
---------- Footnotes ----------
(1) The version shown here is abbreviated. The full version comes
-with the 'gawk' distribution.
+with the ‘gawk’ distribution.
File: gawkinet.info, Node: Caveats, Next: Challenges, Prev: Simple Server,
Up: Using Networking
@@ -1809,19 +1809,19 @@ The behavior of a networked application sometimes looks noncausal
because it is not reproducible in a strong sense. Whether a network
application works or not sometimes depends on the following:
- * How crowded the underlying network is
+ ⢠How crowded the underlying network is
- * If the party at the other end is running or not
+ ⢠If the party at the other end is running or not
- * The state of the party at the other end
+ ⢠The state of the party at the other end
The most difficult problems for a beginner arise from the hidden
-states of the underlying network. After closing a TCP connection, it's
+states of the underlying network. After closing a TCP connection, it’s
often necessary to wait a short while before reopening the connection.
Even more difficult is the establishment of a connection that previously
-ended with a "broken pipe." Those connections have to "time out" for a
+ended with a “broken pipe.” Those connections have to “time out” for a
minute or so before they can reopen. Check this with the command
-'netstat -a', which provides a list of still-active connections.
+‘netstat -a’, which provides a list of still-active connections.
File: gawkinet.info, Node: Challenges, Prev: Caveats, Up: Using Networking
@@ -1835,7 +1835,7 @@ Loebner Prize is the first formal instantiation of a Turing Test. Hugh
Loebner agreed with The Cambridge Center for Behavioral Studies to
underwrite a contest designed to implement the Turing Test. Dr. Loebner
pledged a Grand Prize of $100,000 for the first computer whose responses
-were indistinguishable from a human's. Each year an annual prize of
+were indistinguishable from a human’s. Each year an annual prize of
$2000 and a bronze medal is awarded to the _most_ human computer. The
winner of the annual contest is the best entry relative to other entries
that year, irrespective of how good it is in an absolute sense. Here is
@@ -1887,20 +1887,20 @@ behave so much like a human being that it can win this prize. It is
quite common to let these programs talk to each other via network
connections. But during the competition itself, the program and its
computer have to be present at the place the competition is held. We
-all would love to see a 'gawk' program win in such an event. Maybe it
+all would love to see a ‘gawk’ program win in such an event. Maybe it
is up to you to accomplish this?
Some other ideas for useful networked applications:
- * Read the file 'doc/awkforai.txt' in earlier 'gawk'
+ ⢠Read the file âdoc/awkforai.txtâ in earlier âgawkâ
distributions.(1) It was written by Ronald P. Loui (at the time,
Associate Professor of Computer Science, at Washington University
in St. Louis, <loui@ai.wustl.edu>) and summarizes why he taught
- 'gawk' to students of Artificial Intelligence. Here are some
+ ‘gawk’ to students of Artificial Intelligence. Here are some
passages from the text:
The GAWK manual can be consumed in a single lab session and
the language can be mastered by the next morning by the
- average student. GAWK's automatic initialization, implicit
+ average student. GAWK’s automatic initialization, implicit
coercion, I/O support and lack of pointers forgive many of the
mistakes that young programmers are likely to make. Those who
have seen C but not mastered it are happy to see that GAWK
@@ -1910,17 +1910,17 @@ is up to you to accomplish this?
There are further simple answers. Probably the best is the
fact that increasingly, undergraduate AI programming is
involving the Web. Oren Etzioni (University of Washington,
- Seattle) has for a while been arguing that the "softbot" is
- replacing the mechanical engineers' robot as the most
+ Seattle) has for a while been arguing that the “softbot” is
+ replacing the mechanical engineers’ robot as the most
glamorous AI testbed. If the artifact whose behavior needs to
be controlled in an intelligent way is the software agent,
then a language that is well-suited to controlling the
software environment is the appropriate language. That would
imply a scripting language. If the robot is KAREL, then the
- right language is "turn left; turn right." If the robot is
+ right language is “turn left; turn right.” If the robot is
Netscape, then the right language is something that can
- generate 'netscape -remote
- 'openURL(http://cs.wustl.edu/~loui)'' with elan.
+ generate ‘netscape -remote
+ 'openURL(http://cs.wustl.edu/~loui)'’ with elan.
...
AI programming requires high-level thinking. There have
always been a few gifted programmers who can write high-level
@@ -1934,17 +1934,17 @@ is up to you to accomplish this?
strings. A language that provides the best support for string
processing in the end provides the best support for logic, for
the exploration of various logics, and for most forms of
- symbolic processing that AI might choose to call "reasoning"
- instead of "logic." The implication is that PROLOG, which
+ symbolic processing that AI might choose to call “reasoning”
+ instead of “logic.” The implication is that PROLOG, which
saves the AI programmer from having to write a unifier, saves
perhaps two dozen lines of GAWK code at the expense of
strongly biasing the logic and representational expressiveness
of any approach.
- Now that 'gawk' itself can connect to the Internet, it should be
+ Now that ‘gawk’ itself can connect to the Internet, it should be
obvious that it is suitable for writing intelligent web agents.
- * 'awk' is strong at pattern recognition and string processing. So,
+ • ‘awk’ is strong at pattern recognition and string processing. So,
it is well suited to the classic problem of language translation.
A first try could be a program that knows the 100 most frequent
English words and their counterparts in German or French. The
@@ -1955,9 +1955,9 @@ is up to you to accomplish this?
in return. As soon as this works, more effort can be spent on a
real translation program.
- * Another dialogue-oriented application (on the verge of ridicule) is
- the email "support service." Troubled customers write an email to
- an automatic 'gawk' service that reads the email. It looks for
+ ⢠Another dialogue-oriented application (on the verge of ridicule) is
+ the email âsupport service.â Troubled customers write an email to
+ an automatic âgawkâ service that reads the email. It looks for
keywords in the mail and assembles a reply email accordingly. By
carefully investigating the email header, and repeating these
keywords through the reply email, it is rather simple to give the
@@ -1968,7 +1968,7 @@ is up to you to accomplish this?
---------- Footnotes ----------
- (1) The file is no longer distributed with 'gawk', since the
+ (1) The file is no longer distributed with ‘gawk’, since the
copyright on the file is not clear.
@@ -1981,24 +1981,24 @@ In this major node, we look at a number of self-contained scripts, with
an emphasis on concise networking. Along the way, we work towards
creating building blocks that encapsulate often-needed functions of the
networking world, show new techniques that broaden the scope of problems
-that can be solved with 'gawk', and explore leading edge technology that
+that can be solved with ‘gawk’, and explore leading edge technology that
may shape the future of networking.
We often refer to the site-independent core of the server that we
built in *note A Simple Web Server: Simple Server. When building new
and nontrivial servers, we always copy this building block and append
-new instances of the two functions 'SetUpServer()' and 'HandleGET()'.
+new instances of the two functions ‘SetUpServer()’ and ‘HandleGET()’.
This makes a lot of sense, since this scheme of event-driven
-execution provides 'gawk' with an interface to the most widely accepted
-standard for GUIs: the web browser. Now, 'gawk' can rival even Tcl/Tk.
+execution provides ‘gawk’ with an interface to the most widely accepted
+standard for GUIs: the web browser. Now, ‘gawk’ can rival even Tcl/Tk.
- Tcl and 'gawk' have much in common. Both are simple scripting
+ Tcl and ‘gawk’ have much in common. Both are simple scripting
languages that allow us to quickly solve problems with short programs.
-But Tcl has Tk on top of it, and 'gawk' had nothing comparable up to
+But Tcl has Tk on top of it, and ‘gawk’ had nothing comparable up to
now. While Tcl needs a large and ever-changing library (Tk, which was
-originally bound to the X Window System), 'gawk' needs just the
-networking interface and some kind of browser on the client's side.
+originally bound to the X Window System), ‘gawk’ needs just the
+networking interface and some kind of browser on the client’s side.
Besides better portability, the most important advantage of this
approach (embracing well-established standards such HTTP and HTML) is
that _we do not need to change the language_. We let others do the work
@@ -2024,20 +2024,20 @@ File: gawkinet.info, Node: PANIC, Next: GETURL, Prev: Some Applications and T
3.1 PANIC: An Emergency Web Server
==================================
-At first glance, the '"Hello, world"' example in *note A Primitive Web
+At first glance, the ‘"Hello, world"’ example in *note A Primitive Web
Service: Primitive Service, seems useless. By adding just a few lines,
we can turn it into something useful.
The PANIC program tells everyone who connects that the local site is
not working. When a web server breaks down, it makes a difference if
-customers get a strange "network unreachable" message, or a short
+customers get a strange “network unreachable” message, or a short
message telling them that the server has a problem. In such an
emergency, the hard disk and everything on it (including the regular web
service) may be unavailable. Rebooting the web server off a USB drive
makes sense in this setting.
To use the PANIC program as an emergency web server, all you need are
-the 'gawk' executable and the program below on a USB drive. By default,
+the ‘gawk’ executable and the program below on a USB drive. By default,
it connects to port 8080. A different value may be supplied on the
command line:
@@ -2070,7 +2070,7 @@ GETURL is a versatile building block for shell scripts that need to
retrieve files from the Internet. It takes a web address as a
command-line parameter and tries to retrieve the contents of this
address. The contents are printed to standard output, while the header
-is printed to '/dev/stderr'. A surrounding shell script could analyze
+is printed to ‘/dev/stderr’. A surrounding shell script could analyze
the contents and extract the text or the links. An ASCII browser could
be written around GETURL. But more interestingly, web robots are
straightforward to write on top of GETURL. On the Internet, you can find
@@ -2080,10 +2080,10 @@ usually much more complex internally and at least 10 times as big.
At first, GETURL checks if it was called with exactly one web
address. Then, it checks if the user chose to use a special proxy
server whose name is handed over in a variable. By default, it is
-assumed that the local machine serves as proxy. GETURL uses the 'GET'
+assumed that the local machine serves as proxy. GETURL uses the ‘GET’
method by default to access the web page. By handing over the name of a
-different method (such as 'HEAD'), it is possible to choose a different
-behavior. With the 'HEAD' method, the user does not receive the body of
+different method (such as ‘HEAD’), it is possible to choose a different
+behavior. With the ‘HEAD’ method, the user does not receive the body of
the page content, but does receive the header:
BEGIN {
@@ -2114,7 +2114,7 @@ the page content, but does receive the header:
This program can be changed as needed, but be careful with the last
lines. Make sure transmission of binary data is not corrupted by
additional line breaks. Even as it is now, the byte sequence
-'"\r\n\r\n"' would disappear if it were contained in binary data. Don't
+‘"\r\n\r\n"’ would disappear if it were contained in binary data. Don’t
get caught in a trap when trying a quick fix on this one.
@@ -2131,27 +2131,27 @@ GNU/Linux in embedded PCs. These systems are small and usually do not
have a keyboard or a display. Therefore it is difficult to set up their
configuration. There are several widespread ways to set them up:
- * DIP switches
+ ⢠DIP switches
- * Read Only Memories such as EPROMs
+ ⢠Read Only Memories such as EPROMs
- * Serial lines or some kind of keyboard
+ ⢠Serial lines or some kind of keyboard
- * Network connections via 'telnet' or SNMP
+ ⢠Network connections via âtelnetâ or SNMP
- * HTTP connections with HTML GUIs
+ ⢠HTTP connections with HTML GUIs
In this node, we look at a solution that uses HTTP connections to
control variables of an embedded system that are stored in a file.
Since embedded systems have tight limits on resources like memory, it is
difficult to employ advanced techniques such as SNMP and HTTP servers.
-'gawk' fits in quite nicely with its single executable which needs just
+‘gawk’ fits in quite nicely with its single executable which needs just
a short script to start working. The following program stores the
variables in a file, and a concurrent process in the embedded system may
read the file. The program uses the site-independent part of the simple
web server that we developed in *note A Web Service with Interaction:
Interacting Service. As mentioned there, all we have to do is to write
-two new procedures 'SetUpServer()' and 'HandleGET()':
+two new procedures ‘SetUpServer()’ and ‘HandleGET()’:
function SetUpServer() {
TopHeader = "<HTML><title>Remote Configuration</title>"
@@ -2168,18 +2168,18 @@ two new procedures 'SetUpServer()' and 'HandleGET()':
if (ConfigFile == "") ConfigFile = "config.asc"
}
- The function 'SetUpServer()' initializes the top level HTML texts as
+ The function ‘SetUpServer()’ initializes the top level HTML texts as
usual. It also initializes the name of the file that contains the
configuration parameters and their values. In case the user supplies a
name from the command line, that name is used. The file is expected to
contain one parameter per line, with the name of the parameter in column
one and the value in column two.
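
As a rough illustration of the file layout just described (parameter name in column one, value in column two), the following sketch reads such a file with POSIX awk. The helper name `show_config`, the `name=value` output format, and the sample parameters are assumptions for illustration only. The manual's own server reads the file inside `HandleGET()` instead.

```shell
# show_config: hypothetical helper that lists the parameters of a
# configuration file (or standard input) in their original order.
show_config() {
    awk '
    NF >= 2 {
        # Remember the order of first appearance; a later line overrides the value
        if (!($1 in config))
            order[++n] = $1
        config[$1] = $2
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i] "=" config[order[i]]
    }' "$@"
}

# Sample input with made-up parameter names
printf 'ip 10.0.0.1\nmask 255.0.0.0\n' | show_config
```

Called with a file name, for example `show_config config.asc`, the same function reads the default configuration file named in `SetUpServer()`.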
- The function 'HandleGET()' reflects the structure of the menu tree as
+ The function ‘HandleGET()’ reflects the structure of the menu tree as
usual. The first menu choice tells the user what this is all about.
The second choice reads the configuration file line by line and stores
the parameters and their values. Notice that the record separator for
-this file is '"\n"', in contrast to the record separator for HTTP. The
+this file is ‘"\n"’, in contrast to the record separator for HTTP. The
third menu choice builds an HTML table to show the contents of the
configuration file just read. The fourth choice does the real work of
changing parameters, and the last one just saves the configuration into
@@ -2244,15 +2244,15 @@ bookmark file with pointers to interesting web sites. It is impossible
to regularly check by hand if any of these sites have changed. A
program is needed to automatically look at the headers of web pages and
tell which ones have changed. URLCHK does the comparison after using
-GETURL with the 'HEAD' method to retrieve the header.
+GETURL with the ‘HEAD’ method to retrieve the header.
Like GETURL, this program first checks that it is called with exactly
one command-line parameter. URLCHK also takes the same command-line
-variables 'Proxy' and 'ProxyPort' as GETURL, because these variables are
+variables ‘Proxy’ and ‘ProxyPort’ as GETURL, because these variables are
handed over to GETURL for each URL that gets checked. The one and only
parameter is the name of a file that contains one line for each URL. In
the first column, we find the URL, and the second and third columns hold
-the length of the URL's body when checked for the two last times. Now,
+the length of the URL’s body when checked for the two last times. Now,
we follow this plan:
1. Read the URLs from the file and remember their most recent lengths
@@ -2301,11 +2301,11 @@ those lines that differ in their second and third columns:
Another thing that may look strange is the way GETURL is called.
Before calling GETURL, we have to check if the proxy variables need to
be passed on. If so, we prepare strings that will become part of the
-command line later. In 'GetHeader', we store these strings together
+command line later. In ‘GetHeader’, we store these strings together
with the longest part of the command line. Later, in the loop over the
-URLs, 'GetHeader' is appended with the URL and a redirection operator to
-form the command that reads the URL's header over the Internet. GETURL
-always sends the headers to '/dev/stderr'. That is the reason why we
+URLs, ‘GetHeader’ is appended with the URL and a redirection operator to
+form the command that reads the URL’s header over the Internet. GETURL
+always sends the headers to ‘/dev/stderr’. That is the reason why we
need the redirection operator to have the header piped in.
This program is not perfect because it assumes that changing URLs
@@ -2335,20 +2335,20 @@ the Bourne shell:
Notice that the regular expression for URLs is rather crude. A
precise regular expression is much more complex. But this one works
rather well. One problem is that it is unable to find internal links of
-an HTML document. Another problem is that 'ftp', 'telnet', 'news',
-'mailto', and other kinds of links are missing in the regular
+an HTML document. Another problem is that ‘ftp’, ‘telnet’, ‘news’,
+‘mailto’, and other kinds of links are missing in the regular
expression. However, it is straightforward to add them, if doing so is
necessary for other tasks.
This program reads an HTML file and prints all the HTTP links that it
-finds. It relies on 'gawk''s ability to use regular expressions as the
-record separator. With 'RS' set to a regular expression that matches
+finds. It relies on ‘gawk’’s ability to use regular expressions as the
+record separator. With ‘RS’ set to a regular expression that matches
links, the second action is executed each time a non-empty link is
-found. We can find the matching link itself in 'RT'.
+found. We can find the matching link itself in ‘RT’.
- The action could use the 'system()' function to let another GETURL
+ The action could use the ‘system()’ function to let another GETURL
retrieve the page, but here we use a different approach. This simple
-program prints shell commands that can be piped into 'sh' for execution.
+program prints shell commands that can be piped into ‘sh’ for execution.
This way it is possible to first extract the links, wrap shell commands
around them, and pipe all the shell commands into a file. After editing
the file, execution of the file retrieves only those files that we
@@ -2358,10 +2358,10 @@ pages like this:
gawk -f geturl.awk http://www.suse.de | gawk -f webgrab.awk | sh
After this, you will find the contents of all referenced documents in
-files named 'doc*.html' even if they do not contain HTML code. The most
+files named ‘doc*.html’ even if they do not contain HTML code. The most
annoying thing is that we always have to pass the proxy to GETURL. If
you do not like to see the headers of the web pages appear on the
-screen, you can redirect them to '/dev/null'. Watching the headers
+screen, you can redirect them to ‘/dev/null’. Watching the headers
appear can be quite interesting, because it reveals interesting details
such as which web server the companies use. Now, it is clear how the
clever marketing people use web robots to determine the market shares of
@@ -2371,11 +2371,11 @@ Microsoft and Netscape in the web server market.
firewall. After attaching a browser to port 80, we usually catch a
glimpse of the bright side of the server (its home page). With a tool
like GETURL at hand, we are able to discover some of the more concealed
-or even "indecent" services (i.e., lacking conformity to standards of
+or even “indecent” services (i.e., lacking conformity to standards of
quality). It can be exciting to see the fancy CGI scripts that lie
there, revealing the inner workings of the server, ready to be called:
- * With a command such as:
+ • With a command such as:
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/
@@ -2386,18 +2386,18 @@ there, revealing the inner workings of the server, ready to be called:
If there are subdirectories with configuration data of the web
server, this can also be quite interesting to read.
- * The well-known Apache web server usually has its CGI files in the
- directory '/cgi-bin'. There you can often find the scripts
- 'test-cgi' and 'printenv'. Both tell you some things about the
+ • The well-known Apache web server usually has its CGI files in the
+ directory ‘/cgi-bin’. There you can often find the scripts
+ ‘test-cgi’ and ‘printenv’. Both tell you some things about the
current connection and the installation of the web server. Just
call:
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/test-cgi
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/printenv
- * Sometimes it is even possible to retrieve system files like the web
- server's log file--possibly containing customer data--or even the
- file '/etc/passwd'. (We don't recommend this!)
+ • Sometimes it is even possible to retrieve system files like the web
+ server’s log file—possibly containing customer data—or even the
+ file ‘/etc/passwd’. (We don’t recommend this!)
*Caution:* Although this may sound funny or simply irrelevant, we are
talking about severe security holes. Try to explore your own system
@@ -2435,28 +2435,28 @@ File: gawkinet.info, Node: STATIST, Next: MAZE, Prev: WEBGRAB, Up: Some Appl
-10 5 0 5 10"