
[bug-gawk] gawk Regression: CR characters are not stripped on Windows


From: Orgad Shaneh
Subject: [bug-gawk] gawk Regression: CR characters are not stripped on Windows
Date: Tue, 27 Feb 2018 09:22:18 +0200

Hi,

Cross-posting per Eli Zaretskii's request.

CR characters used to be automatically stripped on Windows (MSYS2 and
Cygwin environments). This is broken in 4.2.0.

Minimal example:
echo -en "foo\r\n\r\nbar\r\n" > foo.txt
# This worked with 4.1.4 and doesn't work with 4.2.0:
awk '/^$/ { print "found" }' foo.txt
# This works with 4.2.0 and doesn't work with 4.1.4:
awk '/^\r$/ { print "found" }' foo.txt

Bisected to commit 5db38f775d9ba239e125d81dff2010a2ddacb48e:
(* gawkmisc.c (cygwin_premain0, cygwin_premain2): Remove.
No longer needed).

Apparently it's still needed...

This issue was reported in https://github.com/git-for-windows/git/issues/1524

Proposed patch is attached.

As Eli said, this change was deliberate. However, it has several drawbacks.

1. The gawk info page states that:

> Under MS-Windows, 'gawk' (and many other text programs) silently
> translates end-of-line '\r\n' to '\n' on input and '\n' to '\r\n' on
> output.

and on Feb 8 the following section was added:

> Recent versions of Cygwin open all files in binary mode.  This means
> that you should use 'RS = "\r?\n"' in order to be able to handle
> standard MS-Windows text files with carriage-return plus line-feed line
> endings.

This breaks compatibility between different gawk versions. What were
the reasons for this change in Cygwin, and why was it pushed upstream?

2. Git and other tools automatically convert text files to CRLF on
Windows. This means that any awk script that runs on both Windows and
Unix must use RS = "\r?\n". One example that was broken by this behavior
change is gerrit's commit-msg hook[1], which scans for empty lines with
the /^$/ regexp.
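
To illustrate what such scripts would have to do, here is a minimal sketch of a workaround that should behave the same with 4.1.4 and 4.2.0. Instead of relying on RS, it strips a trailing CR from each record before matching. The file name foo.txt is taken from the minimal example above; the sub() call is only an assumption about how a script could be adapted, not what gerrit's hook actually does.

```shell
# Create a file with CRLF line endings, as in the minimal example.
printf 'foo\r\n\r\nbar\r\n' > foo.txt

# Remove a trailing CR (if any) from every record, then look for
# empty lines. If the runtime already translated CRLF to LF, the
# sub() call simply finds nothing to replace.
out=$(awk '{ sub(/\r$/, "") } /^$/ { print "found" }' foo.txt)
echo "$out"
```

The point is that every existing script matching /^$/ would need a change like this (or the RS setting quoted from the info page) to keep working on CRLF input.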

Please consider reverting this change. Patch attached.

[1] 
https://gerrit.googlesource.com/gerrit/+/376a7bbb64f1b3f13c261f4efa0af0e8538cfe9b/resources/com/google/gerrit/server/tools/root/hooks/commit-msg#101

- Orgad

Attachment: 0001-Revert-default-mode-on-Cygwin-from-binary-back-to-te.patch
Description: Binary data

