bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#25707: [PATCH] grep: don't forcefully strip carriage returns


From: Eric Blake
Subject: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Mon, 13 Feb 2017 14:20:56 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0

On 02/13/2017 02:00 PM, Paul Eggert wrote:
> On 02/13/2017 11:23 AM, Eric Blake wrote:
>> the use of fopen("rt") actively
>> breaks assumptions on a binary mount by silently corrupting any
>> carriage returns that are supposed to be preserved.
> 
> Surely it's more typical for trailing CRs to be ignored rather than be
> preserved, even on binary mounts.

On binary mounts, it is atypical to have a file with trailing CRs in the
first place - but when they are there, they are intentional. Stripping a
trailing CR on a file that was placed in a binary mount in order to
preserve the CR is what is wrong.

> 
> The intent of that "rt" (introduced on 2014-04-11 in commit
> 5c92a54ec36f176854f0b0cca83b5ff224f2d8e8) was to simplify the code,
> which formerly used undossify_input to guess whether a pattern file had
> trailing CRs. If we change "rt" to "r", surely we'd need to reintroduce
> undossify_input; and wouldn't undossify_input also "silently corrupt"
> carriage returns in most cases?

No, we don't need undossify_input.  Generally, the pattern file is
already clean (using fopen("r") on a text mount already got rid of all
CR, and using fopen("r") on a binary mount shouldn't have CR in the
first place).  And in the rare cases where you DO have CR in the pattern
file on a binary mount, then it is probably intended that you are
searching for CR as part of the pattern (if not, the user should use a
purpose-built applications, like d2u (dos2unix), to clean up the pattern
file before passing that file to 'grep -f').

You are correct that undossify_input would silently corrupt carriage
returns, but I'm arguing that we don't ever want to be in the business
of removing CR (either the OS has already removed them on our behalf
because of the text mount, or the user doesn't want them removed because
of the binary mount).

Years ago, Cygwin actively advertised that you could create text mount
points.  It generated a LOT of mailing list traffic about various
different apps that misbehaved in subtly different ways (not the least
of these is the fact that lseek() on a file opened in text mode is prone
to return the WRONG offset, which makes seeking back to the last-used
byte unreliable in applications that only partially consume input while
still complying with the POSIX rule that the next app to share stdin
picks up where the first one left off).  It's now been more than 6 years
since Cygwin has actively discouraged new installations from having a
text mount (they are still possible, but no longer a default
installation choice), and there have actually been far fewer complaints
about binary-only mounts misbehaving, except in programs like grep that
eat CR from a file that is supposed to be preserved in binary mode.

> 
> While we're on the topic, the undossify_input approach is just a
> heuristic and it sometimes guesses wrong. I wish the heuristic could be
> removed somehow, so that grep would behave more deterministically on
> MS-DOS/Windows.
> 

I'm of the opinion that undossify_input causes more problems than it
solves.  We should trust fopen("r") to do the right thing, rather than
reinventing it ourselves.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]