Re: gzip --force bug


From: Jim Meyering
Subject: Re: gzip --force bug
Date: Wed, 03 Feb 2010 06:30:44 +0100

Mark Adler wrote:
> I got a report of a behavior of gzip that is not replicated in pigz.  In the 
> process of investigating that, I found a bug in gzip (all versions including 
> 1.4).  Here's the deal.
>
> The behavior is that if you use --force and --stdout with --decompress, gzip 
> will behave like cat if it doesn't recognize any compressed data magic 
> headers.  This is so that zcat can act as a replacement for cat, 
> automatically detecting and decompressing compressed data.  (pigz doesn't 
> currently do that, which I need to fix.)  Another behavior of gzip is that it 
> will decompress concatenated gzip streams.  Combining those two behaviors, 
> gzip -cfd on a gzip stream followed by non-gzip data should give you the 
> decompressed data from the stream followed by the non-gzip data copied.
>
> gzip doesn't do that, at least not correctly.
>
> What it does for a small example is write the decompressed data, write the 
> initial gzip stream without decompressing it (!), and then write the non-gzip 
> data.  The stuff in the middle is the result of this code in gzip.c:
>
>    } else if (force && to_stdout && !list) { /* pass input unchanged */
>       method = STORED;
>       work = copy;
>        inptr = 0;
>       last_member = 1;
>    }
>
> (By the way, the tabs should be removed from all of the gzip source code.)
>
> The culprit is the "inptr = 0".  It resets the input back to the beginning of 
> the current input buffer (wherever that happens to be) and copies from there. 
>  That works fine if you start the input with non-gzip data, but messes up in 
> the case of non-gzip data after a gzip stream.
>
> I have not developed a fix, since it is non-trivial.  You can't just restore 
> a saved inptr, since it is possible for the two-byte magic header to be split 
> on a buffer boundary.  That is, reading the first byte of the magic header 
> empties the input buffer, so that reading the second byte of the magic header 
> fills the input buffer, overwriting the first byte.
>
> If you want, I can try to come up with a patch for that, or you could have 
> that pleasure.
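
The two failure modes described above can be reproduced in miniature. What
follows is a sketch under stated assumptions, not gzip's code: the reader
struct, fill(), get_byte(), drain() and passthrough() are hypothetical names,
the letters "GGG" stand in for an already-decoded gzip member, and the buffer
is kept tiny so that a refill can land between the two would-be magic bytes.
The "careful" branch (re-emitting the peeked bytes from a saved copy) is only
one possible approach, not necessarily the fix gzip adopted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAGIC_LEN = 2, MAX_BUF = 16 };

/* Hypothetical miniature of a buffered input reader. */
struct reader {
    const unsigned char *src;      /* the whole input */
    size_t src_len, src_pos;       /* total size / bytes already loaded */
    unsigned char inbuf[MAX_BUF];
    size_t bufsz;                  /* usable part of inbuf */
    size_t insize, inptr;          /* valid bytes in inbuf / next unread index */
};

/* Load the next chunk, overwriting whatever the buffer held before. */
static int fill(struct reader *r)
{
    size_t n = r->src_len - r->src_pos;
    if (n > r->bufsz)
        n = r->bufsz;
    memcpy(r->inbuf, r->src + r->src_pos, n);
    r->src_pos += n;
    r->insize = n;
    r->inptr = 0;
    return n > 0;
}

static int get_byte(struct reader *r)
{
    if (r->inptr == r->insize && !fill(r))
        return EOF;
    return r->inbuf[r->inptr++];
}

/* Append everything from the current inptr onward to out[n..]. */
static void drain(struct reader *r, char *out, size_t n)
{
    int c;
    while ((c = get_byte(r)) != EOF)
        out[n++] = (char)c;
    out[n] = '\0';
}

/* Skip member_len bytes (the "decoded" member), read up to two would-be
   magic bytes, then pass the remainder through.  If rewind_to_zero is
   nonzero, mimic "inptr = 0"; otherwise re-emit the peeked bytes from a
   saved copy.  out must hold at least strlen(data) + 1 bytes. */
static void passthrough(const char *data, size_t bufsz, size_t member_len,
                        int rewind_to_zero, char *out)
{
    struct reader r = {0};
    unsigned char magic[MAGIC_LEN];
    size_t got = 0, n = 0;

    assert(bufsz > 0 && bufsz <= MAX_BUF);
    r.src = (const unsigned char *)data;
    r.src_len = strlen(data);
    r.bufsz = bufsz;

    for (size_t i = 0; i < member_len; i++)
        (void)get_byte(&r);
    while (got < MAGIC_LEN) {
        int c = get_byte(&r);
        if (c == EOF)
            break;
        magic[got++] = (unsigned char)c;
    }
    if (rewind_to_zero) {
        r.inptr = 0;               /* start of the *current* buffer */
    } else {
        memcpy(out, magic, got);   /* peeked bytes survive any refill */
        n = got;
    }
    drain(&r, out, n);
}

#ifdef DEMO_MAIN                   /* build with: cc -DDEMO_MAIN sketch.c */
int main(void)
{
    char out[32];
    passthrough("GGGxyz!", 16, 3, 1, out); printf("rewind, one buffer: %s\n", out);
    passthrough("GGGxyz!", 4, 3, 1, out);  printf("rewind, split:      %s\n", out);
    passthrough("GGGxyz!", 4, 3, 0, out);  printf("saved copy, split:  %s\n", out);
    return 0;
}
#endif
```

With everything in one buffer, the rewind replays the member bytes ("GGGxyz!"),
which matches the reported symptom. With a 4-byte buffer the refill happens
between the two peeked bytes, so the rewind yields "yz!" and the "x" is lost.
Keeping a copy of the peeked bytes gives "xyz!" in both layouts.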

Thanks for the report.
I'm adding a test to exercise that, currently expected to fail:

From 026eb1815d339e73102e3ae5a61543049ae9423a Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 2 Feb 2010 08:19:36 +0100
Subject: [PATCH 1/2] gzip -cdf mishandles some concatenated input streams: test it

* tests/mixed: Exercise "gzip -cdf" bug.
* Makefile.am (XFAIL_TESTS): Add it.
Mark Adler reported the bug.
---
 Makefile.am |    3 +++
 tests/mixed |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 tests/mixed

diff --git a/Makefile.am b/Makefile.am
index b4e75fc..4263b1d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -99,6 +99,9 @@ check-local: $(FILES_TO_CHECK) $(bin_PROGRAMS) gzip.doc.gz
        done
        @echo 'Test succeeded.'

+XFAIL_TESTS =                                  \
+  tests/mixed
+
 TESTS =                                                \
   tests/helin-segv                             \
   tests/hufts                                  \
diff --git a/tests/mixed b/tests/mixed
new file mode 100644
index 0000000..0ca8e80
--- /dev/null
+++ b/tests/mixed
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Ensure that gzip -cdf handles mixed compressed/not-compressed data
+# Before gzip-1.5, it would produce invalid output.
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# limit so don't run it by default.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  gzip --version
+fi
+
+: ${srcdir=.}
+. "$srcdir/tests/init.sh"
+
+printf 'xxx\nyyy\n'      > exp2 || framework_failure
+printf 'aaa\nbbb\nccc\n' > exp3 || framework_failure
+
+fail=0
+
+(echo xxx; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Uncompressed input, followed by compressed data.
+(echo xxx; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Compressed input, followed by regular (not-compressed) data.
+(echo xxx|gzip; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+(echo xxx|gzip; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+Exit $fail
--
1.7.0.rc1.167.gdb08



