bug-gzip

Re: ETL for Flexcab Sample Requests failed with file size error


From: Steven M. Schweda
Subject: Re: ETL for Flexcab Sample Requests failed with file size error
Date: Fri, 27 Jul 2012 02:22:54 -0500 (CDT)

From: "Techow, Ric" <address@hidden>

> What is happening is that files on Zos are being zipped and shipped to a
> Solaris platform as a data feed.  Zos PKZIP has an option to generate a
> gzip format compressed file.  [...]

   Are you sure about that, or does PKZIP generate a plain-old ZIP
archive, which, if it contains only one file, is something with which
gzip can cope?  For example, on a handy VMS system:

alp $ zip31l gzt_1.gz TEST.C        ! Create a ZIP archive with one member.
  adding: TEST.C (deflated 60%)

alp $ zip31l gzt_2.gz TEST.C wd2.c  ! Create a ZIP archive with two members.
  adding: TEST.C (deflated 60%)
  adding: wd2.c (deflated 62%)

[See note below, [*].]

alp $ gzip -d gzt_1.gz          ! Use gzip to expand the one-member archive.
alp $ gdiff -s gzt_1 TEST.C
Files gzt_1 and TEST.C are identical

alp $ gzip -d gzt_2.gz          ! Try gzip to expand the two-member archive.
gzip: ALP$DKC0:[SMS.ITRC]gzt.gz;1 has more than one entry -- unchanged

   As I said before, "quite limited".

>  So we have done that for many years - worked fine till recently.

   Well, you've done something, but it may not be exactly what you
thought you were doing.  I, of course, know nothing of what you actually
told PKZIP to do, nor all of what PKZIP _can_ do, so I'm just
speculating.

> It appears that when Zos PKZIP encounters a file > 4 GB changes format to
> ZIP64 - (should have abended instead of doing that without letting folk
> know).  Solaris gzip has been failing).

   If PKZIP is making a ZIP archive (which would be my guess), then it's
probably doing exactly what it should be doing, which is switching to
the large-file (ZIP64) format (which is a ZIP archive format, not a
gzip-compressed-file format) when it sees a large file.  (Info-ZIP Zip
defines "large" as 2GB-and-up, not 4GB-and-up, because we deal with too
many systems where file offsets are signed, and it's too much work to
try to deal with values >= 2GB in 32-bit numbers.)  Part of the ZIP64
format involves storing -1 (0xffffffff) in a bunch of the conventional,
32-bit file size and offset fields, and then storing the real, 64-bit
values somewhere else (in places where gzip probably has no interest in
looking).  These -1 values can confuse a program which was expecting the
actual file size and offset values in those fields.  To me, it's all
plausible, but I'm always open to actual evidence, or a good
counter-argument.

      http://www.pkware.com/documents/casestudies/APPNOTE.TXT
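
   To make the -1 business concrete, here is a minimal Python sketch of
the local file header layout described in APPNOTE.TXT.  It is not what
gzip or PKZIP actually does; the helper build_local_header() and the
member name "feed.dat" are made up for illustration, and only the first
header of an archive is considered.

```python
import struct

# ZIP local file header layout per APPNOTE.TXT (30 fixed bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
LOCAL_SIG = b"PK\x03\x04"
ZIP64_SENTINEL = 0xFFFFFFFF   # the "-1" stored in the 32-bit size fields
ZIP64_EXTRA_ID = 0x0001       # header ID of the ZIP64 extended-information field


def build_local_header(name, csize, usize, zip64=False):
    """Hypothetical helper: build a bare local header (no file data)."""
    extra = b""
    c32, u32 = csize, usize
    if zip64:
        # Real sizes go in the extra field: uncompressed first, then compressed.
        extra = struct.pack("<HHQQ", ZIP64_EXTRA_ID, 16, usize, csize)
        c32 = u32 = ZIP64_SENTINEL
    fixed = LOCAL_HEADER.pack(LOCAL_SIG, 45 if zip64 else 20, 0, 8, 0, 0, 0,
                              c32, u32, len(name), len(extra))
    return fixed + name + extra


def naive_sizes(header):
    """What a reader unaware of ZIP64 sees: just the 32-bit fields."""
    fields = LOCAL_HEADER.unpack_from(header)
    return fields[7], fields[8]


def real_sizes(header):
    """Follow the ZIP64 extra field when a 32-bit field holds the sentinel."""
    fields = LOCAL_HEADER.unpack_from(header)
    if fields[0] != LOCAL_SIG:
        raise ValueError("not a ZIP local file header")
    csize, usize, name_len, extra_len = fields[7:11]
    if ZIP64_SENTINEL not in (csize, usize):
        return csize, usize
    pos = LOCAL_HEADER.size + name_len
    end = pos + extra_len
    while pos + 4 <= end:
        field_id, field_len = struct.unpack_from("<HH", header, pos)
        if field_id == ZIP64_EXTRA_ID and field_len >= 16:
            usize64, csize64 = struct.unpack_from("<QQ", header, pos + 4)
            return csize64, usize64
        pos += 4 + field_len
    raise ValueError("sentinel sizes but no ZIP64 extra field")


if __name__ == "__main__":
    hdr = build_local_header(b"feed.dat", csize=2 * 1024**3,
                             usize=5 * 1024**3, zip64=True)
    print("32-bit fields:", naive_sizes(hdr))
    print("real sizes:   ", real_sizes(hdr))
```

   A reader which stops at naive_sizes() gets 0xffffffff for both sizes,
which is the kind of value that could plausibly provoke a file size
complaint.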

   Hint:  While gzip can decode a simple ZIP archive, UnZip can _not_
decode a gzip-compressed file (because it's not a ZIP archive).  If
UnZip can process your files, then they're really ZIP archives, not
simple gzip-compressed files.  For example:

alp $ copy TEST.C fred.c
alp $ gzip fred.c
alp $ unzip61l -t fred.c-gz
Archive:  ALP$DKC0:[SMS.ITRC]fred.c-gz;1
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
[...]

That mess is the standard "not a ZIP archive" message.  On a ZIP
archive, it does better (and often better than gzip):

alp $ unzip61l -t gzt_2.gz
Archive:  ALP$DKC0:[SMS.ITRC]gzt_2.gz;1
    testing: TEST.C                   OK
    testing: wd2.c                    OK
No errors detected in compressed data of ALP$DKC0:[SMS.ITRC]gzt_2.gz;1.
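
   If UnZip is not handy, the first few bytes of the file tell much the
same story.  The following is a rough Python sketch, not a substitute
for a real tool: sniff() and one_member_zip() are names invented here,
and the check looks only at the leading signature (1f 8b for gzip,
"PK" 03 04 for a ZIP local file header, "PK" 05 06 for an empty ZIP
archive).

```python
import gzip
import io
import zipfile


def sniff(data):
    """Guess the container format from the leading signature bytes only."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:4] in (b"PK\x03\x04", b"PK\x05\x06"):
        return "zip"
    return "unknown"


def one_member_zip(name, payload):
    """Hypothetical helper: build a one-member ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    payload = b"int main(void) { return 0; }\n"   # stand-in for TEST.C
    print(sniff(gzip.compress(payload)))           # a real gzip stream
    print(sniff(one_member_zip("TEST.C", payload)))  # a ZIP archive
    print(sniff(payload))                          # plain text
```

   The file name plays no part in the answer, which is the point.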

> I'll try the info-zip link you have suggested.

   Wake me when it catches fire.  I'd start with some non-z/OS system
(like, say, Solaris), where things are known to work easily, and then
move on to z/OS when more is known about all the data formats in use.

   On the bright side, if you use ZIP tools (PKZIP, Info-ZIP Zip and
UnZip, ...) to work with ZIP archives, then you can evade that
one-file-per-archive limit imposed by gzip.  (ZIP archivers are
functionally similar to a "tar"+gzip combination.  Not a direct
replacement, though -- different formats.)
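
   As a small illustration of that similarity, the Python sketch below
packs the same two members into a ZIP archive and into a gzip-compressed
tar archive, then lists the members of each.  The file contents are
placeholders invented for the example, and the helper names are made up.

```python
import io
import tarfile
import zipfile

# Placeholder contents; only the member names echo the transcript above.
FILES = {
    "TEST.C": b"int main(void) { return 0; }\n",
    "wd2.c": b"/* second member */\n",
}


def make_zip(files):
    """Pack all members into one ZIP archive (archiving and compression)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def make_tar_gz(files):
    """Pack all members into a tar archive, then gzip the single tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def zip_names(blob):
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


def tar_gz_names(blob):
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        return tf.getnames()


if __name__ == "__main__":
    print(zip_names(make_zip(FILES)))
    print(tar_gz_names(make_tar_gz(FILES)))
```

   Both containers hold the same member list, but the bytes differ: the
tar+gzip result is one gzip stream wrapping a tar archive, while the ZIP
archive compresses each member separately.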


   [*] Note that naming a ZIP archive "something.gz" does not make it a
gzip-compressed file.

   http://www.google.com/search?q=if+you+call+a+tail+a+leg

------------------------------------------------------------------------

   Steven M. Schweda               address@hidden
   382 South Warwick Street        (+1) 651-699-9818
   Saint Paul  MN  55105-2547


