Re: [Lzip-bug] reducing memory usage when decompressing
From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] reducing memory usage when decompressing
Date: Sat, 06 Dec 2008 19:04:01 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905
Hello John,
John Reiser wrote:
> When decompressing a compressed file, it seems to me that there is no
> benefit to having a decompression buffer that is larger than the original
> file, because any match distance (for copying) must be smaller. The
> decompressor could save space by allocating only
>     min( (1 << dictionary_bits), uncompressed_size )
> bytes for the buffer. It is somewhat unfortunate that the uncompressed
> size appears not in the header of the compressed data, but only in the
> trailer.
Gzip also stores the uncompressed size in the trailer because counting
the bytes is the only way of being sure the size is correct.
> The usage model of limiting the size of the decompression buffer, but
> still allowing the compressor to achieve tight compression by using
> larger arrays for the probability model, longer than
>     (1 << ceil( log2( uncompressed_size ) )) ,
> is also attractive. However, lzip has coupled together the buffer size
> and the model size during compression.
It would be attractive if one could know the uncompressed size in
advance. Also note that I have not found any file that needed a
dictionary larger than twice the file's own size to achieve maximum
compression, so such files are probably rare.
> What are your thoughts about reducing memory usage when decompressing,
> and allowing a model size that is independent of buffer size?
The only two ways I see of reducing memory usage when decompressing are
being careful when compressing, or overriding the dictionary size stored
in the file header with a command line option.
Allowing an array size independent of buffer size can make lzip slower
and more complex. I would like to see proof that larger arrays improve
compression significantly before implementing it.
Best regards,
Antonio.