Re: Understanding lzma-302eos
From: Antonio Diaz Diaz
Subject: Re: Understanding lzma-302eos
Date: Wed, 24 Aug 2022 01:45:33 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Hi Hoël,
Hoël Bézier wrote:
> One thing (amongst many!) that I fail to figure out is why the range
> decoder skips the first five bytes of the lzma stream. This happens in
> the Range_decoder constructor in lzd code:
>
>   Range_decoder() : member_pos( 6 ), code( 0 ), range( 0xFFFFFFFFU )
>     {
>     for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | get_byte();
>     }

Note that the code above does not "skip the first five bytes"; it shifts 5
bytes into 'code', of which the last 4 remain in 'code' after the shifting.
It is equivalent to:

  get_byte();
  for( int i = 0; i < 4; ++i ) code = ( code << 8 ) | get_byte();

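The equivalence can be checked with a small stand-alone sketch. It is not lzd code: 'Byte_source', 'load_shift5' and 'load_skip1_shift4' are hypothetical names, and the in-memory byte source is only a stand-in for lzd's 'get_byte'. It assumes, as in lzd, that 'code' is a 32-bit unsigned integer, so the first byte is shifted out by the time the fifth one arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory byte source standing in for lzd's get_byte().
struct Byte_source
  {
  const std::vector<uint8_t> & data;
  std::size_t pos;
  explicit Byte_source( const std::vector<uint8_t> & d ) : data( d ), pos( 0 ) {}
  // Returns 0 past the end of the data (an assumption of this sketch).
  uint8_t get_byte() { return ( pos < data.size() ) ? data[pos++] : 0; }
  };

// Same loop as the constructor: shift 5 bytes into the 32-bit 'code'.
uint32_t load_shift5( const std::vector<uint8_t> & stream )
  {
  Byte_source src( stream );
  uint32_t code = 0;
  for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | src.get_byte();
  return code;
  }

// Equivalent form: read and discard the first byte, then shift 4 bytes.
uint32_t load_skip1_shift4( const std::vector<uint8_t> & stream )
  {
  Byte_source src( stream );
  uint32_t code = 0;
  src.get_byte();
  for( int i = 0; i < 4; ++i ) code = ( code << 8 ) | src.get_byte();
  return code;
  }

// Self-check: both forms agree whatever the value of the first byte.
inline void check_equivalence()
  {
  const std::vector<uint8_t> a = { 0x00, 0x12, 0x34, 0x56, 0x78 };
  const std::vector<uint8_t> b = { 0xAB, 0x12, 0x34, 0x56, 0x78 };
  assert( load_shift5( a ) == load_skip1_shift4( a ) );
  assert( load_shift5( b ) == load_skip1_shift4( b ) );
  }
```

Both functions consume 5 bytes from the stream; only bytes 2 to 5 end up in the result.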
This is also confirmed by the ietf draft:

   The range encoder produces a first 0 byte that must be ignored by the
   range decoder. This is done by shifting 5 bytes in the
   initialization of 'code' instead of 4.

Note "shifting", not "skipping". BTW, the first 0 byte is the initial
content of 'cache'. See line 234 of encoder_base.h in the source of
lzip-1.23. Any value you initialize 'cache' to will be copied into the
compressed file, but it will not affect the decoding.
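To see why the first byte is whatever 'cache' starts as, here is a toy sketch of the carry-propagating output stage of a range encoder. It is a simplified illustration, not the code at the line cited above: 'Toy_range_encoder', 'toy_encode' and 'toy_decoder_init' are hypothetical names, and 'low' is set directly to an arbitrary value instead of being produced by encoding bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy output stage of a range encoder (assumption: simplified sketch).
struct Toy_range_encoder
  {
  uint64_t low;			// 33 bits used; bit 32 is the carry
  int ff_count;			// pending 0xFF bytes waiting for a possible carry
  uint8_t cache;		// byte held back until the carry is known
  std::vector<uint8_t> out;

  Toy_range_encoder( const uint8_t initial_cache, const uint32_t initial_low )
    : low( initial_low ), ff_count( 0 ), cache( initial_cache ) {}

  void shift_low()
    {
    if( ( low >> 24 ) != 0xFF )
      {
      const bool carry = ( low > 0xFFFFFFFFU );
      out.push_back( static_cast< uint8_t >( cache + carry ) );
      for( ; ff_count > 0; --ff_count )
        out.push_back( static_cast< uint8_t >( 0xFF + carry ) );
      cache = static_cast< uint8_t >( low >> 24 );
      }
    else ++ff_count;
    low = ( low & 0x00FFFFFFU ) << 8;
    }

  // Emits the held-back 'cache' byte plus the 4 bytes of 'low'.
  void flush() { for( int i = 0; i < 5; ++i ) shift_low(); }
  };

// Hypothetical helper: output bytes for a given initial 'cache' and 'low'.
std::vector<uint8_t> toy_encode( const uint8_t initial_cache,
                                 const uint32_t low_value )
  {
  Toy_range_encoder enc( initial_cache, low_value );
  enc.flush();
  return enc.out;
  }

// Decoder side: shift 5 bytes into a 32-bit 'code', as in the constructor.
uint32_t toy_decoder_init( const std::vector<uint8_t> & stream )
  {
  uint32_t code = 0;
  for( int i = 0; i < 5 && i < static_cast< int >( stream.size() ); ++i )
    code = ( code << 8 ) | stream[i];
  return code;
  }
```

In this sketch, 'toy_encode( 0x00, 0x12345678 )' and 'toy_encode( 0x42, 0x12345678 )' differ only in their first byte, and 'toy_decoder_init' yields the same 'code' for both.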

> On a side note, this code snippet shows that the first five bytes are
> used to update the code, which is the current point in the range,
> according to the ietf draft, but range is not updated.

The constructor initializes 'range' to its initial value, the maximum
possible range (0xFFFFFFFF). It then loads into 'code' the 4 most
significant bytes of the initial point in that range, as produced by the
encoding of the first bytes of data. From there on, each time 'range'
falls below 0x01000000, 'range' is multiplied by 256 and a new compressed
byte is shifted into 'code'. (See 'decode' and 'decode_bit').
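
The normalization step described above can be sketched as follows. This is a minimal illustration modeled on the direct-bit decoding loop, not a copy of lzd: 'Toy_range_decoder' is a hypothetical name, the input is an in-memory vector, and reading past the end returns 0 (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Toy_range_decoder
  {
  const std::vector<uint8_t> & data;
  std::size_t pos;		// number of compressed bytes consumed so far
  uint32_t code;		// current point in the range
  uint32_t range;		// current size of the range

  explicit Toy_range_decoder( const std::vector<uint8_t> & d )
    : data( d ), pos( 0 ), code( 0 ), range( 0xFFFFFFFFU )
    {
    // 'range' starts at its maximum; only 'code' is loaded from the stream.
    for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | get_byte();
    }

  uint8_t get_byte() { return ( pos < data.size() ) ? data[pos++] : 0; }

  // Decodes 'num_bits' equiprobable bits, most significant bit first.
  unsigned decode( const int num_bits )
    {
    unsigned symbol = 0;
    for( int i = num_bits; i > 0; --i )
      {
      range >>= 1;				// each bit halves the range
      symbol <<= 1;
      if( code >= range ) { code -= range; symbol |= 1; }
      if( range <= 0x00FFFFFFU )		// normalize
        { range <<= 8; code = ( code << 8 ) | get_byte(); }
      }
    return symbol;
    }
  };
```

Starting from 0xFFFFFFFF, seven halvings leave 'range' at 0x01FFFFFF, so no byte is read; the eighth brings it to 0x00FFFFFF, which triggers one normalization: 'range' becomes 0xFFFFFF00 and a sixth byte is shifted into 'code'.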
Best regards,
Antonio.