Re: [Lzip-bug] Version 0.1 of plzip released
From: Ersek, Laszlo
Subject: Re: [Lzip-bug] Version 0.1 of plzip released
Date: Thu, 10 Dec 2009 00:52:52 +0100 (CET)
On Wed, 9 Dec 2009, John Reiser wrote:
Version 0.1 of plzip uses a ./configure script to generate ./Makefile.
The configure could pre-compute the various sizeof(<type>) that are
involved in the overflow checks [...] Either way, any test which deals
with size-based overflow, and whose results are pre-determined solely on
the basis of the types involved, should take advantage of those facts.
I am not debating this wrt. plzip, since, as you say, it does use
a ./configure script.
If constant values that result from analysis of types and type ranges
are not acknowledged and/or not propagated, then any resulting compiler
warnings are maintenance bugs. It is a bug that someone who compiles
the software should have to consider, "Is the compiler warning valid?
Should I ignore the compiler warning?"
I both agree and disagree with you. I agree that compiler warnings are
worrisome and they must be fixed or judiciously suppressed. I was looking
for ways to selectively suppress common gcc warnings *at specific
locations* in my code. Those warnings are correct but I did think them
through, and in some cases I won't change my code to dance around one
specific compiler's over-cautiousness. Caution is good, but there are
exceptions where the "uncommon" code is not the result of oversight on the
programmer's part.
I have no problem basing real C conditionals on constants in my code, if
the #if alternative appears messy. (I consider ./configure messy for the
caliber of projects I do in my "free" time, but that's personal taste.)
The absolute minimum requirement is a section in a README or
RELEASE-NOTES file which lists the compiler warnings and explains them.
It is much better to avoid the warnings entirely.
So that bit-rot has a chance to creep into the documentation, too.
Compiler warnings are not only compiler family dependent but also compiler
version dependent. If memory serves, there were big -W changes between
gcc-3 and gcc-4.
I don't program for a specific compiler. I program for SUSv2. However,
this mailing list is not about my stuff, so wrt. plzip, you may be
right.
Turning to the actual code main.cc [with line numbers of plzip-0.1]:
1091 long max_worker = sysconf(_SC_THREAD_THREADS_MAX);
1092 if (-1L == max_worker) {
1093   max_worker = LONG_MAX;
1094 }
1095 if (UINT_MAX < (long unsigned)max_worker) {
1096   max_worker = UINT_MAX;
1097 }
1098 if (UINT_MAX / blf < (unsigned)max_worker) {
1099   max_worker = UINT_MAX / blf;
1100 }
1101 if ((size_t)-1 / sizeof(pthread_t) < (unsigned)max_worker) {
1102   max_worker = (size_t)-1 / sizeof(pthread_t);
1103 }
This part of code establishes an absolute upper limit on the number of
worker threads. ... The value of max_worker can
only decrease, starting with line 1095, and the safety of any
conditional itself is provided for by the earlier parts.
On the usual x86_64, which has 64-bit 'long', 32-bit 'unsigned' and
64-bit 'size_t', the test on line 1101 is always false, as warned
by gcc-4.4.1 and others. The left-hand side "(size_t)-1 / sizeof(pthread_t)"
has the compile-time constant value 0x1ffffffffffffffful, and the maximum
value of any 'unsigned' on the right-hand side is 0xffffffffu. The test can
never succeed, regardless of the runtime value of max_worker, and this fact
is known at compile time and at programming time.
Yes!
Therefore lines 1101-1103 should be removed by conditional compilation
in the usual x86_64 environment. Not doing so is a maintenance bug
because common compilers warn, and dealing with the warning wastes
people's time.
Common compilers may warn, and usefully so, and dealing with superfluous
warnings definitely wastes people's time, but I will not complicate valid
code to shut up a compiler. Furthermore, a quality compiler, such as gcc,
will omit the code, so there's no runtime or size penalty. (Even if the
code is left in place, that's no problem either.) If there'll be a *local*
suppression #pragma or anything, I'll use it.
(Since we're already talking software maintenance or release management or
some such, please realize that I don't code free software for the masses
in the first place. I do try to help packagers and do my own
distro-specific packaging work as nicely as I can; but my free software
development is not another job for me, where I have to put up with user
idiosyncrasies. I strive for maximum portability, and once I achieve it
with what I consider cleanliness and simplicity, I refuse to muddle it up
in order to work around "maintenance bugs", which have nothing to do with
my valid code and everything to do with compiler peculiarities. Making
those lines dependent on a preprocessor macro would require an external
utility to define that macro "in some cases", and that utility would have
to cover *all* cases allowed under SUSv2.)
I find it entirely possible that Antonio will heed your remarks, though.
I would welcome an explicit initialized constant such as
size_t const SIZE_T_MAX = ~(size_t)0;
size_t const SIZE_T_MAX(~(size_t)0); // perhaps better for C++
and then use SIZE_T_MAX as appropriate instead of "(size_t)-1".
I have programmed for some one's-complement machines, where
-1 ==> 0xff...ffE which is not the same as
~0 ==> 0xff...ffF .
(size_t)-1 is not defined as a bit pattern. From the C99 standard (I have
no C89 at hand, sorry):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it is
unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.49)
49) The rules describe arithmetic on the mathematical value, not the
value of a given type of expression.
Suppose size_t, being an unsigned integral type, has range [0 ..
SIZE_T_MAX]. (We don't know SIZE_T_MAX by value.) If we convert the int
constant "-1" to size_t, the algorithm above describes -1 + 1 * (
SIZE_T_MAX + 1 ) = SIZE_T_MAX, which is in range.
Iff C89 has different rules for this, then I'm horribly wrong and offer my
apology.
About the same is going on around plzip.cc:601.
No, this is worse. Included below are explicit numerical counterexamples
which demonstrate bugs in programming logic.
The source line is
601 SIZES(S2w_blk, plain, 2u * compr_lev[clidx].dict_size, split);
and the expanded macro begins ["gcc -E plzip.cc", pretty-printed]
X1 do {
X2   unsigned tmp; tmp = 2u * compr_lev[clidx].dict_size;
X3   if ((size_t)-1 < tmp) {
X4     show_error( "size_t overflow in sizeof_" "plain" "." ); fatal()
X5   }
With 64-bit 'size_t' and with 32-bit 'unsigned', then the test on line X3
is essentially the same as discussed above. Thus it should be removed
by conditional compilation in order to avoid a maintenance bug.
... Perhaps so in plzip; I repeat my above observations too.
On a machine with 32-bit 'size_t' and 32-bit 'unsigned' (such as the usual
i686), things are worse. The types of both operands to the comparison on line
X3 are effectively the same (namely: 'unsigned'), and it is impossible for
the runtime value of any variable, such as 'tmp', to exceed the maximum value
of that type. This is known at compile time and at programming time,
so again conditional compilation should remove the test.
That's the first problem.
Ditto.
Here's the second problem. If (0x80000000u <= compr_lev[clidx].dict_size)
then the computation of
unsigned tmp = 2u * compr_lev[clidx].dict_size;
suffers an overflow.
Sure thing. That's why I've verified all static const
compr_lev[clidx].dict_size values before writing down the code in llzip.
We discussed this with Antonio in private on 19-20 Nov 2009 (message
identifiers for his reference:
<address@hidden>
<address@hidden>
<address@hidden>). If dict_size
can be selected more freely, this formula must definitely be revised.
The value is *undefined*,
Not undefined, just wrong :) Unsigned overflow is well-defined, see above.
Testing for size-based resource exhaustion must avoid arithmetic
overflow.
I concur whole-heartedly.
One of the operands to the comparison should be a variable by itself;
all the constants should be folded into the other operand,
This constant folding was considered cursorily, but after reviewing the
expression step by step in the private mails mentioned above, with those
static constant values in mind, the more verbose expression was retained.
This holds for the multiplication / division below, too.
Notice that encapsulating this into a general macro often is tedious.
The SIZES macro is specifically not there to hide implementation details.
It's #defined and #undefined tightly around the client code, and only
serves to save me from typing up the same tedious code twice (computing
struct and array sizes after range checking) and unavoidably introducing
typos. I did play it through in my head with both call sites / arguments.
That is a good warning! Avoid such macros; they tend to cause problems.
Function-like macros are dangerous; I agree.
Careful readers may suggest that it is impossible for
(0x80000000u <= compr_lev[clidx].dict_size)
because of resource restrictions.
I would never do such a thing.
There are more problems.
607 SIZES(W2m_blk, compr, (4u + 1u + 1u)
608 + ((unsigned)split_arg.sizeof_plain * 9u + 7u) / 8u + (4u + 8u + 8u),
609 work);
Again, testing for size-based resource exhaustion must avoid arithmetic
overflow. In this case "((unsigned)split_arg.sizeof_plain * 9u + 7u) /
8u" is particularly suspect. If (.sizeof_plain > 0x1c71c71c) then the
multiplication by 9u suffers an overflow,
We've checked this too, step by step. If you select the highest value of
.dict_size in llzip-0.03, i.e. 32u * 1024u * 1024u, split_arg.sizeof_plain
ends up at 64M (0x04000000). Then,
Date: Thu, 19 Nov 2009 23:30:17 +0100 (CET)
From: ERSEK Laszlo
To: Antonio Diaz Diaz
Subject: Re: Question: parallel lzip?
[...] Here's the plan:
1. I'll use the minilzip -1 .. -9 presets for dict sizes and match
lengths.
2. I'll choose double dict size for input block length (->
sizeof_plain). This seems to be at most 2 * 32MB = 64MB then
(binary MB).
3. I'll choose the following expression for compressed buffer
length (-> sizeof_compr) -- single member file:
3.a. header
3.b. 9 bits per byte, rounded up
3.c. trailer
as in
sizeof_compr = (4u + 1u + 1u) + (sizeof_plain * 9u + 7u) / 8u
+ (4u + 8u + 8u)
The actual source code form will be different, but that's the gist
of it. At no point during the evaluation of this expression is an
unsigned overflow possible (UINT_MAX is at least 2^32 - 1 in
SUSv2). Substituting 64MB for sizeof_plain:
(0x6u + 0x24000007u / 8u) + 0x14u == 0x480001Au == 75497498u
SIZES is not a generic, "exported" macro. It's "copy-paste by
preprocessor", which I consider valid preprocessor usage.
One new problem is the "bare" constants such as in "* 9u + 7u) / 8u",
"(4u + 1u + 1u)", and "(4u + 8u + 8u)". What is the source
of all those numerical values?
http://www.nongnu.org/lzip/manual/lzip_manual.html#File-Format
Please note that llzip was not a "final product". It was meant from the
beginning to be forked / taken over by Antonio. That's why there are no
nice defines for those values. If you've read the file format section of
the lzip manual at least once -- which is not a stretch to assume, since
you're reading the source of a program that creates lzip files -- then
you'll immediately see the header-body-trailer dissociation in the
formula. The constants weren't folded explicitly, precisely to support
that insight (even without symbolic names, yes).
In the definition of the SIZES macro itself:
590 < arg ## _arg . sizeof_ ## arr - (size_t)1) { \
595 + (arg ## _arg . sizeof_ ## arr - (size_t)1); \
the "(size_t)1" is really "sizeof(S2w_blk.plain)" or
"sizeof(W2m_blk.compr)".
Yes.
The commonality of the '1' in SIZES with the
'1' in "S2w_blk.plain[1]" or in "W2m_blk.compr[1]" has been lost.
Well, they are (C89-style) flexible array members, so relying on their
fixed size of 1 byte seemed natural. (If they had any other size, then
they would not be flexible array members, and then the whole discussed
code section would make no sense. Indeed, before options -1 .. -9 were
introduced in lbzip2 (from which llzip was forked), those array members
had correctly #define-d sizes, and there existed no SIZES() macro at all.)
Using a parametric sizeof expression would communicate intent more
clearly, truly, at the price of complicating the SIZES macro even more.
Additionally, with the current subtraction, it is easier to see that the
subtrahend, (size_t)1, cannot be greater than the minuend, "arg ## _arg .
sizeof_ ## arr". (As outrageous as it might seem, I do trust myself to set
something called "sizeof_*" to something positive, especially if it's set
three lines above.)
The definition of the SIZES macro has created a bad situation for
maintenance.
Positively. I suspect Antonio will replace it altogether, no later than
when he introduces the "--dictionary-size=" parameter.
Please understand, though, that in my *hobbyist* free software activity, I
don't code for maintenance by other people above all. This may not be
nice, but that's how it is. I don't need those *second* eight hours spent
with hacking to be another (but unpaid) *job*. The llzip-0.03 code is
technically correct. Maintainability might not be stellar, but here we
are, discussing it in relation to plzip; and if Antonio (or anybody)
asked me for clarification on any part, I'd be more than happy to do my
best explaining.
In this case, careful analysis of these warnings from gcc reveals true bugs,
At worst, communication bugs from one programmer to a different one, which
in my activity are much less urgent than getting the damn code into
existence at all, with all-nighters. I feel justified to prioritize ease
of development on my part over ease of maintenance / comprehension by
others, in case those two goals appear to conflict.
particularly the arithmetic overflows in testing for resource exhaustion.
llzip-0.03 continues to feature no overflows that I know of.
-o-
Thank you for taking the time to dissect plzip, and consequently, parts of
llzip, and consequently, parts of lbzip2. Code review is boring and hard.
I'm sure users will benefit if you keep doing it; it may render plzip (and,
if you care and succeed in convincing me on some points, perhaps lbzip2)
better -- or at least more easily maintainable -- software.
I'm sorry for any grammatical and/or logic errors I'm certain I've
committed above; it's sorta late again.
Cheers,
lacos