Re: [lmi] More Clang fixes

lmi
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] More Clang fixes

From:	Greg Chicares
Subject:	Re: [lmi] More Clang fixes
Date:	Mon, 28 Mar 2016 14:41:32 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0
On 2016-03-27 21:24, Vadim Zeitlin wrote:
> On Sun, 27 Mar 2016 10:19:47 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> Reexamining this later, 'database_view_editor.cpp' has 'int':
> GC>   class DatabaseDurationAxis
> GC>     :public AdjustableMaxBoundAxis<int>
> GC> where, 'tier_view_editor.hpp' has 'unsigned int':
> GC>   class TierBandAxis
> GC>     :public AdjustableMaxBoundAxis<unsigned int>
> GC> and I suppose that's the underlying design flaw: it should always
> GC> be a signed type. Furthermore, it should always be (signed) 'int',
> GC> so using a template parameter was a mistake. It still seems more
> GC> cautious to commit the more local change above than to redesign it.
> 
> <digression-on-the-use-of-unsigned-you-should-feel-free-to-skip>
> 
>  I agree with this but I also have to say that I'm not really sure that
> using "unsigned" here was a design flaw. Lakos gives, of course, a lot of
> very good advices, but the one about not using "unsigned" has never seemed
> convincing enough to me. Trusting the summary of his arguments at
> http://groups.google.com/group/comp.lang.c++.moderated/msg/849321a0199501e5
> (taken from a comment in multidimgrid_tools.hpp),

I had forgotten about that old comment until you pointed it out just now.
I arrived at the same conclusions independently yesterday.

> they're rather simple to
> refute:
> 
>> 1. Using unsigned does not ensure at compile time that negative numbers
>> will not be passed at runtime ...
> 
>  Of course not, this is a well-known problem of C++ inherited from C:
> conventions between numeric types are implicit. There is nothing particular
> to unsigned here, ints are also convertible to doubles but this doesn't
> mean that we should stop using ints and use doubles evewyehre.
> 
>> 1. ... Sure you get warnings for "strlen(myStr) < myInt", but not for int
>> x = -1; f(x);
> 
>  The warning is useful because it's simple to run into it accidentally. A
> warning for f(-1) might be useful too, but even if we could have it, it
> would be much less important in practice because I've never seen this
> happen accidentally. As usual, the idea is to protect against Murphy, not
> against Machiavelli.
> 
>> 2. Using unsigned does not allow for the negative value checking without
>> first converting back to signed.
> 
>  This is neither here nor there because unsigned is only used for values
> which are never negative, so you never need to check for this and thus it's
> irrelevant that you can't. Equivalently, if you need to check whether a
> value is negative, do use signed for it, of course.

In this case (citing the code before yesterday's changes):
  unsigned int const sel = GetSelection();
  if(!(0 <= sel && sel < axis_.GetCardinality())) // ... diagnosed error
I think this is a real concern. It's not immediately obvious whether
GetSelection() or GetCardinality() can return only nonnegative values.
I looked them both up yesterday, yet today I remember that one could and the
other couldn't, but not which is which. Glancing at the code cited above:
  unsigned int const sel = GetSelection();
I'd guess that GetSelection() was the one that always returns nonnegative,
but that guess is wrong because, looking at the declarations again, I see:
  int wxChoice::GetSelection();
  virtual unsigned int GetCardinality() const = 0;
Thus, the cited line
  unsigned int const sel = GetSelection();
has misled me twice in two days, which I have to consider a Bad Thing--and
I attribute it to mixing signed and unsigned return types.

GetSelection() and GetCardinality() cannot meaningfully return a negative
selection or a negative cardinality.

The concepts Cardinality and CurrentSelection cannot truly be negative:
e.g., the value -9 cannot mean that a set had -9 members or that item
number -9 is selected. We need signed int not to represent such impossible
things, but only to allow special values indicating exceptional conditions:
that no item is selected, or perhaps that a set hasn't been created and
therefore doesn't yet possess cardinality. For those special purposes, we
generally want the union of a limited set of nonnegative "normal" values
and one special value like -1 or wxNOT_FOUND. Given that wxNOT_FOUND is
sensible and useful in many cases, I contend that signed int should be
used in almost all cases--because it's not clear from semantics whether a
function like GetCardinality() might return -1 to indicate some kind of
error, and because if we use only signed int in interfaces, discussions
like this (and mistakes like the one addressed yesterday) are avoided.

>> 3. Using unsigned only increases the size of an integer by 1 bit.
> 
>  Completely irrelevant, nobody uses unsigned because of this.

Allocation functions use unsigned arguments exactly because of this.
Otherwise, an array of size 2^16-1 couldn't be allocated on a 16-bit
machine. That's why size_t is unsigned; and therefore functions like
size() are, as well.

But for APIs that return, say, the current selection in a wxChoice,
this is no reason to prefer unsigned: even on a 16-bit machine, we
won't offer 2^16 selections in a wxChoice, or even half that number.

>> 4. Lakos considers that using unsigned increases the chance of error
> 
>  Any studies or empirical observations that Lakos could have based this
> conclusion on must have necessarily been done 30+ years ago.

BTW, after a long hiatus, he has a new book that will be published soon.

> Compilers have
> changed a lot since then, in particular no compiler gave warnings about
> mixing signed and unsigned back then. I feel that these warnings reduce the
> chance of an error and I'd really like to have any proof that using
> unsigned is deleterious to the code quality in this millennium.

We're paying attention to this now because we happen to be using Clang
and it identified a not-very-sensible test that remains in production
today.

We could have identified the same issue with gcc. We didn't, only because
we had its applicable warning turned off. We had turned it off because it
had been observed to issue too many errors when compiling some earlier
version of wx with some earlier version of gcc. When you questioned that
yesterday, I enabled gcc's warning and found that it identified the same
issues as Clang.

Aside from those mechanical considerations, comparing GetCardinality()
and GetSelection() is innately difficult because of their differing
return types. Tools that identify mixed comparisons can show us where we
need to expend extra thought, but cannot save us from that expenditure.
Use signed integers in the interface as a general rule avoids these
problems and lets us write simpler code, faster.

>> 5. Using an unsigned interferes with function overloading.
[...]
>> 6. Using unsigned can interfere with template instantiation

I don't think these are very important points.

>  So, on balance, I only see one tiny argument not to use unsigned. OTOH
> there are two big arguments in favour of using it, of very different
> varieties:
> 
> 1. Theoretic one: it provides more semantic information which is always
>    a good thing. It also reduces cognitive dissonance which happens when
>    something that obviously can't be negative is represented by a signed
>    type.

Yet consider GetCardinality() and GetSelection() again from this POV.
For both, any "normal" value must be nonnegative: both have unsigned
semantics in that respect. GetSelection() returns signed int only so that
it can return wxNOT_FOUND, a deliberately unnatural value that cannot
identify any actual physical selection: a meta-value indicating instead
that there is no selection. I could conceive of a GetCardinality()
function that returns -1 to indicate an error; Evgeniy found no practical
need for that, so he returned unsigned, but that's incidental.

For me, cognitive dissonance arises from attempting to compare two things,
neither of which can obviously be negative, and stumbling into compiler
warnings because they aren't of straightforwardly comparable types.

> 2. Practical one: unsigned type (size_t) is used everywhere in the standard
>    library and if the rest of the code uses signed types, it requires
>    converting, implicitly (dangerous; results in warnings) or explicitly
>    (annoying; can still be dangerous) back and forth between them.

size_t is unsigned for the valid reason given above. Perhaps we could make
an argument for a general rule that's the dual of Lakos's--avoid signed
integers in the interface, in which case wxNOT_FOUND would be UINT_MAX
rather than -1--but that seems too unnatural. Otherwise, we just have to
live with this nuisance; but it's manageable, e.g., by using:

  int size_t_to_int(size_t)

which would throw if the return value would be negative (but it probably
make more sense just to be mindful of this and write a static_cast
whenever we need to compare to a size_t).

> </digression-on-the-use-of-unsigned-you-should-feel-free-to-skip>
> 
> GC> 'multidimgrid_any.cpp':
> GC> 
> GC> -    unsigned int const sel = GetSelection();
> GC> -    if(!(0 <= sel && sel < axis_.GetCardinality()))
> GC> +    auto const sel_orig = GetSelection();
> GC> +    if(sel_orig < 0)
> GC> +        {
> GC> +        fatal_error()
> GC> +            << "Choice control has unexpectedly invalid selection."
> GC> +            << LMI_FLUSH
> GC> +            ;
> GC> +        }
> GC> +
> GC> +    unsigned int const sel = static_cast<unsigned int>(sel_orig);
> GC> +    if(sel >= axis_.GetCardinality())
> GC> 
> GC> I think this calls
> GC>   int wxChoice::GetSelection()
> GC> so the fundamental mistake was assigning it to an unsigned local variable.
> 
>  FWIW in my ideal UI library API this function wouldn't return neither
> unsigned nor signed int but rather optional<unsigned> value from here.

Another digression: APL would use an "empty vector" here, with much the same
meaning as your ideal. That's built into the language: in that respect it's
more like nullptr than a default-constructed std::vector<> with zero elements.

In the C family, it's customary to use "impossible" negative values to
represent errors: thus, a function that writes a number of bytes to a file
might return -1 to indicate that the file could not be opened. Because of
that, at least some functions must return signed int even if their "normal"
range of return values is constrained to be nonnegative. Now what return
type are we to choose for naturally-nonnegative functions for which we
don't (yet) see any use for "special" values indicating error? Two options:

- Return unsigned when we can, and signed when we know that we'll want to
represent errors in "special" values--even though the function is semantically
nonnegative. In that case, we have to look up the return values to distinguish
these two possibilities.

- Lakos's option: always return signed for simplicity and compatibility.

> GC> Suppose we remove that problem directly:
> GC> 
> GC> - unsigned int const sel = GetSelection();
> GC> + int const sel = GetSelection();
> GC> 
> GC> Then we're still comparing signed and unsigned here:
> GC>      if(!(0 <= sel && sel < axis_.GetCardinality()))
> GC>                           ^
> GC> because of this:
> GC>     virtual unsigned int GetCardinality() const = 0;
> GC> (which is why Lakos recommended avoiding 'unsigned' in APIs--and that
> GC> applies to the 'tier_view_editor.cpp' case above, too). We're not
> GC> going to redesign the API, so this workaround seems best:
> GC> -    if(!(0 <= sel && sel < axis_.GetCardinality()))
> GC> +    if(!(0 <= sel && sel < static_cast<int>(axis_.GetCardinality())))
> 
>  It doesn't matter, of course,

It matters to me. Here's what I was thinking: GetCardinality() cannot "properly"
return any "large" value in (INT_MAX, UINT_MAX], so the static_cast above
"should" be safe. This is not rigorous.

> but I find it strange that you prefer to
> write it like this rather than
> 
>          if(!(0 <= sel && static_cast<unsigned>(sel) < 
> axis_.GetCardinality()))
> 
> After all -- and even if this is completely theoretic -- the check in the
> first branch of "&&" clearly ensures that my cast is correct, while the
> cast above might not be necessarily so.

That is rigorous, hence preferable.

> GC> Anyway, I'm going to commit these changes as described above so that I
> GC> can move ahead to enhancing the gcc warnings, which requires building
> GC> everything from scratch with every 'build_type'.
> 
>  Thanks for committing this, this is what counts the most, of course! But
> I'd still like to defend the use of unsigned... BTW, the same comment
> speaking about removing it refers to "lmi coding standard section 16.13"
> but I'm not sure where can I find the lmi coding standard, could you please
> point me to it?

IIRC, there was some potential copyright issue that prevented me from
publishing it, but this section has no such issue so I'll quote it here
(and send you the whole thing privately):

<p>
16.13 Generally avoid <tt>unsigned int</tt> where unintended
modulo arithmetic or implicit conversion to a signed type might
occur--especially in the interface, and even in the implementation.
See sections 9.2.2 and 10.1.2 of Lakos,
<tt>Large-scale C++ Software Design</tt>.
Mixed-mode arithmetic is brittle and leads to subtle defects.
Also consider:
</p>

<pre>
  unsigned int a = 2;
  unsigned int b = 3;
  // Incorrectly assume that b < a, so that c cannot be negative.
  unsigned int c = a - b;
  new char[c];
  // Oops--allocate all addressable memory less one byte, or at least try.
</pre>

<p>
Here, <tt>c</tt> is expected to have a nonnegative and reasonably small
value. But declaring it as unsigned forces it to be nonnegative by
making it large and unreasonable. That example behaves as though it
were written
</p>

<pre>
  short int a = 2;
  short int b = 3;
  long int c = a - b;
  if(c < 0)
      {
      c += std::numeric_limits&lt;short int&gt;::max();
      }
  // ...
</pre>

<p>
on a machine where <tt>long int</tt> is wider then <tt>short int</tt>.
The unsigned type is inappropriate if you really mean
</p>

<pre>
  // ...
  if(c < 0)
      {
      // This is 'impossible', or so we hope.
      throw std::logic_error("The 'impossible' has occurred.");
      }
  // ...
</pre>


<p>
so use signed integers and assert the postcondition.
</p>

<p>
Unsigned types are appropriate in certain other cases: bitmasks,
for example. And &lt;cctype&gt; contains functions whose domain is
restricted to the union of EOF and values representable as
<tt>unsigned char</tt>. Just don't use unsigned types to mean
"I hope this won't be negative".
</p>
[Prev in Thread]
Current Thread
[Next in Thread]
[lmi] More Clang fixes, Greg Chicares, 2016/03/27
- Re: [lmi] More Clang fixes, Vadim Zeitlin, 2016/03/27
  - Re: [lmi] More Clang fixes, Greg Chicares <=
    - Re: [lmi] Using unnsigned (was: More Clang fixes), Vadim Zeitlin, 2016/03/28
    - Re: [lmi] Using unnsigned, Greg Chicares, 2016/03/28
    - Re: [lmi] Using unnsigned, Vadim Zeitlin, 2016/03/28
Prev by Date: Re: [lmi] Numerics
Next by Date: Re: [lmi] Using unnsigned (was: More Clang fixes)
Previous by thread: Re: [lmi] More Clang fixes
Next by thread: Re: [lmi] Using unnsigned (was: More Clang fixes)
Index(es):
- Date
- Thread