lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] [lmi-commits] master a3a294f 4/7: Pass modifiable arguments by


From: Vadim Zeitlin
Subject: Re: [lmi] [lmi-commits] master a3a294f 4/7: Pass modifiable arguments by reference
Date: Wed, 16 May 2018 02:16:13 +0200

On Tue, 15 May 2018 23:03:13 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2018-05-14 14:17, Vadim Zeitlin wrote:
GC> > On Sun, 13 May 2018 19:45:54 -0400 (EDT) Greg Chicares <address@hidden> 
wrote:
GC> > 
GC> > GC> branch: master
GC> > GC> commit a3a294f211f255fd0fc6de7be44ca7f3b339e7eb
GC> > GC> Author: Gregory W. Chicares <address@hidden>
GC> > GC> Commit: Gregory W. Chicares <address@hidden>
GC> > GC> 
GC> > GC>     Pass modifiable arguments by reference
GC> > GC>     
GC> > GC>     Indirection impedes readability.
GC> > 
GC> >  Sorry, but I have to strongly disagree with this. On the contrary,
GC> > indirection is very helpful to see and hiding it serious impedes
GC> > comprehension of the code.
GC> 
GC> Isn't that just an old C convention that's outmoded in C++?

 I don't think so. I also have a feeling that some people automatically
consider that if there are 2 ways to do things in C++, one inherited from C
and another which only exists in C++, then the latter is always preferable.
This is, of course, true in 99% of cases, but there are some exceptions and
the use of pointers for output parameters is one of them.

GC> As a side effect, it was obvious at the call site whether a parameter
GC> could be modified or not. That may have seemed handy, but it wasn't
GC> actually useful

 I'd like to add "[citation needed]" here. IMO it's one of the things that
make C code generally more clear (this is, of course, compensated by plenty
other things, e.g. equivalent C code is usually much longer than C++ code
and uses macros more widely etc) and contribute to the reputation of C++ as
being difficult to read.

GC> --if it were, someone would suggest a cleaner way of
GC> expressing it in C++, like writing 'const' at the call site:
GC>   void foo(int);
GC>   int x = 3;
GC>   foo(x const); // this would be cleaner
GC> or perhaps annotating the opposite case:
GC>   void bar(int&);
GC>   bar(x mutable); // indicate modifiability at call site

 What do you say of C# doing exactly this? AFAIK nobody has ever complained
of having to write "ref" and/or "out" in C# code, which proves that it was
a good change. You won't be surprised, knowing my high regard for it, that
Rust does it too, with "&" and "&mut".

GC> Indirection is a completely different concept; using it to indicate
GC> non-const-ness just seems unnatural to me in C++.

 It's not non-const-ness, it's the difference between pass-by-value and
pass-by-reference semantics (it doesn't help that passing by const
reference in C++ is just an optimization of pass-by-value and not a real
pass-by-reference). And it's pretty important.


GC> Stroustrup & Sutter say:
GC> 
GC>   https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-inout
GC> | F.17: For “in-out” parameters, pass by reference to non-const
GC> 
GC> Also see:
GC> 
GC>   https://isocpp.org/wiki/faq/references#pointers-and-references
GC> | Old line C programmers sometimes don’t like references since they
GC> | provide reference semantics that isn’t explicit in the caller’s code.
GC> | After some C++ experience, however, one quickly realizes this is a
GC> | form of information hiding, which is an asset rather than a liability.

 This is an appeal to authority combined with ad hominem and I don't
find either very convincing.


GC> > GC> @@ -1728,7 +1728,7 @@ class page_with_tabular_report
GC> > GC>          render_or_measure_extra_headers
GC> > GC>              (table_gen
GC> > GC>              ,interpolate_html
GC> > GC> -            ,&pos_y
GC> > GC> +            ,pos_y
GC> > GC>              ,output_mode
GC> > GC>              );
GC> > 
GC> >  Seeing a call like this in the code it's now not at all obvious that
GC> > "pos_y" is modified as the result of calling this function, whereas this
GC> > was quite clear before.
GC> 
GC> Neither is it obvious--at the call site--how the other parameters are
GC> passed (in three different ways), or which may be modified:
GC>  - in-out: table_gen        is passed by reference

 This is the only problematic one and, admittedly, it should be passed by
pointer according to the general principle. It isn't because the table
generator internal state really isn't important anyhow and I think that
pass-by-value vs pass-by-reference difference mostly makes sense for
objects having value semantics in the first place.

GC>  - in    : interpolate_html is passed by const reference
GC>  - in    : output_mode      is passed by value

 Both of these are just fine: they're passed by value, semantically
speaking, and can't be modified by the function.

GC> or whether the 'this' pointer may be modified.

 It would be very unexpected (and bad, IMO) if this method modified the
object itself. We can't do anything to show that this is not the case at
the call site but this is not a reason to do something for the parameters
for which we can.

GC> But why should we care? If render_or_measure_extra_headers() modifies
GC> 'pos_y', so what? Writing a set of headers ought to move the position
GC> where the next write will occur. That's no surprise.

 It can be a surprise because it's not necessarily immediately clear that
the function moves the position. When you read the code, you have a mental
map of its structure, including the variables it contains and seeing a "&"
here is very useful to indicate that the value of the variable is changed
and you can no longer rely on what it was before. It's much harder to track
the variable values without it.

 Unfortunately I don't have any formal studies proving it, but IME keeping
following what happens with different variables is the most complicated
part of trying to understand unfamiliar code (at local level, I'm not
speaking of the bigger picture). Anything that helps with this should be
welcome and I can guarantee that for me personally seeing the "&" helps a
lot.

 BTW, this is also why I insist on making all local variables whose values
are never changed "const" (and I love that this is the default in Rust and
you need to explicitly write "mut" to indicate that the variable value will
change). And it's, of course, also the reason behind the usual
recommendation of initializing the variables at the point of declaration
(or, equivalently, only declaring the variables when they're initialized)
in C++, instead of doing it at the top of the scope as in C. All these
guidelines have the same goal and it just doesn't make much sense to follow
some but not all of them.

GC> > The "&" here is not just some "syntactic noise", it is valuable
GC> > because it indicates that the parameter is modified. I don't see what
GC> > do we gain from removing it, but we definitely lose a lot in code
GC> > clarity.
GC> 
GC> I see it in exactly the opposite way. I don't read '&' as suggesting that
GC> the variable passed as an argument is to be modified

 But you should. There is no other reason to use it.

GC> When I see foo(a, &b, *c), my first thought is that 'b' and 'c' have weird
GC> levels of indirection in order to make them work with some legacy API.

 There is no C API being used in lmi, so this just can never happen in lmi
code. The "*c" case is completely different, too, as "c" is a local pointer
and has nothing to do with the API. And "&b" is a clear sign that "b" is an
out parameter, because what else could it be?

GC> And that makes the code harder for me to understand or modify, because I
GC> have to keep thinking about whether 'pos_y' is the vertical position, or
GC> its address in memory:
GC>  - whether I can use it as the vertical position, or must dereference it;
GC>  - whether I can pass it directly, or must take its address.

 Sorry, but this is just not a problem. I can't believe you often make
mistakes confusing pointers with values, but even if you do, they're
trivially caught by the compiler. And you're speaking about writing the
code here, while we both know that reading the code is much more important
than writing it as an order of magnitude more time is spent on the latter
than on the former.

GC> If it can be either the value or its address, depending on context, then 
GC> code that uses 'pos_y' in similar ways has to be written dissimilarly,
GC> making comprehension needlessly hard. And of course a pointer might be null,
GC> so that's one more thing that can go wrong.
GC> 
GC> Thus, in the original `git show a3a294f^:ledger_pdf_generator_wx.cpp`:
GC> 
GC>         table_gen.output_super_header
GC>             ("Non-Guaranteed Values", column_mid_account_value, column_max
GC>             ,&pos_y, output_mode);
GC> [ampersand   ^ here]
GC> 
GC>         pos_y += table_gen.get_separator_line_height();
GC>         table_gen.output_horz_separator
GC>             (column_guar_account_value, column_separator_guar_non_guar
GC>             ,pos_y, output_mode);
GC> [no ampersand ^]
GC> 
GC> I didn't perceive a deliberate distinction between a modifiable 'pos_y' and
GC> an unmodifiable one. I just saw a gratuitous inconsistency: an accidental
GC> wrinkle that needs to be ironed out in order to make everything easy to 
read.

 Sorry again, but this was a mistake. This distinction was indeed
deliberate, and important, and its absence will hurt when returning to this
code later.

GC> >  IMO output parameters should be passed by pointer or not at all.
GC> 
GC> Consider this example:
GC> 
GC>   #include <cstring>
GC>   char a[100] = "Hello, ";
GC>   char b[100] = "world!";
GC>   std::strcat(a, b);
GC> 
GC> Both parameters are passed as pointers, but you can't tell which of them
GC> is modified, unless of course you consult the declaration.

 This is because representing strings as char arrays is a horrible thing,
in particular because char arrays do not have value-like semantics that
strings obviously should have. When I tried justified using some parts
inherited from C above, I definitely did _not_ mean this one.

 With a similar (still poor, but acceptable) C++ API the above would look
like

        #include <string>
        std::string a{"Hello, "};
        std::string b{"world!"};
        string_concat(&a, b);

Please tell me that "&" doesn't help you here by making it immediately
obvious which variable receives the result of the operation (and that the
result is returned in one of the arguments instead of via the function
return value).

GC> Furthermore:
GC> 
GC>   char& rb = b[0];
GC>   char c[100] = "Goodbye, ";
GC>   std::strcat(c, &rb);
GC> 
GC> writing '&' at the call site doesn't necessarily make it clearer, because
GC> the parameter passed with '&' isn't the one that's modified. IOW, the
GC> '&-means-modifiable' convention isn't respected by the language, and can
GC> even be contradicted, because indirection and constness are entirely
GC> different concepts.

 This is just a very bad example due to its use of arrays. Please let's
bury this part of C++ C inheritance together and never talk about it again.

 And, of course, if you rewrite the example above using any value-like type
(int, double, std::string, ...), there is no problem here any more.

GC> > I.e. if
GC> > you absolutely don't want to revert this (and the next) commit(s), which I
GC> > think would be the best way to do, the function needs to be changed 
further
GC> > to return the new value of pos_y and take the argument by value instead of
GC> > modifying it. This would make things heavier than needed, but at least 
they
GC> > would (still) be clear.
GC> 
GC> But that would be inconsistent: in the render_or_measure_extra_headers()
GC> case, the original declaration
GC> 
GC>     virtual void render_or_measure_extra_headers
GC>         (illustration_table_generator& table_gen
GC>         ,html_interpolator const&      interpolate_html
GC>         ,int*                          pos_y
GC>         ,oenum_render_or_only_measure  output_mode
GC>         ) const
GC> 
GC> would have to be changed to pass 'table_gen' as a pointer, and the return
GC> type would have to be std::pair<illustration_table_generator,int>.

 Again, I think it's widely understood that the state of a "generator"
doesn't really matter. So I don't think it's objectionable to pass it
around by non-const reference. Its state is indeed modified
surreptitiously, but we don't care about it, so it's not a problem.


GC> > The current version is IMO the worst possible
GC> > approach.
GC> Consider:
GC>   (1) void foo(int& y);
GC>   (2) int  foo(int  y); // returns modified value of 'y'
GC>   (3) void foo(int* y);
GC> 
GC> I think you're saying your increasing preference order is
GC>   1 << 2 < 3

 Yes.

GC> but the isocpp guidelines cited above recommend the opposite:
GC>   (1) for in-out parameters (like our 'pos_y') [F.17]
GC>   (2) [without any argument] for out-only parameters [F.20]
GC> They never even mention (3) in their comprehensive treatment:
GC>   
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-conventional

 IMO it's a mistake. This is probably because the core guidelines aim to
eliminate the use of pointers as much as possible (see
https://github.com/isocpp/CppCoreGuidelines/issues/847#issuecomment-278790978)
but I think we can trust ourselves to use pointers in this situation
without becoming addicted to them.

GC> Making a variable a pointer doesn't indicate whether it's modifiable;
GC> it indicates that nullptr is a valid value.

 It indicates both, which is indeed not ideal, although in this particular
case I hope we can all agree that the probability of accidentally passing a
null pointer here is, well, null. If this is really a problem, we could use
not_null<> template to make it even more painfully clear:

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#i12-declare-a-pointer-that-must-not-be-null-as-not_null

but IMO it's not worth it: explicitly passing nullptr as "position"
argument is just not something that really happens unintentionally.

GC> References are safer: they can't be uninitialized, can't be reseated,

 Neither of these advantages applies to function arguments.

GC> and, what is most important, can never be null. And they're easier to
GC> read: you don't need '&' to pass them or '*' to use them.

 No, they're not easier to read at the call site. They're impossible to
notice there, which is very far from being the same thing. It's like saying
that invisible ink is easier to read than normal text. Yes... but no.


GC> I've read through half a dozen of the articles found by this search:
GC>   
https://www.google.com/search?q=c%2B%2B+output+parameter+pointer+or+reference+site:stackoverflow.com
GC> and I really think that what you're advocating is not very popular.

 I've clicked a few of these links and all of them mention the guideline
that I use, with the same rationale, see e.g. this answer:

        https://stackoverflow.com/a/334944/15275

And while I definitely don't think Google C++ style guide is worth of
following, they recommend it too, so this is the style followed by the
company probably employing the highest number of C++ programmers in the
world. Not that popularity should be the sole criterion of choice
(otherwise we should have stopped using "foo const&" a long time ago as
less than 5% of C++ programmers do it and "const foo&" is overwhelmingly
more popular), but this is clearly not a niche convention that nobody has
ever heard about.

 Most importantly, I know from experience that lack of "&" at the call site
seriously hampers my comprehension of the code when reading it. Perhaps I'm
unique, but I have no reason to think so, and believe that anybody can
profit from the presence of "&" just as much as I can.

 You do need to get used to the fact that the presence of "&" in the
function argument list means that the argument is an in/out parameter
instead of just seeing it as random noise. But this is hardly a huge effort
to make, is it?

 Anyhow, I've stated my case as well as I could, I think, so I'll let you
take the decision, but personally I remain convinced that there are no good
arguments against the use of pointers for in/out parameters at all -- the
only one I can't fully debunk is the one of consistency, but this is one
time when consistency should be sacrificed at the altar of the greater
good.

 Regards,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]