bug-groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug #64421] [troff] link-time optimization changes `.m` register initia


From: G. Branden Robinson
Subject: [bug #64421] [troff] link-time optimization changes `.m` register initialization
Date: Fri, 8 Sep 2023 16:11:46 -0400 (EDT)

Follow-up Comment #35, bug #64421 (project groff):

[comment #34 comment #34:]
> I'd never seek to deny someone the pleasure of reading ELF x86-64 ABI
documentation.  But since groff should be able to compile and behave
deterministically in non-ELF-x86-64 environments, is the more relevant
question what the language itself guarantees about uninitialized variables?

Possibly, or that may be left unspecified, at which point you are indeed left
to the mercy of the language runtime and/or platform's ABIs.

Without `-flto=auto`, `default_color` is a global of type `color`.  It is not
initialized and therefore goes into the BSS section.  This means that there is
no storage reserved for it at compile time; instead, necessary storage is
allocated at process initialization using zeroed memory pages.  This is why
you can rely on global and static ints to be initialized to zero in C/C++.

`default_color` isn't an `int` but that doesn't matter much.  It still--in the
absence of aggressive optimization--gets a bunch of zeroes backing it up with
as much storage as the data type requires (which one can compute at compile
time with the `sizeof` operator).  Structs can be nested, so if one struct
contains another, the size will be the sum of the combination.

The compiler understands the structure of structs.  (A C++ compiler also
understands classes.)  In C++ structs and classes are are similar to C
structs, except they (1) add a system of rules about field visibility
(`public`, `private`, `protected`); (2) support construction and destruction;
and (3) C's syntactical structure "tags" are injected into the type name space
automatically in C++ as bona fine type names.  This knowledge of the structure
of structs is how they get translated into assembly language; given the
address a variable of a struct type (or record type in Pascal), the address of
any field within that struct can be computed by summing the sizes of all the
fields preceding the desired field.  (Plus any extra offsets required to
account for padding required to align fetches from memory, which usually
_isn't_ specified by programming language rules, but by ABIs.  This was a
realization late in coming  to C; it didn't enter the language until ANSI
standardization in 1989.)

So here is what I think is happening.  The _color.h_ and _symbol.h_ header
files are both important to know about.  (I clumsily pasted a URL in comment
#33, skipping one of these.)

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/color.h?h=1.23.0#n36

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/symbol.h?h=1.23.0#n24

We also should have a look at the global objects `default_color` and
`default_symbol`.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/color.cpp?h=1.23.0#n398

https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/symbol.cpp?h=1.23.0#n157

As I noted in comment #33, `default_symbol` is a global variable (or object)
with a constructor, meaning that it runs code before `main` starts. 
`default_color` is a global variable (or object) *without* a constructor.

*BUT*...an object of type `color` always *contains* an object of type
`symbol`.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/color.h?h=1.23.0#n36

Once again, this is a bare declaration with no initializer.  So, when an
object of type `color` is eventually created, the `symbol` it contains is
created as well.  Since there is no initializer, the default constructor is
used (not like it matters much here, there being no alternative constructors
taking arguments).

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/symbol.h?h=1.23.0#n46

This constructor looks like it does _nothing_, but it does have a field
initializer list, a bit of C++ syntax.  `s(0)` is the part of interest here. 
It sets the private member variable `s` (a C string a.k.a. pointer to char) to
a null pointer.

I also see that the constructor is inlined, meaning that its code should be
duplicated at the point of call instead of producing an "activation record"
i.e., pushing a frame onto the call stack.

There is another constructor, but it is only declared, not defined.  It takes
a parameter (another C string) and and _optional_ parameter "how".

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/symbol.h?h=1.23.0#n32

I don't know C++'s rules well enough to say if that's valid, but I do know
that groff builds even with Bjarni's massive list of warning flags doesn't
squawk about it. 

But somewhere around here appears to be where the trouble arises.  Somewhere,
link-time optimization is getting the idea that the `default_color` object,
once eventually constructed--and to be honest, my code greps are not telling
me that it ever actually is--is not getting a bespoke `symbol` object, but is
reusing the existing global `default_symbol`, which *is* constructed using the
parameterized, declaration-only constructor noted in the previous link.  I do
note that the optimizer's guess that `default_symbol` was the right choice for
a late-initialized `default_color` to contain does seem too clever by half,
and for us was, in fact, the wrong choice, and broke any logic that followed
the documented behavior of the _groff_ language's `.m` and `.M` registers.

I cannot say if `-flto=auto` exposed a bug in _groff_, a bug in this LTO
implementation, or if together they encountered unspecified C++ language
behavior about un-constructed global objects of struct or class types that we
just have to suck up and work around with more explicitness and clarity, which
is what my pending patch does.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64421>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]