From: lkcl at lkcl dot net
Subject: [Bug ld/22831] ld causes massive thrashing if object files are not fully memory-resident: new algorithm needed
Date: Thu, 01 Mar 2018 03:54:15 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=22831

--- Comment #6 from Luke Kenneth Casson Leighton <lkcl at lkcl dot net> ---
(In reply to H.J. Lu from comment #5)

> Please read my suggestion again and follow it to the letter.

sorry, hjl, i appreciate you're busy and so are providing extremely
short responses: please read again what i wrote.  i am *not* the
person installing or running this.  i am acting merely as a
*messenger*, after seeing and experiencing reports from at least
FIVE separate teams over the past SIX years of increasingly
difficult build problems caused by this increasingly-important bug.

i am NOT the person who will be running any of the suggestions
that you are giving (because my laptop would potentially be damaged
by doing so and i cannot risk that); i will be RELAYING the suggestions
to various people across the internet, making them AWARE that you
are willing to tackle this particular problem.

therefore i require and seek CLARITY on EXACTLY what it is
that i am going to tell people, BEFORE suggesting to them that
they come and look at this bug report.

is there anything that is unreasonable about that?

if so, please let me know.

https://github.com/hjl-tools/binutils-gdb/commit/9999de060bbcc7cca9dce213deeeec6593887a8e
 

ok so after re-reading twice, i eventually spotted the (misordered)
branch name.  can i suggest that in future, rather than referring
to the main branch, you post people a link *directly* to the
branch, like this?

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/pr18028

it was a simple mistake; it would have been much more helpful to say
"you missed that i suggested trying a branch named xyz".

now, i took a quick look, and there is an assumption in the
patch that the problem will occur *exclusively* on 32-bit systems.

this is not the case.
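
to illustrate the assumption: going by the description (and by
question 1 below), the branch caps the linker's memory usage at half
of the available memory on 32-bit hosts.  a purely hypothetical
sketch of that *kind* of check - NOT the actual code from the
users/hjl/pr18028 branch, and treating "available memory" as
physical RAM, which the patch may well define differently - would
look like this:

  #include <stdio.h>
  #include <stdint.h>
  #include <unistd.h>

  /* hypothetical illustration only: cap memory use at half of
     physical RAM, but only when pointers are 32-bit.  on a 64-bit
     host the cap never engages, which is exactly the problem
     described below.  */
  static uint64_t
  link_memory_cap (void)
  {
    uint64_t phys = (uint64_t) sysconf (_SC_PHYS_PAGES)
                    * (uint64_t) sysconf (_SC_PAGE_SIZE);
    return sizeof (void *) == 4 ? phys / 2 : UINT64_MAX;
  }

  int
  main (void)
  {
    printf ("cap: %llu bytes\n",
            (unsigned long long) link_memory_cap ());
    return 0;
  }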

there are actually *two* inter-related problems.

the FIRST is that the amount of memory used for linking e.g. firefox
is so insane that it now requires 7 GIGABYTES of resident memory in
order to avoid thrashing... this is simply impossible to do on a
32-bit system.
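
(for anyone wanting to verify a figure like that: running the link
step under /usr/bin/time -v reports the "Maximum resident set size".
equivalently, a minimal standalone C sketch can read the peak RSS of
a child process via wait4 - the "ld --version" command here is just
a placeholder, substitute a real link line:)

  #include <stdio.h>
  #include <sys/resource.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int
  main (void)
  {
    pid_t pid = fork ();
    if (pid == 0)
      {
        /* placeholder command -- substitute the real link line.  */
        execlp ("ld", "ld", "--version", (char *) NULL);
        _exit (127);
      }

    int status;
    struct rusage ru;
    wait4 (pid, &status, 0, &ru);

    /* on linux, ru_maxrss is reported in kilobytes.  */
    printf ("peak RSS: %ld MB\n", ru.ru_maxrss / 1024);
    return 0;
  }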

the SECOND is that the linker phase GOES INTO THRASHING IN THE FIRST
PLACE and has done for many many years now INCLUDING on 64-BIT SYSTEMS.

if you read the original bug report you will see that i said the
64-bit x86 laptop i had 6 years ago only had 2GB of RAM.

the one that i have now has *16* GB of RAM, but because it has an
NVMe SSD and is an ultra-expensive laptop (USD $2500) i cannot risk
the NVMe drive getting damaged, so swap is *DISABLED*.  despite this,
it still goes into total meltdown (loadavg over 100) whenever memory
usage approaches 16GB.

for both these systems - both of them *64-bit* systems, *NOT* 32-bit
systems - going into swap-space is an absolute unmitigated disaster,
yet it is now considered NORMAL for a build to go from taking about
1 hour to link, when it is below the 100% resident memory usage
threshold, to taking SEVERAL DAYS in some cases if it goes even the
TINIEST FRACTION above the available resident memory...
because distros *do not have any choice in the matter*.
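
to make the scale of that cliff concrete, here is an illustrative
back-of-the-envelope calculation (round figures, not measurements):
a RAM access costs on the order of 100ns; a major page fault
serviced from disk costs on the order of 10ms, i.e. roughly 100,000
times more.  if even 1% of memory accesses start faulting, the
average access cost becomes about 0.99 x 100ns + 0.01 x 10ms, or
roughly 100 microseconds - a thousand-fold slowdown.  that is
precisely the mechanism by which a 1-hour link turns into a
multi-day one.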

this is why i suggested the algorithm above: it was part of an
exercise set by extremely competent lecturers at Imperial College
London, during an era when available memory was a tiny fraction of
what it is now, and running in virtual memory was simply flat-out
inconceivable because most systems were still 16-bit, let alone
32-bit.

so.

questions:

1) would the proposed patch - which reduces virtual memory usage on
32-bit systems to half of the available memory - *actually*
fix the problem as described on a *64-bit* system?

2) what would happen if *more than half* of the available virtual
memory is taken up by programs that happen to be running at the
same time as the linker phase?  consider the cases where, in
complex builds, there may be a REALLY LARGE chain of applications
that have spawned any given invocation of the ld executable, such
that the expectation that half of the total virtual memory space
will even *be* available is simply not true (see the sketch just
below for checking what is *actually* available).
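
(on question 2, the number that matters is what is available *at the
moment ld starts*, not half of the total: on linux that is the
MemAvailable line of /proc/meminfo.  a minimal sketch - again, not
actual binutils code - of reading it:)

  #include <stdio.h>

  int
  main (void)
  {
    FILE *f = fopen ("/proc/meminfo", "r");
    if (!f)
      return 1;

    char line[256];
    unsigned long kb = 0;
    while (fgets (line, sizeof line, f))
      if (sscanf (line, "MemAvailable: %lu kB", &kb) == 1)
        break;
    fclose (f);

    /* on a loaded build machine this can be a small fraction of
       total RAM, so "half of total" may simply not exist.  */
    printf ("available now: %lu MB\n", kb / 1024);
    return 0;
  }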

-- 
You are receiving this mail because:
You are on the CC list for the bug.

