Re: [Libunwind-devel] Testing time
From: Lassi Tuura
Subject: Re: [Libunwind-devel] Testing time
Date: Fri, 1 Apr 2011 11:57:14 +0200
Hi,
> I also applied two of your patches. 01-performance-optimisations.patch
> is the only remaining patch in my inbox.
Thanks!
I refreshed the performance optimisation patch to reflect the current git tree;
attached at the end.
There was also the trace-specific getcontext patch in another thread ("Another
optimisation for x86-64 fast trace"). I've attached it here again, for
completeness.
In the meantime I've experimented with yet another change to the fast trace,
this time replacing the two-level hash table with a single-level one. This
doesn't use mempool but instead grabs a slab of memory using GET_MEMORY (= mmap).
Apart from the app-provided memory allocation hooks, it's a bit more like what
Jason Evans submitted.
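To illustrate the idea only, here is a minimal sketch of a single-level,
open-addressing table backed by one mmap'd slab. It is not the code in
03-single-level-hash.patch: the names slot_t, cache_t, cache_init, cache_get
and the frame_info field are hypothetical, the hash is a generic
multiplicative one, and mmap is called directly rather than through
GET_MEMORY.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical cache slot: one entry per instruction pointer. */
typedef struct {
  uintptr_t ip;        /* key; 0 marks an empty slot */
  uint32_t frame_info; /* placeholder for cached per-IP unwind data */
} slot_t;

/* Single-level table: one contiguous slab, no per-bucket pointers. */
typedef struct {
  slot_t *slots;
  size_t mask; /* capacity - 1; capacity is a power of two */
} cache_t;

/* Map a zero-filled slab of 2^log_size slots. Returns 0 on success. */
static int cache_init(cache_t *c, unsigned log_size) {
  size_t n = (size_t)1 << log_size;
  void *mem = mmap(NULL, n * sizeof(slot_t), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return -1;
  c->slots = mem; /* anonymous mappings start zeroed, so all slots are empty */
  c->mask = n - 1;
  return 0;
}

/* Generic multiplicative hash; the constant is an arbitrary odd multiplier. */
static size_t hash_ip(uintptr_t ip) {
  return (size_t)(((uint64_t)ip * 0x9e3779b97f4a7c15ULL) >> 32);
}

/* Find the slot for ip using linear probing. If create is nonzero, an
   empty slot is claimed on a miss. Returns NULL on a miss without create,
   or when the table is full. A hit costs one index computation and one
   memory access in the common case. */
static slot_t *cache_get(cache_t *c, uintptr_t ip, int create) {
  size_t i = hash_ip(ip) & c->mask;
  for (size_t probes = 0; probes <= c->mask; ++probes) {
    slot_t *s = &c->slots[i];
    if (s->ip == ip)
      return s; /* hit */
    if (s->ip == 0) { /* empty slot ends the probe sequence */
      if (!create)
        return NULL;
      s->ip = ip;
      return s;
    }
    i = (i + 1) & c->mask;
  }
  return NULL; /* every slot is occupied by other keys */
}
```

A caller would run cache_init once, then call cache_get(&c, ip, 1) per stack
level and fill in frame_info on first use. The point of the layout is that a
lookup touches the slab directly instead of first loading a bucket pointer
and then following it.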
This last one yields a fairly consistent ~5% improvement. I'm seeing this
taking our application (387M stack walks to profile memory allocations) to
50-51 clock cycles per stack level (was ~53), or ~1320 cycles per walk (was
~1400). The gain is largely from avoiding the double indirect memory access in
the hash code.
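As a quick cross-check of these figures, the two helpers below (hypothetical
names, not part of any patch) compute the relative gain and the average walk
depth that the numbers imply. The depth of roughly 26 levels is only an
inference from dividing cycles per walk by cycles per level, and 50.5 is
taken as the midpoint of the 50-51 range.

```c
/* Fractional improvement when a cost drops from `before` to `after`. */
static double relative_gain(double before, double after) {
  return (before - after) / before;
}

/* Average number of stack levels per walk implied by the two costs. */
static double implied_depth(double cycles_per_walk, double cycles_per_level) {
  return cycles_per_walk / cycles_per_level;
}
```

With the numbers above, relative_gain(1400, 1320) is about 0.057 and
relative_gain(53, 50.5) is about 0.047, both consistent with "~5%";
implied_depth(1320, 50.5) comes out at about 26 levels per walk.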
I hope the last improvement is still acceptable even with using less of the
mempool stuff.
I experimented with a number of other changes, none of which yielded a
consistent, measurable improvement. I think with these changes the code is
starting to be at the limit of what can be achieved - or at least what I am
capable of :-) The time per stack level now mostly goes into a single hash
table access plus reading the saved register values off the stack; everything
else contributes little to the stack walk time (under heavy walking).
Regards,
Lassi
Attachments:
01-performance-optimisations.patch
02-getcontext-lite.patch
03-single-level-hash.patch