[Libunwind-devel] Another optimisation for x86-64 fast trace
From: Lassi Tuura
Subject: [Libunwind-devel] Another optimisation for x86-64 fast trace
Date: Tue, 29 Mar 2011 19:39:08 +0200
Hi,
Here's one more small performance patch for x86-64 fast trace: a slightly
lighter getcontext. It reduces the number of clock cycles spent in getcontext
by about a factor of 6, which gives a small but consistent improvement
(~3-5%, varies by run) to backtrace() as a whole.
With this, the cycles per stack walk and per level (average/rms) have evolved
like this:
- my original patches:
  per walk: 1502 / 2570; per level: 55.0 / 222.4
  total run time: 1088 s (but with a slower profiler core than in the other
  results below)
- after merge + patch to separate unw_backtrace() symbol:
  per walk: 2373 / 2886; per level: 90.2 / 204.7
  total run time: 1130 s
- + other patches I sent before:
  per walk: 1500 / 2595; per level: 57.1 / 200.9
  total run time: 969 s
- + this patch:
  per walk: 1445 / 2604; per level: 55.0 / 208.7
  (some runs were actually a few percentage points better than this,
  but their full application run time was longer, so I used this one)
  total run time: 959 s
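To put the last two entries side by side, here is a minimal sketch that turns
the quoted figures into relative changes. The helper name improvement_pct and
the FIGURES table are only illustrative, not part of libunwind or the profiler;
the numbers are copied from the "+ other patches I sent before" and
"+ this patch" entries above.

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# (before, after) pairs taken from the two most recent entries in the list.
# The dictionary itself is a hypothetical container for this illustration.
FIGURES = {
    "cycles per walk (average)": (1500, 1445),
    "cycles per level (average)": (57.1, 55.0),
    "total run time (s)": (969, 959),
}

if __name__ == "__main__":
    for name, (before, after) in FIGURES.items():
        print(f"{name}: {improvement_pct(before, after):.1f}% lower")
```

With these inputs the average cycles per walk and per level each drop by
roughly 3.7%, and the total run time by roughly 1.0%. Only the averages are
compared here; the rms values are not used.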
In case you are interested, I have put CPU cycle views of the source code and
assembler on the web, before (a) and after (b) this change. The "before" view
also predates my unw_tdep_trace -> tdep_trace changes posted today, and the
"after" view includes unrelated optimisations in e.g. "dumpOneProfile" which do
not affect the libunwind performance figures.
(a) http://cern.ch/lat/cmssw/ptuview/igopt-d5-10ev/all/basic_sampling
(b) http://cern.ch/lat/cmssw/ptuview/igopt-e1-10ev/all/basic_sampling
BTW, it occurs to me that the tdep_trace off-by-one could be solved by giving
one more stack level in getcontext_trace. Then tdep_trace wouldn't need the
'if' that protects filling the 'address' array.
Regards,
Lassi
Attachment: 01-getcontext-lite.patch (binary data)