From: Lassi Tuura
Subject: Re: [Libunwind-devel] libunwind x86-64 optimisations?
Date: Mon, 6 Jul 2009 16:06:55 +0200
Hi,
It's a shame that GLIBC calls its internal _dl_debug_state directly, and not through _r_debug.r_brk; if it did, you could intercept library events and maintain a private copy of the list. You can still read the list on your own via _r_debug.r_map, although it may be in an inconsistent state. All that gets you, though, is the public fields of the link map. I don't remember for sure, but I believe libunwind does need the program headers - it would have to cache a copy, and validate them using l_addr/l_name/l_ld.
Ah thanks, yes. As our profiler rewrites (some) machine code on the fly, it looks like I could vector ourselves into _dl_debug_state() dynamically at run time, capture a copy of the information and find a way to teach libunwind to use the private data instead.
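As a rough illustration of what "capture a copy of the information" could look like, here is a minimal sketch that snapshots the loaded objects and their program headers into a private table. It uses the public glibc dl_iterate_phdr() call rather than a hook on _dl_debug_state(), which is an assumption made only to keep the example self-contained. The names object_info, object_cache, snapshot_objects and find_object, and the fixed capacity, are hypothetical and are not part of libunwind.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 256            /* hypothetical fixed capacity */

struct object_info {               /* hypothetical private record */
    ElfW(Addr)        base;        /* load bias, as reported in dlpi_addr */
    const ElfW(Phdr) *phdr;        /* program header table in memory */
    ElfW(Half)        phnum;       /* number of program headers */
    char              name[256];   /* copy of the object name */
};

struct object_cache {
    size_t             count;
    struct object_info objects[MAX_OBJECTS];
};

/* Callback for dl_iterate_phdr(): copy one object into the cache. */
static int snapshot_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct object_cache *cache = data;
    (void)size;
    if (cache->count == MAX_OBJECTS)
        return 1;                  /* non-zero return stops the iteration */
    struct object_info *o = &cache->objects[cache->count++];
    o->base  = info->dlpi_addr;
    o->phdr  = info->dlpi_phdr;
    o->phnum = info->dlpi_phnum;
    strncpy(o->name, info->dlpi_name ? info->dlpi_name : "",
            sizeof o->name - 1);
    o->name[sizeof o->name - 1] = '\0';
    return 0;
}

/* Rebuild the private table; returns the number of objects recorded. */
static size_t snapshot_objects(struct object_cache *cache)
{
    assert(cache != NULL);
    cache->count = 0;
    dl_iterate_phdr(snapshot_cb, cache);
    return cache->count;
}

/* Find the cached object with a PT_LOAD segment covering ip, if any. */
static const struct object_info *
find_object(const struct object_cache *cache, ElfW(Addr) ip)
{
    for (size_t i = 0; i < cache->count; i++) {
        const struct object_info *o = &cache->objects[i];
        for (ElfW(Half) j = 0; j < o->phnum; j++) {
            const ElfW(Phdr) *p = &o->phdr[j];
            ElfW(Addr) start = o->base + p->p_vaddr;
            if (p->p_type == PT_LOAD && ip >= start &&
                ip - start < p->p_memsz)
                return o;
        }
    }
    return NULL;
}
```

The cached phdr pointers go stale when an object is unloaded, which is why the copy would need to be refreshed on library events or validated against l_addr/l_name/l_ld before use.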
2) Try to determine which frames are "varying", i.e. use VLAs or alloca(). [...] This information is not available, so you need a different approach. Since in practice most x86-64 frames use either rsp or rbp for their CFA, you could cache which it was and where, if anywhere, rbp was saved. That should be pretty quick. If there are any data-dependent instructions in the CFI to be concerned about you could also maintain a "safe" flag in the unwinder, but I think the only one is DW_CFA_val_expression (maybe DW_CFA_GNU_window_save but you won't see that on x86-64).
I did run into functions that had a non-trivial CFA location; it wasn't a straight register value but some sort of an expression. I'll do more research to see what exactly happens in functions using alloca() / VLAs, and what the DWARF info looks like.
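To make the caching suggestion concrete, here is a minimal sketch of a per-IP rule cache that records whether the CFA is based on rsp or rbp, the offset, where rbp was saved, and a "safe" flag. All names (frame_rule, rule_cache, cursor, fast_step) and the hash and table size are hypothetical, not libunwind's API. It assumes the x86-64 convention that the return address sits at CFA - 8, and it leaves the fallback to the full DWARF interpreter to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cfa_base { CFA_RSP, CFA_RBP };

struct frame_rule {        /* hypothetical cached rule for one IP */
    uint64_t ip;           /* key; 0 marks an empty slot */
    int32_t  cfa_off;      /* CFA = base register + cfa_off */
    int32_t  rbp_off;      /* saved rbp lives at CFA + rbp_off; 0 = not saved */
    uint8_t  cfa_base;     /* CFA_RSP or CFA_RBP */
    uint8_t  safe;         /* 0 if the CFI was data dependent */
};

struct cursor { uint64_t ip, rsp, rbp; };

#define CACHE_SLOTS 1024   /* must match the 10 hash bits below */
struct rule_cache { struct frame_rule slot[CACHE_SLOTS]; };

/* Multiplicative hash; the top 10 bits select one of 1024 slots. */
static size_t slot_for(uint64_t ip)
{
    return (size_t)((ip * 0x9E3779B97F4A7C15ull) >> 54);
}

/* Direct-mapped insert: a colliding entry is simply overwritten. */
static void cache_insert(struct rule_cache *c, const struct frame_rule *r)
{
    assert(r->ip != 0);
    c->slot[slot_for(r->ip)] = *r;
}

static const struct frame_rule *
cache_lookup(const struct rule_cache *c, uint64_t ip)
{
    const struct frame_rule *r = &c->slot[slot_for(ip)];
    return (ip != 0 && r->ip == ip) ? r : NULL;
}

/* Step one frame using the cache. Returns 1 on success, 0 when the
 * caller has to fall back to the full CFI interpreter. */
static int fast_step(const struct rule_cache *c, struct cursor *cur)
{
    const struct frame_rule *r = cache_lookup(c, cur->ip);
    if (r == NULL || !r->safe)
        return 0;
    uint64_t base = (r->cfa_base == CFA_RSP) ? cur->rsp : cur->rbp;
    uint64_t cfa  = base + (uint64_t)(int64_t)r->cfa_off;
    if (r->rbp_off != 0)   /* restore the caller's rbp if it was saved */
        cur->rbp = *(const uint64_t *)(uintptr_t)
                       (cfa + (uint64_t)(int64_t)r->rbp_off);
    cur->ip  = *(const uint64_t *)(uintptr_t)(cfa - 8); /* return address */
    cur->rsp = cfa;        /* the caller's rsp is the CFA */
    return 1;
}
```

For a function with the usual push rbp / mov rbp, rsp prologue the rule would be CFA = rbp + 16 with rbp saved at CFA - 16. Frames whose CFA is a DWARF expression, such as the ones mentioned above, would be recorded with safe = 0 and always take the slow path.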
Thanks for the ideas! If anyone else has suggestions on further optimisation opportunities, would love to hear them :-)
(Pointers to alternative high-quality profiling tools are equally welcome. We're not married to ours, it's just that we've never found anything else that a) requires no code instrumentation, b) runs at decent performance, c) is sufficiently featureful, and d) can deal with our software.)
Regards, Lassi