Re: [Libunwind-devel] [RFC] _ULx86_64_tdep_trace returns off-by-one addresses


From: Lassi Tuura
Subject: Re: [Libunwind-devel] [RFC] _ULx86_64_tdep_trace returns off-by-one addresses
Date: Tue, 6 Dec 2011 11:36:04 +0100

Hi Arun,

(I added the libunwind list in copy.)

On Dec 6, 2011, at 7:28 , Arun Sharma wrote:

> On Sun, Dec 4, 2011 at 10:44 PM, Lassi Tuura <address@hidden> wrote:
> 
>> I suppose in fast trace under "case UNW_X86_64_FRAME_SIGRETURN" we could
>> maybe fix things up with something like:
>> 
>> /* In x86_64 the signal frame is usually __restore_rt, and we didn't
>>    get there by a call, it's just set up to look like that on stack.
>>    Because of d->use_prev_instr we ended up reporting an address one
>>    byte before the function at previous frame. Fix that up now. */
>> if (depth)
>>   buffer[depth-1] = (uint64_t) buffer[depth-1]+1;
>> 
> 
> Looks good. Please send me a patch.

Yes, will do once I've tested it. At the least it is missing a (void *) cast.
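
For reference, a minimal sketch of that fix-up with the cast added, written as
a standalone helper so it can be compiled on its own. The helper name
fixup_sigreturn_caller is made up for illustration, and I use uintptr_t here
rather than uint64_t; in an actual patch the statement would presumably sit
inline under the UNW_X86_64_FRAME_SIGRETURN case, and this is untested against
the real fast trace code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunwind.  buffer holds the addresses
   recorded so far and depth is the number of entries.  When a signal frame
   is found, the entry recorded just before it was decremented as if it were
   a return address, although it was not reached by a call; add the byte
   back. */
void fixup_sigreturn_caller(void **buffer, int depth)
{
  assert(depth >= 0);
  assert(buffer != NULL || depth == 0);

  if (depth > 0)
    buffer[depth - 1] = (void *) ((uintptr_t) buffer[depth - 1] + 1);
}
```

Only the most recent entry is touched, and nothing happens when no address has
been recorded yet.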

> Re: the differences between the fast/slow trace paths
> 
> There are two primary use cases (there may be more I'm not aware of):
> 
> * Profilers
> 
> People generally care more about the sample being attributed to the
> right function vs the off-by-1.
> 
> * Debuggers, core-dumper type
> 
> I think it'd be good to be consistent with existing tools (eg: gdb) here.

Yes, agreed. Ours is mainly the profiling kind. While we track actual addresses
in the profile, as you say users will almost never see the detail, only symbols.

That said, call tree correctness is pretty important for us. Our target
community is less familiar with all the things compilers do to their code, and
our main codes have thousands of functions with smallish contributions, at the
0.1% to 3% level each, in a multi-million line code base. Optimisation
efforts tend to look at lots of functions with seemingly small contributions,
and it's good if any unexpected call tree structure, even at the <0.1% level, is
always a real platform feature and not an inaccuracy introduced by the tools.

>> IIRC in my tests the decrementing is definitely needed, or you'll get some
>> percentage of irrational results where function A calling B gets reported
>> as unrelated function C calling B, because C happens to follow the last
>> instruction of A, which happens to get profiled (or at other FDE boundary,
>> and you care for some reason, which could happen with hot/cold splits).
>> 
> 
> Do you have a test case or a conceptual example where things are still
> broken? Eg: a profiler using libunwind attributes samples to the wrong
> func? I can think of funcs that never return. Anything more common?

We use libunwind fast trace via unw_backtrace(), and use the addresses as-is.
With that I am not aware of anything other than the issue found by Paul, apart
from the standard issues with inaccurate unwind info (compiler / system issues),
some special cases with missing unwind info (_init, __do_global_ctors_aux; these
need the compiler's crt*.o to be built with unwind info) and sporadic hangs
inside the dynamic linker - none of which are problems in libunwind itself.

For us the single largest source of bad call trees is still inaccurate unwind
info. Fortunately the majority of those are obviously wrong - a hex address not
in any function - and on the decline with OS and compiler updates.

Yes, compilers put calls to no-return functions at the end of a function. In C++
that includes the exception handling: __cxa_throw to throw exceptions and, say,
_Unwind_Resume for exception continuation, both of which do get profile hits.
Very longjmp-heavy C code could see similar issues. The __assert_fail family
obviously rarely gets profile hits. I assume debuggers and crash reporters care
about getting that part right.

Without the use_prev_instr correction, code that mainly uses unw_step() +
unw_get_reg() and does not correct the IP value itself would have accuracy
problems. If you use the IP as-is, you get the issues above, and a debugger-type
app looking up info by address would get suboptimal results. If you always use
IP-1, you miss the cases that use_prev_instr fixed.
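
To make the "A calling B reported as C calling B" case concrete, here is a
minimal sketch with a made-up symbol table. The names sym, table, lookup and
attribute, and all the addresses, are invented for illustration; none of this
is libunwind API. It assumes A ends with a call to a no-return function, so the
return address equals the first byte of the unrelated function C that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: the function covers [start, end). */
struct sym { const char *name; uintptr_t start, end; };

static const struct sym table[] = {
  { "A", 0x1000, 0x1040 },  /* last instruction is a call to a no-return B */
  { "C", 0x1040, 0x1080 },  /* unrelated function placed right after A */
};

/* Return the entry containing addr, or NULL if no function covers it. */
const struct sym *lookup(uintptr_t addr)
{
  for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
    if (addr >= table[i].start && addr < table[i].end)
      return &table[i];
  return NULL;
}

/* Attribute an IP to a function.  An IP that is a return address is looked
   up at ip - 1 so it lands inside the call instruction; an IP taken from a
   signal frame was not produced by a call and is used as-is. */
const struct sym *attribute(uintptr_t ip, int is_signal_frame)
{
  assert(ip != 0);  /* ip - 1 would wrap around */
  return lookup(is_signal_frame ? ip : ip - 1);
}
```

With this table, looking up the return address 0x1040 as-is gives C, while
attribute(0x1040, 0) gives A. Conversely, if a signal interrupted the first
instruction of C, attribute(0x1040, 1) gives C, where a blanket IP-1 would
have given A.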

There are some special cases to keep in mind: 1) functions optimised to a single
jump instruction, 2) functions made of a single call, possibly to a no-return
target, and 3) functions with multiple entry points where the first one
continues into the next one. The last kind is rare, except that there are
several foo, foo_nocancel pairs in glibc. There's no call in 1, and I've not
seen 3 with a call in it. I have seen 2 generated by the optimiser splitting out
the cold part of an inlined function.

BTW, it's common at least in our code to have a jump as the last instruction in
a function, say for a loop or a sibling call optimised to a jump. I spent quite
a while verifying those came out with IP in the right function. Off the top of
my head I don't recall what triggered that particular debugging episode, but we
did have some of those reported in the wrong function at one point. AFAIK, it's
all fine for unw_backtrace() these days.

Regards,
Lassi

