From: Paul Pluzhnikov
Subject: Re: [Libunwind-devel] Updated fast trace patch with initial performance results
Date: Tue, 5 Apr 2011 16:20:38 -0700

On Tue, Apr 5, 2011 at 2:08 PM, Paul Pluzhnikov <address@hidden> wrote:
> Now all we have to do is figure out how to fix it ;-)
I see a couple of possible solutions:
1. Document the problem; ask users to call backtrace() early (before
calling pthread_key_create too many times), so it gets one of the
"pre-allocated" descriptors.
2. Arrange for libunwind __attribute__((constructor)) function to do the
same, and hope that it fires early enough.
3. Switch to using __thread, figure out some (likely extremely non-portable)
way to perform cleanup on thread termination.
All of 1, 2 and 3 are non-portable -- there is no guarantee that
pthread_key_create will not call malloc/calloc every time it is invoked, nor
are the pthread_* functions async-signal-safe.
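
For illustration, option 2 might look roughly like the sketch below. All
names here (trace_cache_key, trace_cache_key_ready, trace_cache_free,
trace_cache_key_init) are hypothetical and not taken from the patch, and
__attribute__((constructor)) is a GCC/Clang extension, so this inherits the
portability caveats above. Nothing guarantees the constructor runs before
other libraries have created their own keys.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not libunwind's actual symbols. */
static pthread_key_t trace_cache_key;
static int trace_cache_key_ready;   /* nonzero once the key exists */

/* Destructor the pthread library runs at thread exit for non-NULL values. */
static void trace_cache_free(void *cache)
{
    free(cache);
}

/* Runs when the library is loaded, before main(), in the hope of getting
 * one of the early TSD keys. */
__attribute__((constructor))
static void trace_cache_key_init(void)
{
    if (pthread_key_create(&trace_cache_key, trace_cache_free) == 0)
        trace_cache_key_ready = 1;
}
```

If the key could not be created, trace_cache_key_ready stays zero and the
unwinder would have to take its slow path.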
On Tue, Apr 5, 2011 at 3:13 PM, Lassi Tuura <address@hidden> wrote:
> How far do we want to go in attempting to avoid the one calloc()? :-)
> Choices seem to be:
> a. Use __thread, require per-thread wrapper callbacks from app
In the context of, e.g., a malloc stack recorder, an application callback is
generally not sufficient.
Consider: the application is about to call pthread_exit, so it calls the
libunwind callback, which frees the per-thread cache for the current thread.
The app then calls pthread_exit.
Now the fun begins: pthread_exit calls __libc_thread_freeres, which calls
free(), which calls unwinder, which reallocates per-thread trace cache,
which is then leaked.
I think the best you can do is mark the per-thread cache as likely to
become cold soon, and deallocate it some time later (effectively turning
this into option b).
> b. Use lock-free global cache stack, must still free 'unused' caches.
> c. Use pthread_getspecific, deal with calloc from pthread_key_create,
> maybe require app to call some init function once at 'safe' time if
> it uses unw_backtrace?
In general, option c has the same problem for a malloc stack recorder: the very
first call to backtrace() may well come from within a libc-internal call
to calloc(), and an attempt to call pthread_setspecific at that point may
be unsafe, and the app has not even gained execution control yet!
OTOH, for glibc this wouldn't be a problem, as pthread_setspecific will
not call calloc() before 32 TSD keys have been created.
> I guess I'd go with c, b, then a. We can call once to get the key created
> at a safe time (= initialisation for our profiler), then never need to
> worry about destructor calls and don't need per-thread callbacks. Failing
> that I think I'd prefer b over a.
I think the only completely automatic and reasonably portable solution is B,
though it *is* going to a lot of trouble for a problem we don't really have ;-(
How about a variation of option c:
4. Require the app to call e.g. libunwind_per_thread_init() from a safe
context for each thread in which it desires fast backtrace().
This call will allocate trace cache and do pthread_setspecific.
In tdep_trace(), if pthread_getspecific() returns NULL, then fall back
to the slow unwind.
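
A minimal sketch of what option 4 could look like, under stated assumptions:
libunwind_per_thread_init() is the example name suggested above, while
trace_cache_t, TRACE_CACHE_SLOTS, cache_key, trace_sketch() and the
TRACE_SLOW/TRACE_FAST results are hypothetical stand-ins (trace_sketch()
only reports which path tdep_trace() would take; it does no unwinding).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TRACE_CACHE_SLOTS 64            /* assumed size, illustration only */

typedef struct {
    void *ip[TRACE_CACHE_SLOTS];        /* placeholder cache contents */
} trace_cache_t;

static pthread_key_t cache_key;
static pthread_once_t cache_key_once = PTHREAD_ONCE_INIT;
static atomic_int cache_key_ok;         /* 1 once cache_key is valid */

static void cache_destroy(void *p)
{
    free(p);
}

static void cache_key_create(void)
{
    if (pthread_key_create(&cache_key, cache_destroy) == 0)
        atomic_store(&cache_key_ok, 1);
}

/* The app calls this from a safe context in each thread that wants fast
 * backtraces. Returns 0 on success, -1 on failure. */
int libunwind_per_thread_init(void)
{
    pthread_once(&cache_key_once, cache_key_create);
    if (!atomic_load(&cache_key_ok))
        return -1;
    if (pthread_getspecific(cache_key) != NULL)
        return 0;                       /* already initialised */

    trace_cache_t *cache = calloc(1, sizeof *cache);
    if (cache == NULL)
        return -1;
    if (pthread_setspecific(cache_key, cache) != 0) {
        free(cache);
        return -1;
    }
    return 0;
}

enum trace_path { TRACE_SLOW, TRACE_FAST };

/* Stand-in for the check in tdep_trace(): never allocates, and falls back
 * to the slow unwind if this thread has no cache. */
enum trace_path trace_sketch(void)
{
    if (!atomic_load(&cache_key_ok))
        return TRACE_SLOW;
    return pthread_getspecific(cache_key) != NULL ? TRACE_FAST : TRACE_SLOW;
}
```

The point of this layout is that the trace path only ever reads the
per-thread slot; all allocation happens in the init call, at a time the
application chooses.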
Thanks,
--
Paul Pluzhnikov