qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH] AIO: Reduce number of threads for 32bit hosts
Date: Thu, 15 Jan 2015 11:00:12 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Am 14.01.2015 um 15:09 hat Alexander Graf geschrieben:
> On 01/14/15 15:07, Kevin Wolf wrote:
> >Am 14.01.2015 um 14:49 hat Paolo Bonzini geschrieben:
> >>
> >>On 14/01/2015 14:38, Kevin Wolf wrote:
> >>>Well, what do you want to use it for? I thought it would only be for a
> >>>one-time check where we usually end up rather than something that would
> >>>be enabled in production, but maybe I misunderstood.
> >>No, you didn't.  Though I guess we could limit the checks to the yield
> >>points.  If we have BDS recursion, as in the backing file case, yield
> >>points should not be far from the deepest part of the stack.
> >>
> >>Another possibility (which cannot be enabled in production) is to fill
> >>the stack with a known 64-bit value, and do a binary search when the
> >>coroutine is destroyed.
> >Maybe that's the easiest one, yes.
> >
> >>>>I tried gathering warning from GCC's -Wstack-usage=1023 option and the
> >>>>block layer does not seem to have functions with huge stacks in the I/O
> >>>>path.
> >>>>
> >>>>So, assuming a maximum stack depth of 50 (already pretty generous since
> >>>>there shouldn't be any recursive calls) a 100K stack should be pretty
> >>>>much okay for coroutines and thread-pool threads.
> >>>The potential problem in the block layer is long backing file chains.
> >>>Perhaps we need to do something to solve that iteratively instead of
> >>>recursively.
> >>Basically first read stuff from the current BDS, and then "fill in the
> >>blanks" with a tail call on bs->backing_file?  That would be quite a
> >>change, and we'd need a stopgap measure like Alex's patch in the meanwhile.
> >Basically block.c would do something like get_block_status() first and
> >only then call the read/write functions of the individual drivers. But
> >yes, that's more a theoretical consideration at this point.
> >
> >I think with the 50 recursions that you calculated we should be fine in
> >practice for now. I would however strongly recommend finally implementing
> >a guard page for coroutine stacks before we make that change.
> 
> We could just write mprotect an excessively mapped page as guard page, no?

Not just write protect, but PROT_NONE, but otherwise yes, I think that's
how it usually done (or directly with a mmap instead of modifying it
after the fact).

> >Anyway, the thread pool workers aren't affected by any of this, so they
> >would be the obvious first step.
> 
> Yes, ideally we would have the maximum number of threads be runtime
> configurable too. That way you can adjust them to your workload.

Should be easy enough to add an option to raw-{posix,win32} for that.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]