From: Markus Armbruster
Subject: Re: [PATCH v4 3/4] qmp: Move dispatcher to a coroutine
Date: Wed, 19 Feb 2020 15:21:50 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Kevin Wolf <address@hidden> writes:

> Am 19.02.2020 um 10:03 hat Markus Armbruster geschrieben:
>> >> >>   @@ -246,8 +287,15 @@ void monitor_qmp_bh_dispatcher(void *data)
>> >> >>            }
>> >> >>            qmp_request_free(req_obj);
>> >> >> 
>> >> >>   -    /* Reschedule instead of looping so the main loop stays responsive */
>> >> >>   -    qemu_bh_schedule(qmp_dispatcher_bh);
>> >> >>   +        /*
>> >> >>   +         * Yield and reschedule so the main loop stays responsive.
>> >> >>   +         *
>> >> >>   +         * Move back to iohandler_ctx so that nested event loops for
>> >> >>   +         * qemu_aio_context don't start new monitor commands.
>> >> >> 
>> >> >> Can you explain this sentence for dummies?
>> >> >
>> >> > Nested event loops (i.e. AIO_WAIT_WHILE) poll qemu_aio_context, so if we
>> >> > are scheduled there, the next iteration of the monitor dispatcher loop
>> >> > could start from a nested event loop. If we are scheduled in
>> >> > iohandler_ctx, then only the actual main loop will reenter the coroutine
>> >> > and nested event loops ignore it.
>> >> >
>> >> > I'm not sure if or why this is actually important, but this matches
>> >> > scheduling the dispatcher BH in iohandler_ctx in the code before this
>> >> > patch.
>> >> >
>> >> > If we didn't do this, we could end up starting monitor requests in more
>> >> > places than before, and who knows what that would mean.
>> >> 
>> >> Let me say it in my own words, to make sure I got it.  I'm going to
>> >> ignore special cases like "not using I/O thread" and exec-oob.
>> >> 
>> >> QMP monitor I/O happens in mon_iothread, in iohandler_ctx (I think).
>> >> This pushes requests onto the monitor's qmp_requests queue.
>> >
>> > mon_iothread (like all iothreads) has a separate AioContext, which
>> > doesn't have a name, but can be accessed with
>> > iothread_get_aio_context(mon_iothread).
>> 
>> Got it.
>> 
>> @iohandler_ctx and @qemu_aio_context both belong to the main loop.
>> 
>> @qemu_aio_context is for general "run in the main loop" use.  It is
>> polled both in the actual main loop and event loops nested in it.  I
>> figure "both" to keep things responsive.
>> 
>> @iohandler_ctx is for "I/O handlers" (whatever those are).  It is polled
>> only in the actual main loop.  I figure this means "I/O handlers" may
>> get delayed while a nested event loop runs.  But better starve a bit
>> than run in unexpected places.
>> 
>> >> Before this patch, the dispatcher runs in a bottom half in the main
>> >> thread, in qemu_aio_context.
>> >
>> > The dispatcher BH is what is scheduled in iohandler_ctx. This means that
>> > the BH is run from the main loop, but not from nested event loops.
>> 
>> Got it.
>> 
>> >> The patch moves it to a coroutine running in the main thread.  It runs
>> >> in iohandler_ctx, but switches to qemu_aio_context for executing command
>> >> handlers.
>> >> 
>> >> We want to keep command handlers running in qemu_aio_context, as before
>> >> this patch.
>> >
>> > "Running in AioContext X" is kind of a sloppy term, and I'm afraid it
>> > might be more confusing than helpful here. What this means is that the
>> > code is run while holding the lock of the AioContext, and it registers
>> > its BHs, callbacks etc. in that AioContext so it will be called from the
>> > event loop of the respective thread.
>> >
>> > Before this patch, command handlers can't really use callbacks without a
>> > nested event loop because they are synchronous.
>> 
>> I figure an example for such a nested event loop is BDRV_POLL_WHILE() in
>> bdrv_truncate(), which runs in the command handler for block_resize.
>> True?
>
> Yes. I think most nested event loops are in the block layer, and they
> use the BDRV_POLL_WHILE() or AIO_WAIT_WHILE() macros today.
>
>> >                                                 The BQL is held, which
>> > is equivalent to holding the qemu_aio_context lock.
>> 
>> Why is it equivalent?  Are they always taken together?
>
> I guess ideally they would be. In practice, we neglect the
> qemu_aio_context lock and rely on the BQL for everything in the main
> thread. So maybe I should say the BQL replaces the AioContext lock for
> the main context rather than being equivalent.

Sounds a bit sloppy.  It works...
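
If I read the code right, the convention amounts to this (sketch, not
actual code; qemu_mutex_lock_iothread() is what takes the BQL):

    qemu_mutex_lock_iothread();     /* take the BQL */
    /*
     * ... touch main-loop state; in practice nobody bothers with
     * aio_context_acquire(qemu_get_aio_context()) here ...
     */
    qemu_mutex_unlock_iothread();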

>> >                                                     But other than that,
>> > all of the definition of "running in an AioContext" doesn't really apply.
>> >
>> > Now with coroutines, you assign them an AioContext, which is the context
>> > in which BHs reentering the coroutine will be scheduled, e.g. from
>> > explicit aio_co_schedule(), after qemu_co_sleep_ns() or after waiting
>> > for a lock like a CoMutex.
>> >
>> > qemu_aio_context is what most (if not all) of the existing QMP handlers
>> > use when they run a nested event loop,
>> 
>> bdrv_truncate() appears to use bdrv_get_aio_context(), which is the
>> BlockDriverState's AioContext if set, else @qemu_aio_context.
>
> Right, the BDRV_POLL_WHILE() macro allows waiting for something in a
> different AioContext from the main context. This works because of the
> aio_wait_kick() in bdrv_truncate_co_entry(), which schedules a BH in the
> main context, and this is what BDRV_POLL_WHILE() is waiting for.

Hairy...
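
For my own understanding, the shape of the thing (paraphrased and
simplified from block.c; arguments and error handling elided):

    /* runs in the AioContext of the BDS */
    static void coroutine_fn bdrv_truncate_co_entry(void *opaque)
    {
        TruncateCo *tco = opaque;
        tco->ret = bdrv_co_truncate(...);
        tco->done = true;
        aio_wait_kick();    /* schedules a BH in the main context */
    }

    /* runs in the main thread */
    co = qemu_coroutine_create(bdrv_truncate_co_entry, &tco);
    bdrv_coroutine_enter(child->bs, co);
    BDRV_POLL_WHILE(child->bs, !tco.done);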

> So the command handler runs in the main context (and thread), whereas
> the bdrv_co_truncate() runs in the AioContext (and thread) of the BDS.
>
>> >                                        so running the coroutine in this
>> > context while calling the handlers will make the transition the easiest.
>> 
>> In the case of block_resize: running it in a coroutine makes
>> bdrv_truncate() use that coroutine instead of creating one together with
>> a nested event loop.  "That coroutine" uses @qemu_aio_context.  So does
>> the nested event loop, *except* when the BlockDriverState has an
>> AioContext.
>> 
>> I have no idea when a BlockDriverState has an AioContext.  Can you help?
>
> When it's used with a dataplane device, i.e. it is tied to a separate
> iothread.

I see.
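
(For readers of the archive: that's the -object iothread case, e.g.

    -object iothread,id=iothread0 \
    -device virtio-blk-pci,drive=disk0,iothread=iothread0

which moves the device's BDS into iothread0's AioContext, if I have it
right.)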

> The fact that the coroutine fastpath (not only in bdrv_truncate(), but
> also everywhere else) assumes that we're already in the right AioContext
> might be a bug.
>
> If this is the only problem in v5, please consider applying only patches
> 1-3 (with the infrastructure) so that Marc-André's screendump patch gets
> its dependency resolved, and I'll send a separate series for converting
> block_resize that fixes this problem (probably by explicitly
> rescheduling the coroutine into the BDS context and back in the
> coroutine path).

Can do.
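
For the archives, I figure "explicitly rescheduling" means the usual
schedule-and-yield dance, something like (untested sketch):

    /* hop into the AioContext of the BDS */
    aio_co_schedule(bdrv_get_aio_context(bs), qemu_coroutine_self());
    qemu_coroutine_yield();

    ret = bdrv_co_truncate(...);    /* arguments elided */

    /* hop back into the main context */
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();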

>> >> We want to run the rest in iohandler_ctx to ensure dispatching happens
>> >> only in the main loop, as before this patch.
>> >> 
>> >> Correct?
>> >
>> > This part is correct.
>> 
>> Alright, let me try again, so you can point out the next level of my
>> ignorance :)
>> 
>> QMP monitor I/O happens in I/O thread @mon_iothread, in the I/O thread's
>> AioContext.  The I/O thread's event loop polls monitor I/O.  The read
>> handler pushes requests onto the monitor's qmp_requests queue.
>> 
>> The dispatcher pops requests off these queues, and runs command
>> handlers for them.
>> 
>> Before this patch, the dispatcher runs in a bottom half in the main
>> thread with the BQL[*] held, in @iohandler_ctx context.  The latter
>> ensures dispatch happens only from the main loop, not nested event
>> loops.
>> 
>> The command handlers don't care for the AioContext; they're synchronous.
>> If they need AIO internally, they have to set up a nested event loop.
>> When they do, they pick the AioContext explicitly, so they still don't
>> rely on the current context.
>
> They do rely on being run in the main thread, but the exact context
> doesn't matter. (In theory, running them in the thread of the block
> node's AioContext would be fine, too, but the monitor can't know what
> that is.)

Yes.  Command handlers have always run in the main thread, and handler
code may rely on that.

> See the AIO_WAIT_WHILE() documentation for details.
>
>> The patch moves the dispatcher to a coroutine running in the main
>> thread.  It runs in iohandler_ctx, but switches to qemu_aio_context for
>> executing command handlers.
>> 
>> The dispatcher keeps running in @iohandler_ctx.  Good.
>> 
>> The command handlers now run in @qemu_aio_context.  Doesn't matter
>> because "command handlers don't care for the AioContext".  Except when a
>> command handler has a fast path like bdrv_truncate() has.  For those,
>> @iohandler_ctx is generally not what we want.  @qemu_aio_context is.
>> "Generally", since every one needs to be inspected for AioContext
>> shenanigans.
>
> Except for the bug that the specific code in bdrv_truncate() wants
> to run the coroutine in the BDS AioContext, which may be different from
> qemu_aio_context...
>
> But I think the theory is right, for those commands that don't
> potentially schedule a coroutine in a different AioContext.

Let's review how switching to @qemu_aio_context for executing command
handlers affects command handlers:

* Handler is completely synchronous: the switch has no effect.

* Handler has a nested event loop

  - with a fast path for "running in coroutine context", and

    + the fast path tacitly assumes @qemu_aio_context: the switch makes
      it work without change

    + the fast path tacitly assumes anything else: the switch is
      useless, the fast path doesn't work without further changes

  - without such a fast path (not sure this case exists): the switch is
    useless, the nested event loop works anyway

* Handler has only the fast path, i.e. it assumes it runs in coroutine
  context (this case cannot exist now, but this series enables it), and

  - the handler wants @qemu_aio_context: the switch saves it the trouble
    to switch itself

  - the handler wants anything else: the switch is useless

Whether the switch is actually useful often enough to be worthwhile I
can't say.  But because @qemu_aio_context is the "normal" context for
"normal" code running in the main loop, switching feels kind of right to
me.
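
In code, the switch is the schedule-and-yield pair around the handler
call, roughly (condensed from PATCH 3; error handling and OOB elided):

    /* dispatcher coroutine, normally parked in @iohandler_ctx */
    while (true) {
        /* ... wait for a request and pop it off the queue ... */

        /* switch to @qemu_aio_context for the handler */
        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
        qemu_coroutine_yield();

        monitor_qmp_dispatch(mon, req_obj->req);

        /* switch back so nested event loops can't restart us */
        aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
        qemu_coroutine_yield();
    }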

I think we need to talk about AioContext in qapi-code-gen.txt.  PATCH 1
now adds

  Member 'coroutine' tells the QMP dispatcher whether the command handler
  is safe to be run in a coroutine.  It defaults to false.  If it is true,
  the command handler is called from coroutine context and may yield while
  waiting for an external event (such as I/O completion) in order to avoid
  blocking the guest and other background operations.
  
What about "is called from a coroutine running in the main loop with
@qemu_aio_context, and may yield"?

Likewise for the commit message of PATCH 3:

  This moves the QMP dispatcher to a coroutine and runs all QMP command
  handlers that declare 'coroutine': true in coroutine context so they
  can avoid blocking the main loop while doing I/O or waiting for other
  events.

"calls all ... in a coroutine running in the main loop with
@qemu_aio_context, so they can".

Speaking of PATCH 1:

  It is an error to specify both 'coroutine': true and 'allow-oob': true
  for a command.  (We don't currently have a use case for both together and
  without a use case, it's not entirely clear what the semantics should be.)

Let's lose the parentheses around the last sentence.

If you agree with my proposed tweaks, and nothing else comes up, I can
try to do them in my tree.



