Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
From: Michael S. Tsirkin
Subject: Re: [PATCH v2 2/2] vhost-user: add a request-reply lock
Date: Thu, 29 Aug 2024 11:05:15 -0400

On Thu, Aug 29, 2024 at 10:29:24AM -0400, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> > Hello Michael,
> >
> > On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > Weird. Seems to indicate some kind of deadlock?
> >
> > * Such a deadlock should occur across all environments I guess, not
> > sure why it happens selectively. It is strange.
> >
> > > So maybe vhost_user_postcopy_end should take the BQL?
> > ===
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index e7c1215671..31acda3818 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >           */
> >          qemu_event_wait(&mis->main_thread_load_event);
> >      }
> > +    bql_lock();
> >      postcopy_ram_incoming_cleanup(mis);
> > +    bql_unlock();
> >
> >      if (load_res < 0) {
> >          /*
> > ===
> >
> > * Actually a BQL patch above was tested and it worked fine. But not
> > sure if it is an acceptable solution. Another contention was taking
> > BQL could make things more complicated, so a local vhost-user specific
> > lock should be better.
> >
> > ...wdyt?
>
> I think Michael was suggesting taking bql in vhost_user_postcopy_end(), not
> in postcopy code directly.
maybe that's better, ok.
> I'm recently looking at how to make precopy
> load even take less bql and even make it a separate thread. Above is
> definitely going backwards, per we discussed already internally.
At the same time, a small bugfix is better because it can be backported.
> I cherish postcopy doesn't need to take bql on its own in most paths, and
> we shouldn't add unnecessary bql requirement even if vhost-user isn't used.
>
> Personally I still prefer we look into why a separate mutex won't work and
> why that timed out; that could be part of whoever is going to investigate
> the whole issue (including the hang later on). Otherwise I'm ok from
> migration pov that we take bql in the vhost-user hook, but not in savevm.c.
>
> Thanks,
ok
> --
> Peter Xu
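
The option discussed above, taking the BQL inside the vhost-user hook rather than in savevm.c, could look roughly like the following. This is only a standalone model of the locking pattern, not QEMU code: every demo_* name is hypothetical, a plain pthread mutex stands in for the BQL, and the check for "already held" is an assumption about how a hook might cope with callers that already own the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the BQL: a plain mutex plus a per-thread "held" flag.
 * Hypothetical model only, not QEMU's implementation. */
static pthread_mutex_t demo_bql = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool demo_bql_held;

static void demo_bql_lock(void)
{
    pthread_mutex_lock(&demo_bql);
    demo_bql_held = true;
}

static void demo_bql_unlock(void)
{
    assert(demo_bql_held);
    demo_bql_held = false;
    pthread_mutex_unlock(&demo_bql);
}

static bool demo_bql_locked(void)
{
    return demo_bql_held;
}

/* Counts how many "postcopy end" requests the model has sent. */
static int demo_requests_sent;

/* Hypothetical stand-in for the request/reply exchange with the backend.
 * It must only run while the lock is held. */
static int demo_send_postcopy_end(void)
{
    assert(demo_bql_locked());
    demo_requests_sent++;
    return 0;
}

/* Hypothetical hook: takes the lock itself, but only if the calling
 * thread does not hold it already, and releases only what it took. */
static int demo_vhost_user_postcopy_end(void)
{
    bool took_lock = !demo_bql_locked();
    int ret;

    if (took_lock) {
        demo_bql_lock();
    }
    ret = demo_send_postcopy_end();
    if (took_lock) {
        demo_bql_unlock();
    }
    return ret;
}

/* Generic cleanup path: takes no lock on its own, so setups without
 * vhost-user never touch the lock at all. */
static int demo_postcopy_incoming_cleanup(bool vhost_user_in_use)
{
    return vhost_user_in_use ? demo_vhost_user_postcopy_end() : 0;
}
```

In this model, calling demo_postcopy_incoming_cleanup(false) never acquires the lock, which matches the point that postcopy should not gain a BQL requirement when vhost-user is not in use. It says nothing about why the separate mutex timed out; that still needs investigating.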