Re: [PATCH v1 00/13] Multifd device state transfer support with VFIO consumer
From: Fabiano Rosas
Subject: Re: [PATCH v1 00/13] Multifd device state transfer support with VFIO consumer
Date: Wed, 17 Jul 2024 17:19:15 -0300
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jul 16, 2024 at 10:10:12PM +0200, Maciej S. Szmigiero wrote:
>> On 27.06.2024 16:56, Peter Xu wrote:
>> > On Thu, Jun 27, 2024 at 11:14:28AM +0200, Maciej S. Szmigiero wrote:
>> > > On 26.06.2024 18:23, Peter Xu wrote:
>> > > > On Wed, Jun 26, 2024 at 05:47:34PM +0200, Maciej S. Szmigiero wrote:
>> > > > > On 26.06.2024 03:51, Peter Xu wrote:
>> > > > > > On Wed, Jun 26, 2024 at 12:44:29AM +0200, Maciej S. Szmigiero
>> > > > > > wrote:
>> > > > > > > On 25.06.2024 19:25, Peter Xu wrote:
>> > > > > > > > On Mon, Jun 24, 2024 at 09:51:18PM +0200, Maciej S. Szmigiero
>> > > > > > > > wrote:
>> > > > > > > > > Hi Peter,
>> > > > > > > >
>> > > > > > > > Hi, Maciej,
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On 23.06.2024 22:27, Peter Xu wrote:
>> > > > > > > > > > On Tue, Jun 18, 2024 at 06:12:18PM +0200, Maciej S.
>> > > > > > > > > > Szmigiero wrote:
>> > > > > > > > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>> > > > > > > > > > >
>> > > > > > > > > > > This is an updated v1 patch series of the RFC (v0)
>> > > > > > > > > > > series located here:
>> > > > > > > > > > > https://lore.kernel.org/qemu-devel/cover.1713269378.git.maciej.szmigiero@oracle.com/
>> > > > > > > > > >
>> > > > > > > > > > OK I took some hours thinking about this today, and here's
>> > > > > > > > > > some high level
>> > > > > > > > > > comments for this series. I'll start with which are more
>> > > > > > > > > > relevant to what
>> > > > > > > > > > Fabiano has already suggested in the other thread, then
>> > > > > > > > > > I'll add some more.
>> > > > > > > > > >
>> > > > > > > > > > https://lore.kernel.org/r/20240620212111.29319-1-farosas@suse.de
>> > > > > > > > >
>> > > > > > > > > That's a long list, thanks for these comments.
>> > > > > > > > >
>> > > > > > > > > I have responded to them inline below.
>> > > > > > > > > (..)
>> > > > > > >
>> > > > > > > 2) Submit this operation to the thread pool and wait for it to
>> > > > > > > complete,
>> > > > > >
>> > > > > > VFIO doesn't need to have its own code waiting. If this pool is
>> > > > > > for migration purposes in general, the qemu migration framework
>> > > > > > will need to wait at some point for all jobs to finish before
>> > > > > > moving on. Perhaps it should be at the end of the non-iterative
>> > > > > > session.
>> > > > >
>> > > > > So essentially, instead of calling save_live_complete_precopy_end
>> > > > > handlers
>> > > > > from the migration code you would like to hard-code its current VFIO
>> > > > > implementation of calling
>> > > > > vfio_save_complete_precopy_async_thread_thread_terminate().
>> > > > >
>> > > > > Only then it wouldn't be called a VFIO precopy async thread
>> > > > > terminate but some generic device state async precopy thread
>> > > > > terminate function.
>> > > >
>> > > > I don't understand what you meant by "hard code".
>> > >
>> > > "Hard code" wasn't maybe the best expression here.
>> > >
>> > > I meant the move of the functionality that's provided by
>> > > vfio_save_complete_precopy_async_thread_thread_terminate() in this patch
>> > > set
>> > > to the common migration code.
>> >
>> > I see. That function only does a thread_join() so far.
>> >
>> > So can I understand it as below [1] should work for us, and it'll be clean
>> > too (with nothing to hard-code)?
>>
>> It will need some signal telling the worker thread pool to terminate
>> before waiting for the threads to finish (as the code in [1] just waits).
>>
>> In the case of the current vfio_save_complete_precopy_async_thread()
>> implementation, this signal isn't necessary since the thread simply
>> terminates once it has read all the data it needs from the device.
>>
>> In the worker thread pool case there will be some threads waiting for
>> jobs to be queued to them, so they will need to be signaled somehow
>> to exit.
>
> Right. We may need something like multifd_send_should_exit() +
> MultiFDSendParams.sem. It'll be nicer if we can generalize that part so
> multifd threads can also rebase to that thread model, but maybe I'm asking
> too much.
>
>>
>> > The time to join() the worker threads can be even later, until
>> > migrate_fd_cleanup() on sender side. You may have a better idea on when
>> > would be the best place to do it when start working on it.
>> >
>> > >
>> > > > What I was saying is: if we target the worker thread pool to be
>> > > > used for "concurrently dumping vmstates", then it'll make sense to
>> > > > make sure all the jobs there are flushed after qemu dumps all
>> > > > non-iterables (because this should be the last step of the
>> > > > switchover).
>> > > >
>> > > > I expect it looks like this:
>> > > >
>> > > >     while (pool->active_threads) {
>> > > >         qemu_sem_wait(&pool->job_done);
>> > > >     }
>> >
>> > [1]
>> >
>> (..)
>> > > I think that with this thread pool introduction we'll unfortunately
>> > > almost certainly need to target this patch set at 9.2, since these
>> > > overall changes (and Fabiano's patches too) will need good testing,
>> > > might uncover some performance regressions (for example related to the
>> > > number-of-buffers limit or Fabiano's multifd changes), bring some
>> > > review comments from other people, etc.
>> > >
>> > > In addition to that, we are in the middle of the holiday season and a
>> > > lot of people aren't available - like Fabiano, who said he will be
>> > > available only in a few weeks.
>> >
>> > Right, that's unfortunate. Let's see, but I still really hope we can
>> > get some feedback from Fabiano before it lands; even with that we have
>> > a chance for 9.1, but it's challenging - the same condition I've
>> > mentioned since the 1st email. And before Fabiano's back (he's the
>> > active maintainer for this release), I'd personally be happy if you
>> > could propose something that can partly land earlier in this release.
>> > E.g., if you want, we can at least upstream Fabiano's idea first, or
>> > some more on top.
>> >
>> > For that, also feel free to have a look at my comment today:
>> >
>> > https://lore.kernel.org/r/Zn15y693g0AkDbYD@x1n
>> >
>> > Feel free to comment there too. There's a tiny uncertainty there so far
>> > on specifying the "max size for a device state" if we do what I
>> > suggested, as the multifd setup will need to allocate a buffer large
>> > enough for both ram + device.
>> > But I think that's not an issue and you'll tackle it properly when
>> > working on it. It's more about whether you agree with what I said as a
>> > general concept.
>> >
>>
>> Since it seems that the discussion on Fabiano's patch set has subsided, I
>> think I will start by basing my updated patch set on top of his RFC, and
>> then if Fabiano wants to submit a v1/v2 of his patch set I will rebase
>> mine on top of it.
>>
>> Otherwise, you can wait until I have a v2 ready and then we can work with
>> that.
>
> Oh I thought you already started modifying his patchset.
>
> In this case, AFAIR Fabiano plans to rework that RFC series, so maybe you
> want to double-check with him; you can also wait for his new version if
> that's easier, because I do expect there'll be major changes.
>
> Fabiano?
Don't wait on me. I think I can make the changes Peter suggested without
affecting the interfaces used by this series too much. If it comes to it, I
can rebase this series "under" Maciej's.