[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration togeth
From: |
Peter Xu |
Subject: |
Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration together |
Date: |
Thu, 5 Dec 2024 17:41:18 -0500 |
On Mon, Dec 02, 2024 at 12:37:19PM +0530, Prasad Pandit wrote:
> Hello Peter,
>
> On Fri, 29 Nov 2024 at 22:16, Peter Xu <peterx@redhat.com> wrote:
> > I saw that there's still discussion in the previous version, while this
> > cover letter doesn't mention why it was ignored. Especially, at least to
> > me, what Fabiano commented on simplifying the flush condition check makes
> > senes to me to be addressed.
>
> * It is not ignored. Simplifying flush conditionals makes sense to me
> too, that is why in the 'v0' version of this series I had added the
> !migration_in_postcopy() check to the migrate_multifd() function,
> right?
As explained, that addition was wrong, because migrate_multifd() should
always return the user option only. Again, you can add another helper.
> I tried to discuss in the 'v1' thread if there's another way to
> simplify conditionals. Not sure if you've followed all comments in the
> thread.
I'll post a version to clean it up, either we go with Fabiano's, or mine,
or a 3rd option. We shouldn't pile up more condition check there. It's
growing into something not maintainable.
>
> * Secondly, as you mention above, I also thought Fabiano is pointing
> at the complexity of the 'if' conditionals and thus I replied that his
> proposed patch does not seem to solve for that complexity. But in his
> subsequent reply Fabiano mentions that it is not just about
> conditionals, but larger complexity of how and when multifd threads do
> flush and sync amongst them.
Yes they're relevant, but I think we can cleanup the whole thing and it's
not that complicated, IMHO. We'll see.
>
> * IMHO, simplifying that larger complexity of how multifd threads do
> flush and sync can be done independently, outside the scope of this
> patch series, which is about enabling multifd and postcopy together.
I assume you're working on the test cases, I hope this won't block you from
continuing your work on this series.
As mentioned above, I think we need to clean this up before moving on,
unfortunately. And I hope things settle already before you have the test
cases ready. I appreciate you add the test cases for multifd+postcopy.
That's very important. Before that you can keep your patch as-is, and
leave that part for us to figure out. Feel free to chime in anytime as
well.
>
> > Meanwhile, before I read into any details, I found that all the tests I
> > requested are still missing. Would you please consider adding them?
> >
> > My previous comment regarding to test is here:
> > https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/
> >
> > I listed exactly the minimum set of tests that I think we should have.
> ...
> > In general, any migration new feature must have both doc updates and test
> > coverage as long as applicable.
> >
> > Multifd still has its doc missing, which is unfortunate. We could have one
> > doc prior to this work, but it can also be done later.
> >
> > OTOH on testing: this is not a new feature alone, but it's more dangerous
> > than a new feature due to what I mentioned before, that postcopy needs
> > atomicity on page movements, and multifd is completely against that from
> > design POV due to NIC drivers being able to modify guest pages directly.
> >
> > It means multifd+postcopy bugs will be extremely hard to debug if we don't
> > put it right first. So please be serious on the test coverage on this
> > work.
>
> * I'm yet to get to the test cases. The revised series(v1 and v2) are
> posted to share patch updates which were suggested in the previous
> reviews. Test cases are a separate/different effort from source
> patches. If we want to hold on this patch series till we get the test
> cases and documentation in place, that is okay, I'll work on that
> next.
So we talked about this in our meeting, but still just to keep it a record
so whoever work on migration can reference: we do require test cases and
it's not separate effort. We request both test cases and docs to present
before mering a feature, unless there's good reason to not to.
E.g. multifd doesn't yet have doc, so doc is not required for this work
yet, however test cases are.
Another outlier is VFIO+multifd cannot easily add test case because CI
normally doesn't have available hardware environment. However does should
apply there to be required at least from migration POV.
Thanks,
--
Peter Xu