Re: [PATCH v4 3/4] migration: zero-copy flush only at the end of bitmap
From: Leonardo Brás
Subject: Re: [PATCH v4 3/4] migration: zero-copy flush only at the end of bitmap scanning
Date: Tue, 21 Jun 2022 00:35:54 -0300
User-agent: Evolution 3.44.2
On Mon, 2022-06-20 at 11:44 -0400, Peter Xu wrote:
> On Mon, Jun 20, 2022 at 11:23:53AM +0200, Juan Quintela wrote:
> > Once discussed this, what I asked in the past is that you are having too
> > much dirty memory on zero_copy. When you have a Multiterabyte guest, in
> > a single round you have a "potentially" dirty memory on each channel of:
> >
> > total_amount_memory / number of channels.
> >
> > In a Multiterabyte guest, this is more than likely going to be in the
> > dozens of gigabytes. As far as I know there is no card/driver that will
> > benefit from so many pages in zero_copy, and the kernel will move to
> > synchronous copy at some point. (In older threads, Daniel showed how to
> > test for this case).
>
> I was wondering whether the kernel needs to cache a lot of messages for
> zero copy if we don't flush it for a long time, as recvmsg(MSG_ERRQUEUE)
> seems to fetch messages from the kernel one at a time. And,
> whether that queue has a limit in length or something.
IIRC, if consecutive notifications look the same, the kernel 'merges' them into
a single message that covers a range of sends, like 'this range has these flags
and output'.
So, if no issue happens, we should end up with a single message confirming all
the sent buffers, meaning only a little memory is used for that.
>
> Does it mean that once the kernel has cached enough of these
> messages it'll fall back to the no-zero-copy mode? And probably that's
> how the kernel protects itself from using too much buffer for the error
> msgs?
Since it merges the messages, I don't think it uses a lot of space for that.
IIRC, the kernel will fall back to copying only if the network adapter / driver
does not support MSG_ZEROCOPY, like when it does not support scatter-gather.
>
> This reminded me - Leo, have you considered adding a patch along with this to
> detect the "fallback to non-zero-copy" condition? Because with it,
> when the fallback happens at some point (e.g. when the guest memory is
> larger than some value), we'll know.
I have not considered that yet, but sure, how do you see that working?
We can't just disable zero-copy-send because the user actually opted in, so we
could instead print a one-time error message when it falls back to copying, as
that should happen on the first attempt at a zero-copy send.
Or we could fail the migration, stating that the interface does not support
MSG_ZEROCOPY, since that should happen in the first sendmsg().
I would personally opt for the latter.
What do you think?
>
> Thanks,
>
Thanks Peter!
Best regards,
Leo