Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor Interface


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor Interface
Date: Mon, 3 Oct 2011 17:45:54 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Oct 03, 2011 at 10:00:48AM -0500, Anthony Liguori wrote:
> On 10/03/2011 09:41 AM, Michael S. Tsirkin wrote:
> >On Mon, Oct 03, 2011 at 08:51:10AM -0500, Anthony Liguori wrote:
> >>On 10/03/2011 08:38 AM, Michael S. Tsirkin wrote:
> >>>On Mon, Oct 03, 2011 at 07:55:48AM -0500, Anthony Liguori wrote:
> >>>>On 10/02/2011 04:08 PM, Michael S. Tsirkin wrote:
> >>>>>On Sun, Oct 02, 2011 at 04:21:47PM -0400, Stefan Berger wrote:
> >>>>>>
> >>>>>>>4) Implement the BERVisitor and make this the default migration 
> >>>>>>>protocol.
> >>>>>>>
> >>>>>>>Most of the work will be in 1), though with the implementation in this 
> >>>>>>>series we should be able to do it incrementally. I'm not sure if the 
> >>>>>>>best approach is doing the mechanical phase 1 conversion, then doing 
> >>>>>>>phase 2 sometime after 4), doing phase 1 + 2 as part of 1), or just 
> >>>>>>>doing VMState conversions which gives basically the same capabilities 
> >>>>>>>as phase 1 + 2.
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>Is anyone working on this? If not, I may give it a shot (tomorrow++)
> >>>>>>for at least some of the primitives... for enabling vNVRAM metadata,
> >>>>>>of course. I suppose indefinite-length encoding of constructed data
> >>>>>>types won't be used. Otherwise the visitor interface seems wrong for
> >>>>>>parsing and skipping extra data at the end of a structure: if version
> >>>>>>n wrote the stream and appended some of its version-n data, and now
> >>>>>>version m < n is trying to read the struct, it needs to skip the
> >>>>>>version [m+1, n] data fields... in that case the de-serialization of
> >>>>>>the stream should probably be stream-driven rather than
> >>>>>>structure-driven.
> >>>>>>
> >>>>>>    Stefan
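
A minimal sketch, in C, of the skipping Stefan describes, assuming
definite-length BER encoding with single-byte tags (the function names are
illustrative only): every field carries a tag and a length, so a version-m
reader can step over the [m+1, n] additions without understanding them.

/* Hypothetical sketch: skipping an unknown definite-length BER field.
 * Each field is encoded as tag, length, contents, so a reader that does
 * not recognize the tag can still step over it. */
#include <stdint.h>
#include <stddef.h>

/* Parse a short-form or long-form BER length at buf[*pos]; return the
 * content length and advance *pos past the length octets. */
static size_t ber_read_length(const uint8_t *buf, size_t *pos)
{
    uint8_t first = buf[(*pos)++];
    size_t len, i, n;

    if (!(first & 0x80)) {          /* short form: length in one octet */
        return first;
    }
    n = first & 0x7f;               /* long form: next n octets hold the length */
    for (len = 0, i = 0; i < n; i++) {
        len = (len << 8) | buf[(*pos)++];
    }
    return len;                     /* indefinite form (0x80) not handled here */
}

/* Skip one tag-length-value triple, e.g. a field added by a newer version. */
static void ber_skip_field(const uint8_t *buf, size_t *pos)
{
    (*pos)++;                               /* tag octet (single-byte tags only) */
    *pos += ber_read_length(buf, pos);      /* jump over the contents */
}

Indefinite-length constructed types (length octet 0x80) would instead require
scanning for the end-of-contents octets, which is why avoiding them keeps
skipping cheap.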
> >>>>>
> >>>>>Yes I've been struggling with that exactly.
> >>>>>Anthony, any thoughts?
> >>>>
> >>>>It just depends on how you write your visitor.  If you used
> >>>>sequences, you'd probably do something like this:
> >>>>
> >>>>start_struct ->
> >>>>   check for sequence tag, push starting offset and size onto stack
> >>>>   increment offset to next tag
> >>>>
> >>>>type_int (et al) ->
> >>>>   check for explicit type, parse data
> >>>>   increment offset to next tag
> >>>>
> >>>>end_struct ->
> >>>>   pop starting offset and size to temp variables
> >>>>   set offset to starting offset + size
> >>>>
> >>>>This is roughly how the QMP input marshaller works FWIW.
> >>>>
> >>>>Regards,
> >>>>
> >>>>Anthony Liguori
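
A rough sketch of what such an input visitor could look like, reusing the
ber_read_length() helper from the sketch above; BERInputVisitor and these
function names are hypothetical and not the actual QEMU Visitor API.

#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical input visitor keeping a stack of (start, size) per sequence,
 * following the start_struct / type_int / end_struct outline above. */
typedef struct BERInputVisitor {
    const uint8_t *buf;
    size_t pos;
    struct { size_t start, size; } stack[32];
    int depth;
} BERInputVisitor;

static void ber_start_struct(BERInputVisitor *v)
{
    /* check for the sequence tag, push starting offset and size */
    assert(v->buf[v->pos] == 0x30);             /* SEQUENCE tag */
    v->pos++;
    size_t size = ber_read_length(v->buf, &v->pos);
    v->stack[v->depth].start = v->pos;
    v->stack[v->depth].size = size;
    v->depth++;
}

static void ber_type_int(BERInputVisitor *v, int64_t *obj)
{
    /* check for the explicit type, parse the data, advance to the next tag */
    assert(v->buf[v->pos] == 0x02);             /* INTEGER tag */
    v->pos++;
    size_t len = ber_read_length(v->buf, &v->pos);
    int64_t val = 0;
    for (size_t i = 0; i < len; i++) {          /* big-endian contents; negative
                                                 * values ignored for brevity */
        val = (val << 8) | v->buf[v->pos + i];
    }
    *obj = val;
    v->pos += len;
}

static void ber_end_struct(BERInputVisitor *v)
{
    /* pop start/size and jump to the end of the sequence, which also skips
     * any trailing fields written by a newer version */
    v->depth--;
    v->pos = v->stack[v->depth].start + v->stack[v->depth].size;
}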
> >>>
> >>>One thing I worry about is enabling zero copy for
> >>>large string types (e.g. memory migration).
> >>
> >>Memory shouldn't be done through Visitors.  It should be handled as a 
> >>special case.
> >
> >OK, that's fine then.
> >
> >>>So we need to be able to see a tag for memory page + address,
> >>>read that from socket directly at the correct virtual address.
> >>>
> >>>Probably, we can avoid using visitors for memory, and hope
> >>>everything else can stand an extra copy since it's small.
> >>>
> >>>But then, why do we worry about the size of
> >>>encoded device state as Anthony seems to do?
> >>
> >>There's a significant difference between the cost of something on
> >>the wire and the cost of doing a memcpy.  The cost of the data on
> >>the wire is directly proportional to downtime.  So if we increase
> >>the size of the device state by a factor of 10, we increase the
> >>minimum downtime by a factor of 10.
> >>
> >>Of course, *if* the size of device state is already negligible with
> >>respect to the minimum downtime, then it doesn't matter.  This is
> >>easy to quantify though.  For a normal migration session today,
> >>what's the total size of the device state in relation to the
> >>calculated bandwidth of the minimum downtime?
> >>
> >>If it's very small, then we can add names and not worry about it.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
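
For a concrete feel of that comparison, a back-of-the-envelope calculation;
the link speed, downtime limit, and state size below are made-up example
numbers, not QEMU defaults or measurements.

/* Rough downtime budget: how many bytes fit into the allowed downtime at
 * the measured bandwidth. Example numbers only. */
#include <stdio.h>

int main(void)
{
    double bandwidth = 1e9 / 8;                 /* 1 Gbit/s link, in bytes/s */
    double max_downtime = 0.030;                /* 30 ms downtime limit */
    double budget = bandwidth * max_downtime;   /* ~3.75 MB */
    double device_state = 100 * 1024;           /* assume ~100 KB of device state */

    printf("budget %.0f bytes, device state %.0f bytes (%.1f%% of budget)\n",
           budget, device_state, 100.0 * device_state / budget);
    return 0;
}

At numbers like these the device state is a few percent of the downtime
budget, so adding names would be in the noise; the real answer needs the
measured values from an actual migration.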
> >
> >Yes, it's easy to quantify. I think the following gives us
> >the offset before and after, so the difference is the size
> >we seek, right?
> 
> Yeah, you'll also want:
> 
> diff --git a/arch_init.c b/arch_init.c
> index a6c69c7..0d64200 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -334,6 +334,10 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
> 
>      expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
> 
> +    if (stage == 2 && expected_time <= migrate_max_downtime()) {
> +        fprintf(stderr, "max bwidth: %lld\n", (long long)(expected_time * bwidth));
> +    }
> +
>      return (stage == 2) && (expected_time <= migrate_max_downtime());
>  }
> 
> You'll want to compare the size to max bwidth.
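
One way to get the device-state size itself (the "offset before and after"
mentioned earlier) would be a fragment like the following around the final
device save. This is an untested sketch against that era's tree and assumes
qemu_ftell() reports the current offset of the migration QEMUFile.

/* Untested sketch: probe the stream offset around the non-RAM save.
 * Needs <inttypes.h> for PRId64; 'mon' and 'f' are the Monitor and
 * QEMUFile already in scope at the call site. */
int64_t before = qemu_ftell(f);

qemu_savevm_state_complete(mon, f);    /* writes all non-RAM device state */

int64_t after = qemu_ftell(f);
fprintf(stderr, "device state: %" PRId64 " bytes\n", after - before);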

Well, that depends on how the guest behaves, etc. I'm guessing the
full memory size is a sane thing to compare against.

I don't have a problem sticking this fprintf in as well.

> BTW, putting this info properly into migration stats would probably
> be pretty useful.
> 
> Regards,
> 
> Anthony Liguori

The problem is that adding anything to the monitor makes me worry about
future compatibility so much that I usually just give up.
IMO we really need a namespace for in-development experimental
commands, like "unsupported-XXX"; this would belong there.

-- 
MST


