Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy


From: Dor Laor
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Tue, 01 Mar 2011 10:59:02 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Thunderbird/3.1.7 ThunderBrowse/3.3.4

On 02/28/2011 08:12 PM, Anthony Liguori wrote:

On Feb 28, 2011 11:47 AM, "Avi Kivity" <address@hidden> wrote:
 >
 > On 02/28/2011 07:33 PM, Anthony Liguori wrote:
 >>
 >>
 >> >
 >> > You're just ignoring what I've written.
 >>
 >> No, you're just impervious to my subtle attempt to refocus the
 >> discussion on solving a practical problem.
 >>
 >> There are a lot of good, reasonably straightforward changes we can
 >> make that have a high return on investment.
 >>
 >
 > Is making qemu the authoritative source of configuration information
 > a straightforward change?  Is the return on it high?  Is the
 > investment low?

I think this is where we fundamentally disagree.  My position is that
QEMU is already the authoritative source.  Having a state file doesn't
change anything.

Do a hot unplug of a network device with upstream libvirt with acpiphp
unloaded, consult libvirt and then consult the monitor to see who has
the right view of the guest's config.

To me, that's the definition of authoritative.

 > "No" to all three (ignoring for the moment whether it is good or not,
which we were debating).
 >
 >
 >> The only suggestion I'm making beyond Marcelo's original patch is
 >> that we use a structured format and that we make it possible to use
 >> the same file to solve this problem in multiple places.
 >>
 >
 > No, you're suggesting a lot more than that.

That's exactly what I'm suggesting from a technical perspective.

 >> I don't think this creates a fundamental break in how management
 >> tools interact with QEMU.  I don't think introducing RAID support
 >> in the block layer is a reasonable alternative.
 >>
 >>
 >
 > Why not?

Because it's a lot of complexity and code that can go wrong while only
solving the race for one specific case.  Not to mention that we double
the IOP rate.

 > Something that avoids the whole state thing altogether:
 >
 > - instead of atomically switching when live copy is done, keep on
 >   issuing writes to both the origin and the live copy
 > - issue a notification to management
 > - management receives the notification, and issues an atomic blockdev
 >   switch command

 > this is really the RAID-1 solution but without the state file (credit
 > Dor).  An advantage is that there is no additional latency when trying
 > to catch up to the dirty bitmap.

It still suffers from the two generals problem.  You cannot solve this
without making one node reliable and that takes us back to it being
either QEMU (posted event and state file) or the management tool (sync
event).

It can be made safe without a state file by changing the basic live
copy algorithm:

1. Live copy in progress stage
   Once the live copy command is issued, a dirty bitmap is created for
   tracking. There is a single pass over the entire image in which we
   copy blocks from the src to the dst.

   Write commands for blocks that were already copied are issued twice,
   to both the src and the dst.

   Once the single full-copy pass ends, we trigger a QMP event
   signalling that this stage can be ended.

   The live copy stage keeps running until management issues a switch
   command. When that happens, the switch is immediate and no further
   blocks need to be copied (only pending I/Os are flushed).

2. Management sends a switch command.
   Qemu stops doubling the I/O and switches to the destination.
   End.
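
To make the two stages concrete, here is a rough, self-contained
user-space sketch -- not QEMU code; plain files stand in for the
images, blocking pread()/pwrite() for the block layer's async I/O, and
all the names (blkcopy_state, blkcopy_pass, guest_write,
blkcopy_switch) are made up for illustration.  It only shows the core
rule: after the single pass, writes to already-copied blocks go to
both images until the switch command arrives.

/* Rough sketch of the two stages above -- NOT QEMU code. */
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define NUM_BLOCKS 256                  /* toy 1 MB image */

struct blkcopy_state {
    int src_fd, dst_fd;
    bool copied[NUM_BLOCKS];            /* tracking bitmap: block on dst yet? */
    bool switched;                      /* set by the stage #2 switch command */
};

/* Stage #1a: single pass over the entire image, src -> dst. */
static int blkcopy_pass(struct blkcopy_state *s)
{
    char buf[BLOCK_SIZE];

    for (int i = 0; i < NUM_BLOCKS; i++) {
        off_t off = (off_t)i * BLOCK_SIZE;
        if (pread(s->src_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE ||
            pwrite(s->dst_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
            return -1;
        }
        s->copied[i] = true;
    }
    /* Real code would emit a QMP event here so management knows it may
     * now send the switch command. */
    printf("QMP event: copy pass complete, switch may be issued\n");
    return 0;
}

/* Stage #1b: guest writes; already-copied blocks are written twice. */
static int guest_write(struct blkcopy_state *s, int block, const char *buf)
{
    off_t off = (off_t)block * BLOCK_SIZE;

    if (s->switched) {                  /* stage #2 done: dst only */
        return pwrite(s->dst_fd, buf, BLOCK_SIZE, off) == BLOCK_SIZE ? 0 : -1;
    }
    if (pwrite(s->src_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
        return -1;                      /* src stays authoritative in stage #1 */
    }
    if (s->copied[block] &&
        pwrite(s->dst_fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
        return -1;                      /* keep dst in sync for copied blocks */
    }
    return 0;
}

/* Stage #2: switch command -- flush pending I/O, stop doubling writes. */
static int blkcopy_switch(struct blkcopy_state *s)
{
    if (fsync(s->dst_fd) < 0) {
        return -1;
    }
    s->switched = true;                 /* immediate, nothing left to copy */
    return 0;
}

int main(void)
{
    struct blkcopy_state s = {
        .src_fd = open("src.img", O_RDWR),   /* assumes a 1 MB src.img */
        .dst_fd = open("dst.img", O_RDWR | O_CREAT, 0644),
    };
    char data[BLOCK_SIZE] = "guest write";

    if (s.src_fd < 0 || s.dst_fd < 0 || blkcopy_pass(&s) < 0) {
        return 1;
    }
    guest_write(&s, 0, data);           /* mirrored to src and dst */
    blkcopy_switch(&s);                 /* management decided to switch */
    guest_write(&s, 1, data);           /* now goes to dst only */
    return 0;
}

The real patch would of course interleave the copy pass with guest I/O
(coroutines/AIO) and serialize against in-flight requests; the sketch
only shows the write-doubling and switch logic.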

Now let's look at the error cases:
- qemu failure during stage #1
  No matter what happens, management will restart qemu with the
  source image. The destination will be erased, no matter how much was
  copied.
- management failure during stage #1
  The new mgmt daemon needs to query qemu's status.
  Management can continue as before.
- qemu+mgmt failure during stage #1
  Management should just run qemu with the source image.
- mgmt failure after sending the stage #2 command
  The mgmt DB states that we switched; we just need to reconnect to qemu.
- qemu failure before/after receiving the stage #2 switch command
  Management just needs to start a new qemu with the dst image.
- failure of both qemu & mgmt in stage #2
  The same as above.
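
In other words, when qemu itself has to be restarted, the choice of
image collapses to a single durable bit on the management side:
whether the stage #2 switch was committed to the mgmt DB before the
crash (if qemu is still running, management simply reconnects and
queries its status, as above).  A purely illustrative sketch, with
made-up names:

#include <stdbool.h>

enum image { USE_SRC_IMAGE, USE_DST_IMAGE };

/* Which image should a recovering mgmt daemon restart qemu with? */
static enum image image_after_failure(bool switch_committed_in_db)
{
    if (switch_committed_in_db) {
        /* Stage #2 cases: the switch is recorded, so the dst image is
         * authoritative; (re)start qemu with the dst image. */
        return USE_DST_IMAGE;
    }
    /* Stage #1 cases: no switch recorded, so the src image stays
     * authoritative; restart qemu with src and erase the partial dst. */
    return USE_SRC_IMAGE;
}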

Pros:
 - Fast switch-over time, minimal latency
 - No external storage/config needed
 - No need to wait for mgmt

Thanks,
Dor


Regards,

Anthony Liguori

 >
 > --
 > error compiling committee.c: too many arguments to function
 >




