From: Walid Nouri
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Tue, 23 Sep 2014 18:36:42 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 18.09.2014 15:56, Stefan Hajnoczi wrote:
There is the issue of request ordering (using write cache flushes).  The
secondary probably needs to perform requests in the same order and
interleave cache flushes in the same way as the primary.  Otherwise a
power failure on the secondary could leave the disk in an invalid state
that is impossible on the primary.  So I'm just pointing out that cache
flush operations matter, not just read/write.


To be honest, my thought was that drive-mirror handles all block-device-specific problems, especially the cache flush requests needed for write ordering. So my naive approach was to use existing functionality as a kind of black-box transport mechanism and build on top of it. But that does not seem to be possible for the subtle, tricky part of the game.

This means the "block filter" on the secondary must ensure the commit semantics. But for doing that it must be able to interpret the write ordering semantic of a stream of write requests.


The second, and bigger, point is that if disk commit holds back
checkpoint commit it could be a significant performance problem due to
the slow nature of disks.

You are completely right; this would raise the latency for the primary. It could be avoided by changing the proposed protocol to write directly at the primary and apply the updates to the secondary asynchronously.
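As a rough sketch of what I mean (again invented names, no real QEMU APIs): the guest write completes against the primary's local image immediately, and only a copy of the update is queued to be shipped to the secondary in the background.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Update {
    uint64_t offset;
    size_t   len;
    uint8_t *data;
    struct Update *next;
} Update;

static Update *pending_head, *pending_tail;  /* FIFO of updates not yet shipped */

/* Stand-ins for the real work; both are out of scope for this sketch. */
static void local_image_write(uint64_t offset, const uint8_t *buf, size_t len)
{
    (void)offset; (void)buf; (void)len;      /* would write to the primary's image */
}

static void send_update_to_secondary(const Update *u)
{
    printf("ship offset=%llu len=%zu\n", (unsigned long long)u->offset, u->len);
}

/* Guest write path: complete locally first, so guest latency does not
 * include the secondary, then queue a copy for later replication. */
void guest_write(uint64_t offset, const uint8_t *buf, size_t len)
{
    local_image_write(offset, buf, len);

    Update *u = malloc(sizeof(*u));
    u->offset = offset;
    u->len = len;
    u->data = malloc(len);
    memcpy(u->data, buf, len);
    u->next = NULL;
    if (pending_tail) {
        pending_tail->next = u;
    } else {
        pending_head = u;
    }
    pending_tail = u;
}

/* Runs asynchronously (at the latest at checkpoint commit): ship the
 * queued updates to the secondary in the order they were issued. */
void drain_pending_updates(void)
{
    while (pending_head) {
        Update *u = pending_head;
        send_update_to_secondary(u);
        pending_head = u->next;
        free(u->data);
        free(u);
    }
    pending_tail = NULL;
}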

There are fancier solutions using either a journal or snapshots that
provide data integrity without posing a performance bottleneck during
the commit phase.

The trick is to apply write requests as they come off the wire on the
secondary but use a journal or snapshot mechanism to enforce commit
semantics.  That way the commit doesn't have to wait for writing out all
the data to disk.

Wouldn't that mean sending some kind of protocol information along with the modified blocks, a barrier or something like that?
Can you please explain a little more what you meant?

The details depend on the code and I don't remember everything well
enough.  Anyway, my mental model is:

1. The dirty bit is set *after* the primary has completed the write.
    See bdrv_aligned_pwritev().  Therefore you cannot use the dirty
    bitmap to query in-flight requests, instead you have to look at
    bs->tracked_requests.

2. The mirror block job periodically scans the dirty bitmap (when there
    is no rate-limit set it does this with no artificial delays) and
    writes the dirty blocks.

Given that cache flush requests probably need to be tracked too, maybe
you need an MC-specific block driver on the primary to monitor and control
I/O requests.

But I haven't thought this through and it's non-trivial so we need to
break this down more.
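If I understand points 1 and 2 correctly, the loop looks roughly like this (grossly simplified, nothing like the real block/mirror.c, just my mental model):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 65536
#define NUM_CHUNKS 1024

static bool dirty[NUM_CHUNKS];        /* toy stand-in for the dirty bitmap */

/* Stand-in for reading a chunk from the source and writing it to the target. */
static void copy_chunk(size_t chunk)
{
    (void)chunk;
}

/* Write path on the primary: the bit is only set *after* the write has
 * completed, which is why in-flight requests never show up in the bitmap. */
void mark_dirty(uint64_t offset, size_t len)
{
    if (len == 0) {
        return;
    }
    for (uint64_t c = offset / CHUNK_SIZE; c <= (offset + len - 1) / CHUNK_SIZE; c++) {
        dirty[c] = true;
    }
}

/* One scan pass over the bitmap, as the mirror job would do it repeatedly
 * (with no artificial delay when no rate limit is set). */
void mirror_scan_pass(void)
{
    for (size_t i = 0; i < NUM_CHUNKS; i++) {
        if (dirty[i]) {
            dirty[i] = false;
            copy_chunk(i);
        }
    }
}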


As drive-mirror lacks this functionality, one way (without changing the drive-mirror code) might be an MC-specific mechanism on the primary. This mechanism must respect write-ordering requests (like forced cache flushes and Force Unit Access requests) and send the corresponding information for a stream of blocks to the secondary.
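Here is a rough sketch of the interception I have in mind on the primary (invented names, not an existing QEMU block driver): a write with Force Unit Access set is forwarded as write plus barrier, and a guest cache flush becomes a barrier on its own.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum McEvent { MC_EV_WRITE, MC_EV_BARRIER };

/* Stand-in: append one event to the stream that goes to the secondary. */
static void mc_emit(enum McEvent ev, uint64_t offset, size_t len)
{
    printf("emit %s offset=%llu len=%zu\n",
           ev == MC_EV_WRITE ? "WRITE" : "BARRIER",
           (unsigned long long)offset, len);
}

/* Write path: forward the data and, for a Force Unit Access write, also
 * a barrier so the secondary knows this write must be stable on disk.
 * Whether and how the block layer exposes the FUA flag per request is
 * exactly my open question below. */
void mc_intercept_write(uint64_t offset, size_t len, bool fua)
{
    mc_emit(MC_EV_WRITE, offset, len);
    if (fua) {
        mc_emit(MC_EV_BARRIER, offset, len);
    }
}

/* Flush path: a guest cache flush becomes an ordering barrier in the stream. */
void mc_intercept_flush(void)
{
    mc_emit(MC_EV_BARRIER, 0, 0);
}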

From what I have learned, I'm assuming that most guest OS filesystem/block layers follow an ordering interface based on SCSI, is that correct? As those kinds of requests must be flagged in an I/O request by the guest operating system, this should be possible. Do we have a chance to access that information in a guest request?

If this is possible, does this information survive the journey through the NBD server, or must there be another communication channel like the QEMUFile approach of "block-migration.c"?
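Independent of which channel is used, the kind of information I am thinking of would look roughly like this (a purely illustrative layout; the message types and fields are made up, not an existing protocol):

#include <stdint.h>

enum McMsgType {
    MC_MSG_BLOCK_WRITE = 1,   /* followed by 'length' bytes of block data       */
    MC_MSG_FLUSH_BARRIER,     /* the primary issued a cache flush at this point */
    MC_MSG_CHECKPOINT_COMMIT  /* everything received before this may be applied */
};

typedef struct McMsgHeader {
    uint32_t type;            /* one of McMsgType                           */
    uint64_t offset;          /* byte offset, MC_MSG_BLOCK_WRITE only       */
    uint32_t length;          /* payload length in bytes, 0 for the markers */
} McMsgHeader;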

Walid


