
[Qemu-devel] Re: KVM call minutes for Sept 21


From: Anthony Liguori
Subject: [Qemu-devel] Re: KVM call minutes for Sept 21
Date: Tue, 21 Sep 2010 13:23:43 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100826 Lightning/1.0b1 Thunderbird/3.0.7

On 09/21/2010 01:05 PM, Chris Wright wrote:
Nested VMX
- looking for forward progress and better collaboration between the
   Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
   - merge baseline patch
     - looks pretty good
     - review is finding mostly small things at this point
     - need some correctness verification (both review from Intel and testing)
   - need a test suite
     - test suite harness will help here
       - a few dozen nested SVM tests are there, can follow for nested VMX
   - nested EPT
   - optimize (reduce vmreads and vmwrites)
- has a long-term maintainer

Hotplug
- unplug command issued...guest may or may not respond
- guest can't be trusted to be a direct part of the request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a
   single unplug)
   - should be a GUI interface design decision, human monitor is not a
     good design point
     - digression into GUI interface

The way this works IRL is:

1) Administrator presses a physical button. This sends an ACPI notification to the guest.

2) The guest makes a decision about how to handle the ACPI notification.

3) To initiate unplug, the guest disables the device and performs an operation to indicate to the PCI bus that the device is unloaded.

4) Step (3) causes an LED (usually located near the button in (1)) to change color.

5) Administrator then physically removes the device.

So we need at least a QMP command to perform step (1). Since (3) can occur independently of (1), the guest's release of the device should be reported as an async notification. device_del should only perform step (5).

A management tool needs to:

pci_unplug_request <slot>
/* wait for PCI_UNPLUGGED event */
device_del <slot>
netdev_del <backend>
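
As a rough sketch of that sequence over a QMP socket (pci_unplug_request and
the PCI_UNPLUGGED event are only the names proposed above, not existing QMP
commands, and the socket path and IDs are made up), the management-tool side
could look like:

import json
import socket

def qmp(chan, name, **args):
    # Send one QMP command and wait for its reply, skipping interleaved events.
    chan.write(json.dumps({"execute": name, "arguments": args}) + "\n")
    chan.flush()
    while True:
        msg = json.loads(chan.readline())
        if "return" in msg or "error" in msg:
            return msg

def unplug(chan, slot, backend):
    # Step (1): ask the guest to release the device (ACPI notification).
    qmp(chan, "pci_unplug_request", slot=slot)
    # Wait for the async event corresponding to steps (3)/(4).
    while True:
        msg = json.loads(chan.readline())
        if msg.get("event") == "PCI_UNPLUGGED":
            break
    # Step (5): remove the device, then its backend.
    qmp(chan, "device_del", id=slot)
    qmp(chan, "netdev_del", id=backend)

sock = socket.socket(socket.AF_UNIX)
sock.connect("/tmp/qmp.sock")       # assumed QMP socket path
chan = sock.makefile("rw")
chan.readline()                     # QMP greeting banner
qmp(chan, "qmp_capabilities")       # leave capability negotiation mode
unplug(chan, "slot1", "netdev0")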

Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
   - live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
   - O_DSYNC needed based on whether disk cache is available
   - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
   - how to manage w/out needing to flush every write, slow
- perhaps start with O_DIRECT on raw, non-sparse files only? (see the
  preallocation sketch after this list)
- backend needs to open the backing store to match the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
   - works well with a fully allocated file, dependent on the disk cache
     being disabled (or fs-specific flushing)
- filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
   - set up host backend accordingly
- would be nice to preserve write cache settings across boots (outgrowing CMOS storage)
- maybe some host fs-level optimization possible
   - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion
   - one direct user tunable, "use host page cache or not"
   - one guest OS tunable, "enable disk cache"
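
One way to get the "fully allocated, non-sparse raw file" case mentioned above
is to preallocate the image before handing it to the guest, so that later
O_DIRECT writes never hit unallocated extents. A minimal sketch (the image
path and size are placeholders):

import os

IMAGE = "disk.raw"                    # placeholder path
SIZE = 10 * 1024 * 1024 * 1024        # 10 GiB, placeholder size

fd = os.open(IMAGE, os.O_RDWR | os.O_CREAT, 0o600)
try:
    # posix_fallocate() actually reserves the blocks on disk, unlike
    # ftruncate(), which would only create a sparse file.
    os.posix_fallocate(fd, 0, SIZE)
finally:
    os.close(fd)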

In other words, those two tunables map to a qdev 'write-cache=on|off' property and a blockdev 'direct=on|off' property. For completeness, there could also be a blockdev 'unsafe=on|off' property.

Open flags are:

write-cache=on,  direct=on     O_DIRECT
write-cache=off, direct=on     O_DIRECT | O_DSYNC
write-cache=on,  direct=off    0
write-cache=off, direct=off    O_DSYNC
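
As a sketch, that table reduces to two booleans when the backend opens its
backing file (Linux flags; the helper name and the example values are just
for illustration):

import os

def backing_open_flags(write_cache, direct):
    # Translate the two tunables above into open(2) flags.
    flags = os.O_RDWR
    if direct:
        flags |= os.O_DIRECT    # bypass the host page cache
    if not write_cache:
        flags |= os.O_DSYNC     # no guest-visible write cache: sync every write
    return flags

# e.g. write-cache=off, direct=on  ->  O_RDWR | O_DIRECT | O_DSYNC
print(hex(backing_open_flags(write_cache=False, direct=True)))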

It's still unclear what our default mode will be.

The problem is, O_DSYNC has terrible performance on ext4 when barrier=1.

write-cache=on, direct=off is a bad default because if you run a simple performance test, you'll get better-than-native results, and that upsets people.

write-cache=off,direct=off is a bad default because ext4's default config sucks with this.

Likewise, write-cache=off, direct=on is a bad default for the same reason.

Regards,

Anthony Liguori




