From: Anthony Liguori
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
Date: Tue, 01 May 2012 16:52:05 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 05/01/2012 03:56 PM, Eric Blake wrote:
> On 05/01/2012 02:25 PM, Anthony Liguori wrote:
>> Thanks for sending this out Stefan.
>
> Indeed.
>
>>> This series adds the -open-hook-fd command-line option. Whenever QEMU needs to open an image file it sends a request over the given UNIX domain socket. The response includes the file descriptor or an errno on failure. Please see the patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing without changing -drive. It also supports snapshot_blkdev and other commands that re-open image files.
>>>
>>> Anthony Liguori <address@hidden> wrote most of these patches. I added a demo -open-hook-fd server and added some small fixes. Since Anthony is traveling right now I'm sending the RFC for discussion.
>>
>> What I like about this approach is that it's useful outside the block layer and is conceptually simple from a QEMU PoV. We simply delegate open() to libvirt and let libvirt enforce whatever rules it wants. This is not meant to be an alternative to blockdev; even with blockdev, I think we still want to use a mechanism like this.
>
> The overall series looks like it would be rather interesting. What sort of timing restrictions are there? For example, the proposed 'drive-reopen' command (probably now delegated to qemu 1.2) would mean that qemu would be calling back into libvirt in order to do the reopen. If libvirt takes its time in passing back an open fd, is it going to starve qemu from answering unrelated monitor commands in the meantime?

s/libvirt/kernel/g and your concerns are equally valid.

open() should never be called in a path that could block things. There's always the possibility that we're on top of NFS and the open could time out.
For something like drive_reopen, we should use an asynchronous open() that dispatches the open() to the posix-aio thread pool.
That's part of what's nice about this approach: we could still call file_open() in the posix-aio thread pool...
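
To make the idea concrete, here is a minimal sketch of pushing a possibly blocking open() onto a worker pool so the caller's main loop stays responsive. It is only an illustration in Python, not QEMU code: the pool, the `open_async` name, and the `on_done(fd, err)` callback shape are all assumptions, and the executor merely stands in for the posix-aio thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a posix-aio style worker pool. The name and the size are
# assumptions made for this sketch.
_open_pool = ThreadPoolExecutor(max_workers=4)


def open_async(path, flags, on_done):
    """Run a possibly blocking open() off the calling thread.

    on_done(fd, err) is invoked from the worker thread with a valid fd and
    err == 0 on success, or fd == -1 and an errno value on failure. A real
    event loop would bounce that completion back to the main loop.
    """
    def work():
        try:
            fd = os.open(path, flags)
        except OSError as e:
            on_done(-1, e.errno)
        else:
            on_done(fd, 0)

    # The returned future lets a caller wait for completion if it wants to.
    return _open_pool.submit(work)
```

The caller returns immediately after `open_async()`, so an open that stalls on NFS (or on a slow peer answering the open hook) only ties up a worker, not the thread servicing monitor commands.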

> I definitely want to make sure we avoid deadlock where libvirt is waiting on a monitor command, but the monitor command is waiting on libvirt to pass an fd.
>
> Is this also an opportunity to request whether a particular fd must be seekable vs. acceptable as a one-pass read or write, perhaps by whether the command is 1 (seekable open) or 2 (one-pass open)?

I'm not really sure where the distinction lies... I want the RPC to behave exactly like open(). So if we're assuming that open() of a /dev/ file returns something that is ioctl()'able, then that's what libvirt should return.
If we want to sort of do fd-transformation where a special protocol is used for things like ioctl, that's fine, but it ought to be a different mechanism (that's probably not nearly as generic).
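
As a rough illustration of an RPC that "behaves exactly like open()", the sketch below has one side send a path and flags over a UNIX domain socket and the other side answer with either an errno or a real file descriptor passed as SCM_RIGHTS ancillary data. The wire format here is hypothetical and is not the protocol from the patches; `serve_one`, `hooked_open`, and `demo` are names invented for this example. It assumes Python 3.9 or later on a Unix system, for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket
import struct
import tempfile
import threading

# Hypothetical wire format (NOT the one defined in the patches):
#   request:  flags as a 4-byte big-endian int, followed by the UTF-8 path
#   response: 4-byte big-endian errno, 0 on success; on success the fd
#             travels with the reply as SCM_RIGHTS ancillary data.


def serve_one(conn):
    """Answer a single open request, playing the management tool's role."""
    data = conn.recv(4096)
    (flags,) = struct.unpack("!i", data[:4])
    path = data[4:].decode()
    try:
        fd = os.open(path, flags)  # access policy checks would go here
    except OSError as e:
        conn.sendall(struct.pack("!i", e.errno))
        return
    try:
        socket.send_fds(conn, [struct.pack("!i", 0)], [fd])
    finally:
        os.close(fd)  # the receiver gets its own duplicate of the fd


def hooked_open(conn, path, flags):
    """Client side: same contract as os.open(), but the peer does the open."""
    conn.sendall(struct.pack("!i", flags) + path.encode())
    msg, fds, _msg_flags, _addr = socket.recv_fds(conn, 4, 1)
    (err,) = struct.unpack("!i", msg)
    if err != 0:
        raise OSError(err, os.strerror(err), path)
    return fds[0]


def demo():
    """Round-trip one successful open over a socketpair; return the bytes read."""
    client, server = socket.socketpair()
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        t = threading.Thread(target=serve_one, args=(server,))
        t.start()
        fd = hooked_open(client, f.name, os.O_RDONLY)
        t.join()
        try:
            return os.read(fd, 5)
        finally:
            os.close(fd)
            client.close()
            server.close()
```

Because the caller receives an ordinary descriptor, anything that works on the result of a local open(), including ioctl() on a device node, works here too. That is the property being argued for above.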

> For example, migration is one-pass (and therefore libvirt passes a pipe which is hooked up to a helper app that uses O_DIRECT), while block devices must be seekable.

But migration doesn't involve doing an open(). This is not a replacement for fd passing. This is a replacement for open() to make up for the facts that (1) some management tools like libvirt cannot isolate guests with DAC and (2) SELinux cannot be used to isolate guests across all file systems.
I would really prefer that the kernel fix this problem for us, but from what I'm told, the problem lies in the NFS standards committee so short of forking the NFS protocol, there isn't much that the kernel can do.

Regards,

Anthony Liguori