Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm


From: MORITA Kazutaka
Subject: Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Date: Tue, 25 May 2010 22:26:12 +0900
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.7 Emacs/22.2 (x86_64-pc-linux-gnu) MULE/5.0 (SAKAKI)

At Mon, 24 May 2010 14:16:32 -0500,
Anthony Liguori wrote:
> 
> On 05/24/2010 06:56 AM, Avi Kivity wrote:
> > On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:
> >>
> >>> The server would be local and talk over a unix domain socket, perhaps
> >>> anonymous.
> >>>
> >>> nbd has other issues though, such as requiring a copy and no support 
> >>> for
> >>> metadata operations such as snapshot and file size extension.
> >>>
> >> Sorry, my explanation was unclear.  I'm not sure how running servers
> >> on localhost can solve the problem.
> >
> > The local server can convert from the local (nbd) protocol to the 
> > remote (sheepdog, ceph) protocol.
> >
> >> What I wanted to say was that we cannot specify the VM image.  With
> >> the nbd protocol, the command line arguments are as follows:
> >>
> >>   $ qemu nbd:hostname:port
> >>
> >> As this syntax shows, with the nbd protocol the client cannot pass
> >> the VM image name to the server.
> >
> > We would extend it to allow it to connect to a unix domain socket:
> >
> >   qemu nbd:unix:/path/to/socket
> 
> nbd is a no-go because it only supports a single, synchronous I/O 
> operation at a time and has no mechanism for extensibility.
> 
> If we go this route, I think two options are worth considering.  The 
> first would be a purely socket based approach where we just accepted the 
> extra copy.
> 
> The other potential approach would be shared memory based.  We export 
> all guest ram as shared memory along with a small bounce buffer pool.  
> We would then use a ring queue (potentially even using virtio-blk) and 
> an eventfd for notification.
> 
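
(For illustration only: a rough sketch of the notification half of such a
shared-memory ring, using eventfd(2) as suggested above.  The ring layout
and the names are made up for the example; they are not an existing QEMU
or virtio-blk interface.)

    /* kick_fd would be eventfd(0, 0), shared between qemu and the local
     * server, e.g. passed over a unix socket with SCM_RIGHTS. */
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    struct req_ring {
        uint32_t head;          /* producer (qemu) index          */
        uint32_t tail;          /* consumer (server) index        */
        uint64_t desc[256];     /* offsets into the shared region */
    };

    /* qemu side: queue a request in shared memory, then kick the server */
    static int submit_request(struct req_ring *ring, int kick_fd, uint64_t off)
    {
        uint64_t one = 1;

        ring->desc[ring->head % 256] = off;
        __sync_synchronize();       /* publish desc[] before bumping head */
        ring->head++;
        return write(kick_fd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
    }

    /* server side: block until kicked, then drain whatever is queued */
    static void serve(struct req_ring *ring, int kick_fd)
    {
        uint64_t n;

        while (read(kick_fd, &n, sizeof(n)) == sizeof(n)) {
            while (ring->tail != ring->head) {
                /* translate ring->desc[ring->tail % 256] into a
                 * ceph/sheepdog request here */
                ring->tail++;
            }
        }
    }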

The shared memory approach assumes that there is a local server that
can talk with the storage system.  But Ceph doesn't require a local
server, and Sheepdog will be extended to support VMs running outside
the storage system.  We could run a local daemon that acts only as a
proxy, but that doesn't look like a clean approach to me.  So I think
a socket-based approach is the right way to go.
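
(For what it's worth, the client half of a socket-based approach is
small.  A minimal sketch, assuming a unix domain socket and a made-up
"open <image>" handshake so the server knows which image the VM wants;
neither the socket path format nor the handshake is an existing
interface:)

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int connect_image(const char *sock_path, const char *image)
    {
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0) {
            return -1;
        }
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        /* name the image first, so one local socket can serve many VMs;
         * this also addresses the "cannot pass the VM image name"
         * limitation of plain nbd:host:port */
        dprintf(fd, "open %s\n", image);
        return fd;
    }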

BTW, do we really need to design a common interface?  The way Sheepdog
replicates data is different from Ceph's, so I don't think it is
possible to define a common protocol, as Christian says.

Regards,

Kazutaka

> > The server at the other end would associate the socket with a filename 
> > and forward it to the server using the remote protocol.
> >
> > However, I don't think nbd would be a good protocol.  My preference 
> > would be for a plugin API, or for a new local protocol that uses 
> > splice() to avoid copies.
> 
> I think a good shared memory implementation would be preferable to 
> plugins.  I think it's worth attempting to do a plugin interface for the 
> block layer but I strongly suspect it would not be sufficient.
> 
> I would not want to see plugins that interacted with BlockDriverState 
> directly, for instance.  We change it far too often.  Our main loop 
> functions are also not terribly stable so I'm not sure how we would 
> handle that (unless we forced all block plugins to be in a separate thread).
> 
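
(As an aside, the splice()-based local protocol mentioned above would
avoid the extra copy roughly like this; the file descriptors and the
single-shot error handling are placeholders for a sketch, not working
code from qemu.  Real code would loop on short transfers and handle
EINTR/EAGAIN.)

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Move 'len' bytes from a file-backed fd (e.g. the guest image or a
     * shared memory file) to the server's socket through a pipe, without
     * copying through userspace. */
    static int splice_out(int src_fd, loff_t off, size_t len, int sock_fd)
    {
        int pipefd[2];
        ssize_t n;

        if (pipe(pipefd) < 0) {
            return -1;
        }
        n = splice(src_fd, &off, pipefd[1], NULL, len, SPLICE_F_MOVE);
        if (n > 0) {
            n = splice(pipefd[0], NULL, sock_fd, NULL, n, SPLICE_F_MOVE);
        }
        close(pipefd[0]);
        close(pipefd[1]);
        return n < 0 ? -1 : 0;
    }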


