qemu-devel

Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format
Date: Wed, 14 Aug 2013 11:29:12 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Aug 13, 2013 at 07:03:56PM +0200, Kaveh Razavi wrote:
> Using copy-on-write images with the base image stored remotely is common
> practice in data centers. This saves significant network traffic by
> avoiding the transfer of the complete base image. However, the data
> blocks needed for a VM boot still need to be transferred to the node that
> runs the VM. On slower networks, this will create a bottleneck when
> booting many VMs simultaneously from a single VM image. Also,
> simultaneously booting VMs from more than one VM image creates a
> bottleneck at the storage device of the base image, if the storage
> device does not fare well with the random access pattern that happens
> during booting.
> 
> This patch introduces a block-level caching mechanism: a copy-on-read
> image that supports a quota and sits between the base image and the
> copy-on-write image. The cache image can be stored either on the nodes
> that run VMs or on a storage device that handles random access well
> (e.g. memory, SSD). It is effective because usually only a very small
> part of the image is needed to boot a VM. We measured 100MB to be
> enough for default CentOS and Debian installations.
> 
> A cache image with a quota of 100MB can be created using these commands:
> 
> $ qemu-img create -f qcow2 -o \
>     cache_img_quota=104857600,backing_file=/path/to/base /path/to/cache
> $ qemu-img create -f qcow2 -o backing_file=/path/to/cache /path/to/cow
> 
> The first time a VM boots from the copy-on-write image, the cache gets
> warm. Subsequent boots do not need to read from the base image.
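
A toy in-memory model makes the quoted design concrete. Reads that miss in the cache image are served from the base image and copied into the cache until the quota is reached; after that the cache is marked full and simply stops growing (the patch's is_cache_full case). This is a sketch under stated assumptions, not the patch's code: images are arrays of fixed-size blocks, the quota is counted in blocks rather than bytes, and struct image, struct cache_img and cache_read() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 16 /* hypothetical image size, in blocks */
#define BLKSZ 4    /* hypothetical block size, in bytes */

/* Hypothetical stand-in for an image: NBLOCKS fixed-size blocks. */
struct image {
    char data[NBLOCKS][BLKSZ];
};

/* Hypothetical stand-in for a cache image with a quota. */
struct cache_img {
    struct image img;
    bool present[NBLOCKS]; /* block already copied from the base image? */
    int used;              /* blocks currently stored in the cache */
    int quota;             /* maximum number of blocks to store */
    bool full;             /* plays the role of the patch's is_cache_full */
    int base_reads;        /* reads that had to go to the base image */
};

/* Read one block through the cache, copying it in while under quota. */
static void cache_read(struct cache_img *c, const struct image *base,
                       int blk, char out[BLKSZ])
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (c->present[blk]) { /* warm hit: the base image is not touched */
        memcpy(out, c->img.data[blk], BLKSZ);
        return;
    }
    memcpy(out, base->data[blk], BLKSZ); /* miss: read the base image */
    c->base_reads++;
    if (c->full) {
        return; /* quota exhausted: no more copy-on-read */
    }
    if (c->used == c->quota) {
        c->full = true; /* analogous to the cache write failing with -ENOSPC */
        return;
    }
    memcpy(c->img.data[blk], out, BLKSZ); /* copy-on-read into the cache */
    c->present[blk] = true;
    c->used++;
}
```

On a second boot, every block the first boot copied in is already present, so base_reads stops growing for those blocks; blocks first requested after the quota was hit keep going to the base image.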

100 MB is small enough for RAM.  Did you try enabling the host kernel
page cache for the backing file?  That way all guests running on this
host share a single RAM-cached version of the backing file.
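
For comparison, a minimal sketch of relying on the host page cache instead, assuming Linux/POSIX and a backing file that is reachable as a regular file on the host. prewarm_backing_file() is a hypothetical helper, and the posix_fadvise(POSIX_FADV_WILLNEED) prefetch is an optional extra, not part of the suggestion above. The essential point is only that the file is opened with ordinary buffered I/O (no O_DIRECT), so reads from every process on the host are served from the same cached pages.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: hint the kernel to read `len` bytes of `path`
 * (0 = up to end of file) into the host page cache.  The file is opened
 * with buffered I/O (no O_DIRECT), so later reads by any process on this
 * host can be served from the same cached pages.
 * Returns 0 on success, otherwise an errno value. */
static int prewarm_backing_file(const char *path, off_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return errno;
    }
    /* posix_fadvise() returns an error number directly, not -1. */
    int ret = posix_fadvise(fd, 0, len, POSIX_FADV_WILLNEED);
    close(fd);
    return ret;
}
```

WILLNEED is only a hint; the kernel may still evict the pages under memory pressure.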

The other existing solution is to use the image streaming feature, which
was designed to speed up deployment of image files over the network.  It
copies the contents of the image from a remote server onto the host
while allowing immediate random access from the guest.  This isn't a
cache; it is a full copy of the image.
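
A toy model of what streaming achieves: guest reads fetch missing blocks on demand, while a background loop copies every remaining block until the local image no longer depends on the remote base. struct local_img, guest_read() and stream_step() are hypothetical names, and the model ignores I/O, errors and concurrency; it illustrates the outcome, not QEMU's actual block job code.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8 /* hypothetical image size, in blocks */

/* Hypothetical local copy of a remote image. */
struct local_img {
    int data[NBLOCKS];
    bool allocated[NBLOCKS]; /* block already copied to the host? */
    int next;                /* background streaming cursor */
    int remote_reads;        /* fetches from the remote base image */
};

/* Guest read: served locally if present, else fetched and kept. */
static int guest_read(struct local_img *img, const int *remote, int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (!img->allocated[blk]) {
        img->data[blk] = remote[blk];
        img->allocated[blk] = true;
        img->remote_reads++;
    }
    return img->data[blk];
}

/* One step of the background copy; returns false once the copy is complete. */
static bool stream_step(struct local_img *img, const int *remote)
{
    while (img->next < NBLOCKS && img->allocated[img->next]) {
        img->next++; /* skip blocks the guest already pulled in */
    }
    if (img->next == NBLOCKS) {
        return false; /* full copy: the remote base is no longer needed */
    }
    (void)guest_read(img, remote, img->next); /* reuse the copy path */
    return true;
}
```

When stream_step() returns false, every block has been fetched from the remote side exactly once, which is why the end result is a full copy rather than a cache.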

I'll share an idea for turning this into a cache in a moment, but first,
here is how to deploy it safely.  Since multiple QEMU processes can share
a backing file and the cache must not be corrupted by races, you can run
one qemu-nbd instance per backing image.  The QEMU processes then connect
to the local read-only qemu-nbd server.
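
Why a single qemu-nbd process is safe can be sketched as a single owner of the cache metadata: every QEMU process sends its reads to that one server, so copy-on-read allocations happen one at a time against one consistent view. struct nbd_server, server_init() and server_read() are hypothetical, and the model replaces the NBD protocol and qcow2 metadata with a plain block-to-offset table.

```c
#include <assert.h>

#define NBLOCKS 8 /* hypothetical virtual disk size, in blocks */

/* Hypothetical stand-in for the one qemu-nbd process that owns the cache
 * image.  Only this struct ever updates the allocation metadata, so
 * allocations are serialized no matter how many clients connect. */
struct nbd_server {
    int offset[NBLOCKS]; /* host cluster of each cached block, -1 = none */
    int next_free;       /* next free cluster in the image file */
};

static void server_init(struct nbd_server *s)
{
    for (int i = 0; i < NBLOCKS; i++) {
        s->offset[i] = -1;
    }
    s->next_free = 0;
}

/* Handle one read request from any client (a QEMU process). */
static int server_read(struct nbd_server *s, int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    if (s->offset[blk] < 0) { /* copy-on-read: allocate exactly once */
        s->offset[blk] = s->next_free++;
    }
    return s->offset[blk];
}
```

Interleaved requests from several clients for overlapping blocks still yield one cluster per block, because there is only one copy of the allocation state.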

If you want a cache, you could enable copy-on-read without the image
streaming feature (the block_stream command) and evict old data using
discard commands.  No qcow2 image format changes are necessary for
this.
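
A sketch of the copy-on-read plus discard idea, assuming a fixed cache budget and a least-recently-used eviction policy (the policy is an assumption; the suggestion above only says to evict old data with discard). struct cor_cache, cor_read() and discard_oldest() are hypothetical, and clearing a table entry stands in for issuing a discard on the cached range.

```c
#include <assert.h>

#define NBLOCKS 16 /* hypothetical virtual disk size, in blocks */
#define CAPACITY 3 /* hypothetical cache budget, in blocks */

/* Hypothetical cache state: last-use time per block, 0 = not cached. */
struct cor_cache {
    unsigned long last_use[NBLOCKS];
    unsigned long clock;
    int cached;
    int discards; /* number of blocks evicted via "discard" */
};

/* Evict the least recently used block, as a discard request would. */
static void discard_oldest(struct cor_cache *c)
{
    int victim = -1;
    for (int i = 0; i < NBLOCKS; i++) {
        if (c->last_use[i] != 0 &&
            (victim < 0 || c->last_use[i] < c->last_use[victim])) {
            victim = i;
        }
    }
    assert(victim >= 0);
    c->last_use[victim] = 0; /* the block is unallocated again */
    c->cached--;
    c->discards++;
}

/* Guest read: returns 1 on a cache hit, 0 on a miss (copy-on-read). */
static int cor_read(struct cor_cache *c, int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    c->clock++;
    if (c->last_use[blk] != 0) {
        c->last_use[blk] = c->clock;
        return 1;
    }
    if (c->cached == CAPACITY) {
        discard_oldest(c); /* make room instead of refusing to cache */
    }
    c->last_use[blk] = c->clock; /* block copied from the backing file */
    c->cached++;
    return 0;
}
```

Unlike the quota in the patch, which stops caching once it is full, eviction keeps the cache populated with recently used blocks, so data that becomes hot later can still be cached.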

> @@ -730,6 +751,31 @@ static coroutine_fn int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
>                      if (ret < 0) {
>                          goto fail;
>                      }
> +                    /* do copy-on-read if this is a cache image */
> +                    if (bs->is_cache_img && !s->is_cache_full && 
> +                            !s->is_writing_on_cache)
> +                    {
> +                        qemu_co_mutex_unlock(&s->lock);
> +                        s->is_writing_on_cache = true;
> +                        ret = bdrv_co_writev(bs,
> +                                             sector_num,
> +                                             n1,
> +                                             &hd_qiov);
> +                        s->is_writing_on_cache = false;
> +                        qemu_co_mutex_lock(&s->lock);
> +                        if (ret < 0) {
> +                            if (ret == (-ENOSPC))
> +                            {
> +                                s->is_cache_full = true;
> +                            }
> +                            else {
> +                                /* error is other than cache space */
> +                                fprintf(stderr, "Cache write error (%d)\n", 
> +                                        ret);
> +                                goto fail;
> +                            }
> +                        }
> +                    }

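This is unsafe because the QEMU processes on the host that share this
cache image do not synchronize with each other.  Each one allocates
clusters and updates metadata based on its own in-memory state, so the
image file will be corrupted.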
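
A deterministic toy model of the failure: two processes open the same cache image, each keeps its own in-memory copy of the allocation state, and each then allocates a cluster for a different guest block. qcow2's real metadata (L1/L2 tables, refcounts) is reduced here to a hypothetical l2[] table and a next_free counter; struct disk_meta, struct process and the proc_*() functions are illustrative names. A flag such as s->is_writing_on_cache lives in one process's memory, so it cannot prevent this.

```c
#include <assert.h>

#define NBLOCKS 8 /* hypothetical virtual disk size, in blocks */

/* Hypothetical shared on-disk metadata of the cache image. */
struct disk_meta {
    int l2[NBLOCKS]; /* guest block -> host cluster, -1 = unallocated */
    int next_free;   /* first free host cluster */
};

/* Hypothetical per-process view: each QEMU process reads the metadata
 * at open time and afterwards trusts its in-memory copy. */
struct process {
    int next_free;
};

static void proc_open(struct process *p, const struct disk_meta *d)
{
    p->next_free = d->next_free;
}

/* Copy-on-read allocation done from the process's private view. */
static void proc_cache_block(struct process *p, struct disk_meta *d, int blk)
{
    assert(blk >= 0 && blk < NBLOCKS);
    int cluster = p->next_free++; /* stale if another process allocated */
    d->l2[blk] = cluster;         /* write the mapping entry... */
    d->next_free = p->next_free;  /* ...and the allocation state */
}
```

After process A caches guest block 3 and process B caches guest block 5, both blocks map to cluster 0 and one allocation is lost: the on-disk image is inconsistent.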

Stefan


