qemu-devel

Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3


From: Frediano Ziglio
Subject: Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
Date: Tue, 28 Jun 2011 11:38:35 +0200

2011/6/27 Kevin Wolf <address@hidden>:
> This is the second draft for what I think could be added when we increase 
> qcow2's
> version number to 3. This includes points that have been made by several 
> people
> over the past few months. We're probably not going to implement this next 
> week,
> but I think it's important to get discussions started early, so here it is.
>
> Changes implemented in this RFC:
>
> - Added compatible/incompatible/auto-clear feature bits plus an optional
>  feature name table to allow useful error messages even if an older version
>  doesn't know some feature at all.
>
> - Added a dirty flag which tells that the refcount may not be accurate ("QED
>  mode"). This means that we can save writes to the refcount table with
>  cache=writethrough, but isn't really useful otherwise since Qcow2Cache.
>
> - Configurable refcount width. If you don't want to use internal snapshots,
>  make refcounts one bit and save cache space and I/O.
>
> - Added subclusters. This separate the COW size (one subcluster, I'm thinking
>  of 64k default size here) from the allocation size (one cluster, 2M). Less
>  fragmentation, less metadata, but still reasonable COW granularity.
>
>  This also allows to preallocate clusters, but none of their subclusters. You
>  can have an image that is like raw + COW metadata, and you can also
>  preallocate metadata for images with backing files.
>
> - Zero cluster flags. This allows discard even with a backing file that 
> doesn't
>  contain zeros. It is also useful for copy-on-read/image streaming, as you'll
>  want to keep sparseness without accessing the remote image for an unallocated
>  cluster all the time.
>
> - Fixed internal snapshot metadata to use 64 bit VM state size. You can't save
>  a snapshot of a VM with >= 4 GB RAM today.
>
> Possible future additions:
>
> - Add per-L2-table dirty flag to L1?
> - Add per-refcount-block full flag to refcount table?

Hi,
  thinking about image improvements, I would add:

- a GUID for the image and for the backing file
- a relative path for the backing file

This would help with finding images in a distributed environment, or
when files are moved, e.g. GFS/NFS/OCFS shares mounted at different
mount points, or a backing file that is a template in a separate image
directory which is later moved elsewhere. With GUIDs, a higher layer
could also maintain a GUID <-> image file database.
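A minimal sketch of what such a GUID header extension could look like
(QCOW2_EXT_GUID, the layout, and the function name are assumptions for
illustration, not part of any qcow2 specification):

```python
import struct
import uuid

# Hypothetical extension type, purely illustrative ("GUID" in ASCII);
# no such qcow2 header extension exists today.
QCOW2_EXT_GUID = 0x47554944

def build_guid_extension(image_guid, backing_guid):
    """Serialize a hypothetical header extension: big-endian 4-byte
    extension type, 4-byte payload length, then the image GUID and the
    backing file GUID as 16 raw bytes each."""
    payload = image_guid.bytes + backing_guid.bytes
    return struct.pack(">II", QCOW2_EXT_GUID, len(payload)) + payload
```

A higher layer could then index images by the first 16 payload bytes
without caring where the file currently lives.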

I was also thinking about a "backing file length" field to support
resizing, but it can probably be implemented with zero clusters
instead. Suppose you have a 5 GB image and create a new image with the
first one as its backing file; now resize the second image from 5 GB to
3 GB and later (after some work) resize it again to 10 GB. The range
from 3 GB to 5 GB must then not be read from the backing file.
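The shrink-then-grow case above can be sketched as a small calculation
(the 2 MB cluster size and the helper name are assumptions; sizes are
taken to be cluster-aligned for simplicity):

```python
CLUSTER = 2 * 1024 * 1024  # example cluster size of 2 MB
GB = 1024 ** 3

def clusters_to_zero(backing_size, shrink_to, grow_to):
    """Guest cluster indices that must carry the zero flag after the
    image was shrunk below the backing file's size and grown again:
    reads in [shrink_to, min(backing_size, grow_to)) must return zeros
    instead of falling through to the backing file."""
    start = shrink_to // CLUSTER
    end = min(backing_size, grow_to) // CLUSTER
    return range(start, end)
```

For the 5 GB / 3 GB / 10 GB example this covers exactly the guest range
from 3 GB to 5 GB, so no extra "backing file length" field is needed.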

Another idea: a bit in the L1 entry saying "there is no L2 table",
because all clusters covered by that L2 table are contiguous, so the L2
table can be skipped entirely. Obviously this requires an optimization
step to detect or create such a layout.
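A sketch of how a lookup could use such a bit (the flag position, the
constants, and the function name are hypothetical; real qcow2 L1
entries have no such bit):

```python
# Hypothetical "all clusters behind this entry are contiguous" flag.
L1_NO_L2 = 1 << 62
CLUSTER_SIZE = 2 * 1024 * 1024  # example 2 MB clusters

def host_offset(l1_entry, index_in_l2):
    """If the hypothetical bit is set, the entry's offset points at the
    first of a contiguous run of clusters, so the target offset is pure
    arithmetic and the L2 table read is skipped; otherwise the caller
    must load the L2 table at `offset` as usual."""
    offset = l1_entry & ~L1_NO_L2
    if l1_entry & L1_NO_L2:
        return offset + index_in_l2 * CLUSTER_SIZE
    return None  # signal: read the L2 table at `offset` instead
```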

For image checking, perhaps it would be helpful to store not only a
flag but also an offset up to which the data is known to be consistent
(for instance, already allocated and with refcounts saved correctly).
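The benefit of such an offset can be sketched as follows (the header
field and helper name are hypothetical, not existing qcow2 features):

```python
def clusters_to_recheck(clean_up_to, file_size,
                        cluster_size=2 * 1024 * 1024):
    """With a hypothetical 'consistent up to this offset' header field,
    a post-crash check only rebuilds refcounts for clusters at or
    beyond clean_up_to, instead of scanning the whole image."""
    first = clean_up_to // cluster_size
    last = -(-file_size // cluster_size)  # ceiling division
    return range(first, last)
```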

A possible optimization for refcounts would be to initialize them to 1
instead of 0. Then clusters allocated at end-of-file would not require
a refcount update at all, and a check could simply compare against the
file size to see which clusters are marked as allocated but not
actually present.
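A rough sketch of that check, under the stated assumption that the
default refcount is 1 (the function name and the 2 MB cluster size are
illustrative):

```python
def clusters_marked_but_missing(refcounts, file_size,
                                cluster_size=2 * 1024 * 1024):
    """With refcounts defaulting to 1, growing the file at end-of-file
    never needs a refcount write; conversely, any cluster whose
    refcount says 'in use' but whose offset lies beyond the current end
    of file can be flagged as marked allocated but not present."""
    present = file_size // cluster_size
    return [i for i, rc in enumerate(refcounts) if rc >= 1 and i >= present]
```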

Fields for sectors and heads to support old CHS-based systems?

This mail sounds quite strange to me; I thought QED would be the future
of qcow2, but I must be really wrong.

I think a big limitation of the current QED and qcow2 implementations
is the serialization of metadata updates (qcow2 uses synchronous
operations while QED uses a queue). I used the bonnie++ program to test
speed, and write performance while allocating clusters is about 15-20%
of the performance on already-allocated ones. I'm working (in the
little spare time I have) on improving this. VirtualBox and ESX use
large clusters (1 MB) to mitigate the allocation/metadata problem.
Perhaps raising the default cluster size would help counter the
widespread impression of poor QEMU I/O performance.

Regards
  Frediano Ziglio


