From: Eric Blake
Subject: Re: [Qemu-block] [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation
Date: Wed, 12 Apr 2017 13:20:20 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 04/12/2017 12:55 PM, Denis V. Lunev wrote:
> Let me rephrase a bit.
> 
> The proposal looks very close to the following case:
> - raw sparse file
> 
> In this case all writes are very, very fast and from the
> guest point of view all is OK: sequential data is really
> sequential. But once we later start performing sequential
> IO, we have real pain. Each sequential operation becomes
> random on the host file system and the IO becomes very
> slow. This will not show up in a short test, but the
> performance will degrade very soon.
> 
> This is why raw sparse files are not used in real life.
> The hypervisor must maintain guest OS invariants: data
> that is nearby from the guest point of view should be
> kept nearby on the host.
> 
> This is actually why 64kb data blocks are extremely
> small :) OK, this is offtopic.

Not necessarily. Using subclusters may allow you to ramp up to larger
cluster sizes. We can also set up our allocation (and pre-allocation)
schemes so that we always reserve an entire cluster on the host at the
time we allocate the cluster, even if we only plan to write to
particular subclusters within that cluster.  In fact, 32 subclusters
in a 2M cluster gives 64k subclusters: you are still writing 64k data
chunks, but you now have guaranteed 2M locality, compared to current
qcow2 with 64k clusters, which also writes in 64k data chunks but with
no locality guarantee.

Just because we don't write the entire cluster up front does not mean
that we don't have to allocate (or have a mode that allocates) the
entire cluster at the time of the first subcluster use.
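
To make that concrete, here is a minimal sketch (not actual qcow2
code; the constants and the helper name are made up for illustration)
of reserving the whole host cluster with fallocate() on the first
subcluster write, so later subcluster writes land in an
already-contiguous host region:

#define _GNU_SOURCE           /* for fallocate() */
#include <fcntl.h>
#include <unistd.h>

#define CLUSTER_SIZE    (2 * 1024 * 1024)            /* 2M cluster  */
#define SUBCLUSTERS     32                           /* per cluster */
#define SUBCLUSTER_SIZE (CLUSTER_SIZE / SUBCLUSTERS) /* 64k         */

static int write_subcluster(int fd, off_t cluster_off, int sub,
                            const void *buf, int first_touch)
{
    if (first_touch) {
        /* Reserve the entire host cluster up front; untouched
         * subclusters still read as unallocated in the metadata. */
        if (fallocate(fd, 0, cluster_off, CLUSTER_SIZE) < 0) {
            return -1;
        }
    }
    off_t off = cluster_off + (off_t)sub * SUBCLUSTER_SIZE;
    return pwrite(fd, buf, SUBCLUSTER_SIZE, off) < 0 ? -1 : 0;
}

With this, the host extent for the whole 2M cluster is laid out once,
regardless of the order in which the guest dirties the 64k subclusters.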

> 
> One can easily recreate this case using the following simple
> test:
> - write every even 4kb page of the disk, one by one
> - write every odd 4kb page of the disk
> - run a sequential read with e.g. a 1 MB data block
> 
> Normally we should still get native performance, but
> with raw sparse files and (as far as I understand the
> proposal) subclusters, the host IO pattern will be
> exactly like random IO.

Only if we don't pre-allocate entire clusters at the point that we first
touch the cluster.
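
For concreteness, the test you describe is easy to reproduce; a
minimal sketch (the file path and disk size here are illustrative,
not from your mail):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE  4096
#define CHUNK (1024 * 1024)
#define DISK  ((off_t)1024 * CHUNK)   /* 1G test disk */

int main(void)
{
    int fd = open("/tmp/test.img", O_RDWR | O_CREAT, 0644);
    char *page = malloc(PAGE);
    char *chunk = malloc(CHUNK);
    memset(page, 0xaa, PAGE);

    for (off_t off = 0; off < DISK; off += 2 * PAGE) {   /* even pages */
        pwrite(fd, page, PAGE, off);
    }
    for (off_t off = PAGE; off < DISK; off += 2 * PAGE) { /* odd pages */
        pwrite(fd, page, PAGE, off);
    }
    /* Guest-sequential read in 1M chunks; on a raw sparse file the
     * interleaved host allocation makes this effectively random IO. */
    for (off_t off = 0; off < DISK; off += CHUNK) {
        pread(fd, chunk, CHUNK, off);
    }
    close(fd);
    return 0;
}

With whole-cluster pre-allocation at first touch, the even and odd 4k
pages of a given cluster end up in the same host extent, so the final
read stays sequential on the host as well.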

> 
> This seems to me like a big and inevitable problem of the
> approach. We still have the potential to improve the current
> algorithms without introducing incompatible changes.
> 
> Sorry if this is too emotional. We learned the above the
> hard way.

And your experience is useful as a way to fine-tune this proposal, but
it doesn't mean we should ditch the proposal entirely.  I also
appreciate that you have patches in the works to reduce bottlenecks
(such as turning sub-cluster writes into 3 IOPs rather than 5, by doing
read-head, read-tail, write-cluster instead of the current read-head,
write-head, write-body, read-tail, write-tail), but I think the two
approaches are complementary, not orthogonal.
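
To spell out those IOP counts, here is a sketch (not the actual qemu
code paths; the helper names are invented) of the two copy-on-write
strategies for a guest write that covers only the middle of a cluster:

#include <string.h>
#include <unistd.h>

/* Current path: 5 IOPs. */
static ssize_t cow_write_5iop(int fd, off_t old_off, off_t new_off,
                              char *tmp, const char *body,
                              size_t head, size_t blen, size_t tail)
{
    pread(fd, tmp, head, old_off);                       /* read-head   */
    pwrite(fd, tmp, head, new_off);                      /* write-head  */
    pwrite(fd, body, blen, new_off + head);              /* write-body  */
    pread(fd, tmp, tail, old_off + head + blen);         /* read-tail   */
    return pwrite(fd, tmp, tail, new_off + head + blen); /* write-tail  */
}

/* Proposed path: 3 IOPs.  Read head and tail into one cluster-sized
 * buffer, splice the guest data in between, write once. */
static ssize_t cow_write_3iop(int fd, off_t old_off, off_t new_off,
                              char *cluster, const char *body,
                              size_t head, size_t blen, size_t tail)
{
    pread(fd, cluster, head, old_off);                   /* read-head   */
    pread(fd, cluster + head + blen, tail,
          old_off + head + blen);                        /* read-tail   */
    memcpy(cluster + head, body, blen);                  /* no IO       */
    return pwrite(fd, cluster, head + blen + tail, new_off);
                                                         /* write-cluster */
}

Error handling is omitted for brevity; the point is only the number of
host IO operations per partial-cluster write.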

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
