qemu-block

Re: [Qemu-block] [PATCH RFC 0/1] Allow storing the qcow2 L2 cache in disk


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH RFC 0/1] Allow storing the qcow2 L2 cache in disk
Date: Fri, 9 Dec 2016 15:21:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

On 09.12.2016 14:47, Alberto Garcia wrote:
> Hi all,
> 
> as we all know, one of the main things that can make the qcow2 format
> slow is the need to load entries from the L2 table in order to map a
> guest offset (on the virtual disk) to a host offset (on the qcow2
> image).
> 
> We have an L2 cache to deal with this, and as long as the cache is
> big enough, the performance is comparable to that of a raw image.
> 
> For large qcow2 images the amount of RAM we need in order to cache all
> L2 tables can be big (128MB per TB of disk image if we're using the
> default cluster size of 64KB). In order to solve this problem we have
> a setting that allows the user to clean unused cache entries after a
> certain interval of time. This works fine most of the time, although
> we can still have peaks of RAM usage if there's a lot of I/O going on
> in one or more VMs.
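> 
> (For reference, the arithmetic behind that figure: with 64KB clusters
> each L2 entry is 8 bytes and maps one cluster, so 1TB / 64KB = 16M
> clusters * 8 bytes = 128MB of L2 tables. The settings involved are the
> qcow2 driver's l2-cache-size and cache-clean-interval options; e.g.
> something like l2-cache-size=128M,cache-clean-interval=900 on the
> -drive line, where the numbers are only illustrative.)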
> 
> In some scenarios, however, there is an alternative: if the qcow2
> image is stored on a slow backend (e.g. an HDD), we could save memory
> by putting the L2 cache on a faster one (an SSD) instead of in RAM.
> 
> I have been running some tests with exactly that scenario and the
> results look good: storing the cache on disk gives roughly the same
> performance as storing it in memory.
> 
> |---------------------+-------------------+---------------------|
> |                     | Random 4k reads   | Sequential 4k reads |
> |                     | Throughput | IOPS | Throughput |  IOPS  |
> |---------------------+------------+------+------------+--------|
> | Cache in memory/SSD | 406 KB/s   |   99 | 84 MB/s    |  21000 |
> | Default cache (1MB) | 200 KB/s   |   60 | 83 MB/s    |  21000 |
> | No cache            | 200 KB/s   |   49 | 56 MB/s    |  14000 |
> |---------------------+------------+------+------------+--------|
> 
> I'm including the patch that I used to get these results. This is the
> simplest approach that I could think of.
> 
> Opinions, questions?

Well, purely from a design standpoint, this doesn't make a lot of sense to me:

We have a two-level on-disk structure for cluster mapping so as not to
waste memory on unused areas and so that we don't need to keep one
large contiguous chunk of metadata. Accessing the disk is slow, so we
also have an in-memory cache, which is just a single-level, fully
associative cache replicating the same data (but only a part of it).
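
For illustration, that lookup is roughly the following (a schematic
sketch only, not the actual QEMU code: 64 KB clusters assumed; flag
bits, compressed clusters, refcounts and error handling are omitted;
and load_l2() just stands in for whatever fetches an L2 table):

#include <stdint.h>

#define CLUSTER_BITS  16                    /* 64 KB clusters */
#define CLUSTER_SIZE  (1ULL << CLUSTER_BITS)
#define L2_BITS       (CLUSTER_BITS - 3)    /* 8192 8-byte entries per L2 table */

/* Stand-in for whatever fetches an L2 table (e.g. the cache). */
typedef const uint64_t *(*load_l2_fn)(uint64_t l2_table_offset);

static uint64_t guest_to_host(uint64_t guest_offset,
                              const uint64_t *l1_table,
                              load_l2_fn load_l2)
{
    uint64_t l1_index = guest_offset >> (CLUSTER_BITS + L2_BITS);
    uint64_t l2_index = (guest_offset >> CLUSTER_BITS) & ((1ULL << L2_BITS) - 1);

    /* First level: the L1 entry points at an L2 table in the image file. */
    const uint64_t *l2_table = load_l2(l1_table[l1_index] & ~(CLUSTER_SIZE - 1));

    /* Second level: the L2 entry points at the data cluster. */
    uint64_t cluster_offset = l2_table[l2_index] & ~(CLUSTER_SIZE - 1);

    return cluster_offset + (guest_offset & (CLUSTER_SIZE - 1));
}

The on-disk structure already tells us exactly where to look; the cache
only exists so that we don't have to re-read the L2 table from disk for
every request.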

Now you want to replicate all of it and store it on disk. My mind tells
me that this is duplicated data: we already have all of the metadata
elsewhere on disk, namely in the qcow2 file itself; and even better,
there it is not stored in a fully associative structure but directly
mapped, which makes finding the correct entry much quicker.

Therefore, this is not a good idea: the structures are already there,
and they are simply better.

However, the thing is that those structures only exist in the original
qcow2 file and cannot just be placed anywhere else, as opposed to our
cache. In order to solve this, we would need to (incompatibly) modify
the qcow2 format to allow storing data independently from metadata. I
think this would certainly be doable, but the question is whether it is
worth the effort.

I'm not sure, maybe it actually is worth the effort. Your patch is nice
and simple and certainly improves things now. But the question is
whether we can't do better than to look up all cluster mappings in a
fully associative table.

Maybe we can at least make the cache directly mapped if it is supposed
to cover the whole image? That is, we would basically just load all of
the L2 tables into memory and bypass the existing cache.
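
Something along these lines, purely hypothetical, just to illustrate
what "directly mapped" would mean here (constants as in the sketch
above; all L2 tables would be loaded once when the image is opened):

#include <stdint.h>

#define CLUSTER_BITS  16
#define CLUSTER_SIZE  (1ULL << CLUSTER_BITS)
#define L2_BITS       (CLUSTER_BITS - 3)

/* Hypothetical flat map: one in-memory copy of every L2 table, indexed
 * directly by the L1 index, so a lookup needs no associative search.
 * Bounds checking and unallocated clusters are ignored here. */
typedef struct {
    uint64_t **l2_tables;   /* l1_size pointers, one loaded L2 table each */
    uint64_t   l1_size;
} FlatL2Map;

static uint64_t flat_guest_to_host(const FlatL2Map *map, uint64_t guest_offset)
{
    uint64_t l1_index = guest_offset >> (CLUSTER_BITS + L2_BITS);
    uint64_t l2_index = (guest_offset >> CLUSTER_BITS) & ((1ULL << L2_BITS) - 1);
    uint64_t cluster  = map->l2_tables[l1_index][l2_index] & ~(CLUSTER_SIZE - 1);

    return cluster + (guest_offset & (CLUSTER_SIZE - 1));
}

The memory cost is of course the full 128 MB per TB mentioned above,
but the lookup becomes a plain array access with no search.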

Max


