[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [BUG] qemu-1.1.2 [FIXED-BY] qcow2: Fix avail_sectors in
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [BUG] qemu-1.1.2 [FIXED-BY] qcow2: Fix avail_sectors in cluster allocation code |
Date: |
Tue, 22 Jan 2013 15:14:41 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0 |
Am 22.01.2013 14:17, schrieb Laszlo Ersek:
> As I understand it, the question is not whether b7ab0fea works or not
> (it does), the question is whether *without* b7ab0fea there is a real
> problem (user visible bug).
Correct. Or actually not whether, but why.
> Commit b7ab0fea decreased "avail_sectors" by
>
> keep_clusters << (s->cluster_bits - BDRV_SECTOR_BITS)
>
> which should make a difference for "m->nb_available" in two cases:
>
> (a)
>
> avail_sectors_patched <
> requested_sectors <=
> avail_sectors_unpatched
>
> (ie. the decrease in "avail_sectors" flipped MIN to prefer it over
> requested_sectors, lowering m->nb_available)
>
> (b)
>
> avail_sectors_patched <
> avail_sectors_unpatched <=
> requested_sectors
>
> (ie. MIN had already preferred "avail_sectors", and now it went lower).
>
>
> It seems that
>
> 1 << (s->cluster_bits - BDRV_SECTOR_BITS) == s->cluster_sectors [1]
>
> Therefore both "avail_sectors_patched" and "avail_sectors_unpatched" are
> an integral multiple of "s->cluster_sectors". Whenever MIN() selects
> either of them, the condition in qcow2_alloc_cluster_link_l2() will
> evaluate to false.
>
>
> In case (b) there seems to be no difference between patched & unpatched
> in this regard: MIN() always selects an integral multiple, and so
> copy_sectors() is never invoked.
>
> In case (a), the patched version skips copy_sectors(), while the
> unpatched version invokes it.
>
> I'm not sure if case (a) is possible at all -- let's expand it:
>
> avail_sectors_patched <
> requested_sectors <=
> avail_sectors_unpatched
>
> Substitute the assigned values:
>
> nb_cluster << (s->cluster_bits - BDRV_SECTOR_BITS) <
> n_end - keep_clusters * s->cluster_sectors <=
> (keep_clusters + nb_clusters) << (s->cluster_bits - BDRV_SECTOR_BITS)
>
> Using [1], replace the shifts with multiplication:
>
> nb_cluster * s->cluster_sectors <
> n_end - keep_clusters * s->cluster_sectors <=
> (keep_clusters + nb_clusters) * s->cluster_sectors
>
> Distribute the addition:
>
> nb_cluster * s->cluster_sectors <
> n_end - keep_clusters * s->cluster_sectors <=
> keep_clusters * s->cluster_sectors +
> nb_clusters * s->cluster_sectors
>
> N := nb_cluster * s->cluster_sectors [bytes]
> K := keep_clusters * s->cluster_sectors [bytes]
>
> N < n_end - K <= K + N
>
> Add K
>
> K + N < n_end <= 2*K + N [2]
Thank you so much, Laszlo, you are great!
I more or less completely failed to see case (a). With your excellent
analysis I can indeed trigger a copy on write when it shouldn't happen.
The series of requests to get into this state is somewhat tricky as it
requires a specific physical ordering of clusters in the image file:
Host cluster h: Guest cluster g is mapped to here
Host cluster h + 1: Free, can be allocated for not yet mapped g + 1
Host cluster h + 2: Guest cluster g + 2 is mapped to here
I can get this with this command on a fresh qcow2 image (64k cluster size):
./qemu-io
-c 'write -P 0x66 0 64k'
-c 'write 64k 64k'
-c 'write -P 0x88 128k 64k'
-c 'discard 64k 64k'
-c 'write -P 0x77 0 160k'
/tmp/test.qcow2
Now I get an additional COW for the area from 96k to 128k. However, this
alone doesn't corrupt an image - a copy on write by itself is harmless
because it only rewrites the data that is already there.
The two things I'm going to check next are:
a) What happens with concurrent requests now, are requests for the
same area still correctly serialised?
b) Can I modify the parameters so that copy_sectors() is working
with values it wasn't designed for so that the data is copied to
a wrong place or something like that. m->nb_available is used as
an argument to it, so it certainly seems realistic.
> I have no clue about qcow2, but *each one* of K and N seem to be a
> function of *both* n_end and the current "qcow2 disk state" (ie. sectors
> being allocated vs. just referenced). IOW, I suspect that case (a) is
> possible for some (disk state, n_end) pairs. And case (a) -- if indeed
> possible -- seems to contradict the commit message:
>
> allocation. A COW occurs only if the request wasn't cluster aligned,
> ^^^^^^^ ^^^^^^^^^^^^^^^^^^^
> which in turn would imply that requested_sectors was less than
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> avail_sectors (both in the original and in the fixed version). In this
> ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
>
> In case (a) -- expanded as [2] --, "requested_sectors" is greater than
> "avail_sectors_patched", and less than or equal to
> "avail_sectors_unpatched".
Yes, I agree. The commit message is definitely wrong and I understand
now why, so that was a great step forward. There's still a missing piece
about how it creates a user-visible bug, but hopefully it won't be that
hard to find.
Kevin