qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [BUG] qemu-1.1.2 [FIXED-BY] qcow2: Fix avail_sectors in


From: Kevin Wolf
Subject: Re: [Qemu-devel] [BUG] qemu-1.1.2 [FIXED-BY] qcow2: Fix avail_sectors in cluster allocation code
Date: Tue, 22 Jan 2013 15:14:41 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0

Am 22.01.2013 14:17, schrieb Laszlo Ersek:
> As I understand it, the question is not whether b7ab0fea works or not
> (it does), the question is whether *without* b7ab0fea there is a real
> problem (user visible bug).

Correct. Or actually not whether, but why.

> Commit b7ab0fea decreased "avail_sectors" by
> 
>     keep_clusters << (s->cluster_bits - BDRV_SECTOR_BITS)
> 
> which should make a difference for "m->nb_available" in two cases:
> 
> (a)
> 
>     avail_sectors_patched   <
>     requested_sectors       <=
>     avail_sectors_unpatched
> 
> (ie. the decrease in "avail_sectors" flipped MIN to prefer it over
> requested_sectors, lowering m->nb_available)
> 
> (b)
> 
>     avail_sectors_patched   <
>     avail_sectors_unpatched <=
>     requested_sectors
> 
> (ie. MIN had already preferred "avail_sectors", and now it went lower).
> 
> 
> It seems that
> 
>   1 << (s->cluster_bits - BDRV_SECTOR_BITS) == s->cluster_sectors    [1]
> 
> Therefore both "avail_sectors_patched" and "avail_sectors_unpatched" are
> an integral multiple of "s->cluster_sectors". Whenever MIN() selects
> either of them, the condition in qcow2_alloc_cluster_link_l2() will
> evaluate to false.
> 
> 
> In case (b) there seems to be no difference between patched & unpatched
> in this regard: MIN() always selects an integral multiple, and so
> copy_sectors() is never invoked.
> 
> In case (a), the patched version skips copy_sectors(), while the
> unpatched version invokes it.
> 
> I'm not sure if case (a) is possible at all -- let's expand it:
> 
>     avail_sectors_patched   <
>     requested_sectors       <=
>     avail_sectors_unpatched
> 
> Substitute the assigned values:
> 
>                   nb_cluster  << (s->cluster_bits - BDRV_SECTOR_BITS) <
>     n_end - keep_clusters * s->cluster_sectors                        <=
> (keep_clusters + nb_clusters) << (s->cluster_bits - BDRV_SECTOR_BITS)
> 
> Using [1], replace the shifts with multiplication:
> 
>                     nb_cluster  * s->cluster_sectors <
>           n_end - keep_clusters * s->cluster_sectors <=
>   (keep_clusters + nb_clusters) * s->cluster_sectors
> 
> Distribute the addition:
> 
>                     nb_cluster  * s->cluster_sectors   <
>           n_end - keep_clusters * s->cluster_sectors   <=
>                   keep_clusters * s->cluster_sectors +
>                     nb_clusters * s->cluster_sectors
> 
> N :=   nb_cluster  * s->cluster_sectors [bytes]
> K := keep_clusters * s->cluster_sectors [bytes]
> 
>                   N  <  n_end - K  <=    K + N
> 
> Add K
> 
>               K + N  <  n_end      <=  2*K + N                       [2]

Thank you so much, Laszlo, you are great!

I more or less completely failed to see case (a). With your excellent
analysis I can indeed trigger a copy on write when it shouldn't happen.
The series of requests to get into this state is somewhat tricky as it
requires a specific physical ordering of clusters in the image file:

Host cluster h:     Guest cluster g is mapped to here
Host cluster h + 1: Free, can be allocated for not yet mapped g + 1
Host cluster h + 2: Guest cluster g + 2 is mapped to here

I can get this with this command on a fresh qcow2 image (64k cluster size):

./qemu-io
  -c 'write -P 0x66 0 64k'
  -c 'write 64k 64k'
  -c 'write -P 0x88 128k 64k'
  -c 'discard 64k 64k'
  -c 'write -P 0x77 0 160k'
  /tmp/test.qcow2

Now I get an additional COW for the area from 96k to 128k. However, this
alone doesn't corrupt an image - a copy on write by itself is harmless
because it only rewrites the data that is already there.

The two things I'm going to check next are:

a) What happens with concurrent requests now, are requests for the
   same area still correctly serialised?

b) Can I modify the parameters so that copy_sectors() is working
   with values it wasn't designed for so that the data is copied to
   a wrong place or something like that. m->nb_available is used as
   an argument to it, so it certainly seems realistic.

> I have no clue about qcow2, but *each one* of K and N seem to be a
> function of *both* n_end and the current "qcow2 disk state" (ie. sectors
> being allocated vs. just referenced). IOW, I suspect that case (a) is
> possible for some (disk state, n_end) pairs. And case (a) -- if indeed
> possible -- seems to contradict the commit message:
> 
>   allocation. A COW occurs only if the request wasn't cluster aligned,
>                                        ^^^^^^^    ^^^^^^^^^^^^^^^^^^^
>   which in turn would imply that requested_sectors was less than
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   avail_sectors (both in the original and in the fixed version). In this
>   ^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^^^^^^^
> 
> In case (a) -- expanded as [2] --, "requested_sectors" is greater than
> "avail_sectors_patched", and less than or equal to
> "avail_sectors_unpatched".

Yes, I agree. The commit message is definitely wrong and I understand
now why, so that was a great step forward. There's still a missing piece
about how it creates a user-visible bug, but hopefully it won't be that
hard to find.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]