
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache
Date: Mon, 24 Jan 2011 15:26:48 +0000

On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <address@hidden> wrote:
> [ Re-adding qemu-devel to CC ]
>
> Am 24.01.2011 15:34, schrieb Stefan Hajnoczi:
>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <address@hidden> wrote:
>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>
>>>     if (m->nb_available & (s->cluster_sectors - 1)) {
>>>         uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
>>> +        cow = true;
>>>         ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
>>>                 m->nb_available - end, s->cluster_sectors);
>>>         if (ret < 0)
>>>             goto err;
>>>     }
>>>
>>> -    /* update L2 table */
>>> +    /*
>>> +     * Update L2 table.
>>> +     *
>>> +     * Before we update the L2 table to actually point to the new cluster, we
>>> +     * need to be sure that the refcounts have been increased and COW was
>>> +     * handled.
>>> +     */
>>> +    if (cow) {
>>> +        bdrv_flush(bs->file);
>>
>> Just bdrv_flush(bs->file) and not a refcounts cache flush?
>
> Refcounts and data need not be ordered against each other. They just
> both have to be on disk by the time we write the L2 table.
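
For illustration, the ordering constraint in sketch form; the helper
names here are hypothetical, not actual qcow2 functions:

    write_cow_data(bs);      /* COW data and refcount updates may hit  */
    update_refcounts(bs);    /* the disk in any order relative to each */
                             /* other ...                              */
    bdrv_flush(bs->file);    /* ... but both must be stable on disk    */
    update_l2_entry(bs);     /* before the L2 entry is rewritten to    */
                             /* point at the new cluster               */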

Have I missed where refcounts actually get flushed from the cache out
to disk?  bdrv_flush(bs->file) only syncs the file; it doesn't write
out dirty data held in the cache.
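
A minimal sketch of the distinction, assuming a hypothetical
SketchCache with one dirty flag per entry (this is not the QcowCache
code):

    static int cache_writeback_sketch(BlockDriverState *bs, SketchCache *c)
    {
        int i, ret;

        for (i = 0; i < c->size; i++) {
            if (!c->entries[i].dirty) {
                continue;
            }
            /* Write the dirty table back into the image file; this is
             * the step that bdrv_flush() alone does not perform. */
            ret = bdrv_pwrite(bs->file, c->entries[i].offset,
                              c->entries[i].table, c->table_size);
            if (ret < 0) {
                return ret;
            }
            c->entries[i].dirty = false;
        }

        /* Only after the writeback does syncing the file help */
        return bdrv_flush(bs->file);
    }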

>>> +    }
>>> +
>>> +    qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
>>>     ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index);
>>>     if (ret < 0) {
>>>         goto err;
>>>     }
>>> +    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>>
>>>     for (i = 0; i < m->nb_clusters; i++) {
>>>         /* if two concurrent writes happen to the same unallocated cluster
>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>>                     (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
>>>      }
>>>
>>> -    /*
>>> -     * Before we update the L2 table to actually point to the new cluster, we
>>> -     * need to be sure that the refcounts have been increased and COW was
>>> -     * handled.
>>> -     */
>>> -    bdrv_flush(bs->file);
>>>
>>> -    ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, m->nb_clusters);
>>> +    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>     if (ret < 0) {
>>> -        qcow2_l2_cache_reset(bs);
>>>         goto err;
>>>     }
>>>
>>
>> The function continues like this:
>>
>> /*
>>  * If this was a COW, we need to decrease the refcount of the old cluster.
>>  * Also flush bs->file to get the right order for L2 and refcount update.
>>  */
>> if (j != 0) {
>>     bdrv_flush(bs->file);
>>     for (i = 0; i < j; i++) {
>>         qcow2_free_any_clusters(bs,
>>             be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1);
>>     }
>> }
>>
>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount
>> update"?  We've just put an L2 table; should this be an L2 table
>> flush?
>
> I agree, this looks wrong. What we need is a dependency ensuring that
> the L2 table is flushed first and the refcount block only afterwards.
> qcow2_free_any_clusters() indirectly calls update_refcount(), which
> takes care of setting this dependency.
>
> So in the end I think it's just an unnecessary bdrv_flush. Does that
> make sense?
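
For context, a sketch of how such a cache dependency can work; the
field name and the writeback detail are assumptions, not necessarily
the real qcow2-cache.c code:

    void qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
                                    Qcow2Cache *dependency)
    {
        /* Entries of c must not be written back before the dependency
         * cache has been flushed to disk */
        c->depends = dependency;
    }

    /* ... and later, when writing back a dirty entry of c: */
    if (c->depends) {
        ret = qcow2_cache_flush(bs, c->depends);  /* dependency first */
        if (ret < 0) {
            return ret;
        }
        c->depends = NULL;
    }
    /* only now write out the entry itself */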

I don't fully understand this yet.  I've noticed that the cache isn't
the only mechanism for changing tables - there are functions like
write_l2_entries() that write parts of tables directly, without
honoring dependencies or marking the cache entry dirty.  I need to
look at this more carefully to understand whether it is all correct.
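
For contrast, the kind of direct write referred to above, in rough
form (not the exact write_l2_entries() code):

    /* The table fragment goes straight to the image file: no cache
     * dependency is consulted and no cache entry is marked dirty, so
     * any flush ordering has to be arranged by the caller. */
    ret = bdrv_pwrite(bs->file, l2_offset + start_offset,
                      &l2_table[l2_start_index], len);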

Stefan


