[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [PATCH v4 2/3] qcow2: Use QcowCache |
Date: |
Mon, 24 Jan 2011 16:36:50 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10 |
Am 24.01.2011 16:26, schrieb Stefan Hajnoczi:
> On Mon, Jan 24, 2011 at 2:54 PM, Kevin Wolf <address@hidden> wrote:
>> [ Re-adding qemu-devel to CC ]
>>
>> Am 24.01.2011 15:34, schrieb Stefan Hajnoczi:
>>> On Thu, Jan 20, 2011 at 5:10 PM, Kevin Wolf <address@hidden> wrote:
>>>> @@ -702,17 +622,30 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState
>>>> *bs, QCowL2Meta *m)
>>>>
>>>> if (m->nb_available & (s->cluster_sectors - 1)) {
>>>> uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors -
>>>> 1);
>>>> + cow = true;
>>>> ret = copy_sectors(bs, start_sect + end, cluster_offset + (end <<
>>>> 9),
>>>> m->nb_available - end, s->cluster_sectors);
>>>> if (ret < 0)
>>>> goto err;
>>>> }
>>>>
>>>> - /* update L2 table */
>>>> + /*
>>>> + * Update L2 table.
>>>> + *
>>>> + * Before we update the L2 table to actually point to the new
>>>> cluster, we
>>>> + * need to be sure that the refcounts have been increased and COW was
>>>> + * handled.
>>>> + */
>>>> + if (cow) {
>>>> + bdrv_flush(bs->file);
>>>
>>> Just bdrv_flush(bs->file) and not a refcounts cache flush?
>>
>> Refcounts and data need not to be ordered against each other. They only
>> must both be on disk when we write the L2 table.
>
> Have I missed where refcounts actually get flushed from the cache out
> to the disk because bdrv_flush(bs->file) only syncs the file but
> doesn't write out dirty data held in cache.
The bdrv_flush isn't supposed to flush the refcounts, but the data
copied during COW (what you call pre/postfill in QED)
The refcounts are handled by the qcow2_cache_set_dependency below, so
that before writing the L2 tables we always write the refcounts first.
>>>> + }
>>>> +
>>>> + qcow2_cache_set_dependency(bs, s->l2_table_cache,
>>>> s->refcount_block_cache);
>>>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_offset,
>>>> &l2_index);
>>>> if (ret < 0) {
>>>> goto err;
>>>> }
>>>> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>>>
>>>> for (i = 0; i < m->nb_clusters; i++) {
>>>> /* if two concurrent writes happen to the same unallocated cluster
>>>> @@ -728,16 +661,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs,
>>>> QCowL2Meta *m)
>>>> (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
>>>> }
>>>>
>>>> - /*
>>>> - * Before we update the L2 table to actually point to the new
>>>> cluster, we
>>>> - * need to be sure that the refcounts have been increased and COW was
>>>> - * handled.
>>>> - */
>>>> - bdrv_flush(bs->file);
>>>>
>>>> - ret = write_l2_entries(bs, l2_table, l2_offset, l2_index,
>>>> m->nb_clusters);
>>>> + ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>>>> if (ret < 0) {
>>>> - qcow2_l2_cache_reset(bs);
>>>> goto err;
>>>> }
>>>>
>>>
>>> The function continues like this:
>>>
>>> /*
>>> * If this was a COW, we need to decrease the refcount of the old cluster.
>>> * Also flush bs->file to get the right order for L2 and refcount update.
>>> */
>>> if (j != 0) {
>>> bdrv_flush(bs->file);
>>> for (i = 0; i < j; i++) {
>>> qcow2_free_any_clusters(bs,
>>> be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1);
>>> }
>>> }
>>>
>>> Does bdrv_flush(bs->file) "get the right order for L2 and refcount
>>> update"? We've just put an L2 table, should this be an L2 table
>>> flush?
>>
>> I agree, this looks wrong. What we need is a dependency to ensure that
>> first L2 is flushed and then the refcount block.
>> qcow2_free_any_clusters() calls update_refcount() indirectly, which
>> takes care of setting this dependency.
>>
>> So in the end I think it's just an unnecessary bdrv_flush. Makes sense?
>
> I don't understand this fully. I've noticed that the cache isn't the
> only mechanism for making changes to tables - there are functions like
> write_l2_entries() that directly write out parts of tables without
> honoring dependencies or using the dirty bit on the cache entry. I
> probably need to look at this more carefully to fully understand
> whether or not it is correct.
No, the cache should be the only mechanism that is used for accessing L2
tables and refcount blocks. write_l2_entries() is an old function that
is removed by the patch.
Direct accesses should only be left for L1 tables and refcount tables.
Kevin