qemu-devel

Re: [PATCH v7 14/47] stream: Deal with filters


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v7 14/47] stream: Deal with filters
Date: Fri, 7 Aug 2020 13:29:16 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

16.07.2020 17:59, Max Reitz wrote:
On 10.07.20 19:41, Andrey Shinkevich wrote:
On 10.07.2020 18:24, Max Reitz wrote:
On 09.07.20 16:52, Andrey Shinkevich wrote:
On 25.06.2020 18:21, Max Reitz wrote:
Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
    qapi/block-core.json |  4 +++
    block/stream.c       | 63 ++++++++++++++++++++++++++++++++------------
    blockdev.c           |  4 ++-
    3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b20332e592..df87855429 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2486,6 +2486,10 @@
    # On successful completion the image file is updated to drop the backing file
    # and the BLOCK_JOB_COMPLETED event is emitted.
    #
+# In case @device is a filter node, block-stream modifies the first non-filter
+# overlay node below it to point to base's backing node (or NULL if @base was
+# not specified) instead of modifying @device itself.
+#
    # @job-id: identifier for the newly-created block job. If
    #          omitted, the device name will be used. (Since 2.7)
    #
diff --git a/block/stream.c b/block/stream.c
index aa2e7af98e..b9c1141656 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,8 @@ enum {
      typedef struct StreamBlockJob {
        BlockJob common;
-    BlockDriverState *bottom;
+    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
+    BlockDriverState *above_base;   /* Node directly above the base */
Keeping the base_overlay is enough to complete the stream job.
Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.

The above_base may disappear during the job and we can't rely on it.
In this version of this series, it may not, because the chain is frozen.
So the above_base cannot disappear.

Once we insert a filter above the top bs of the stream job, the parallel jobs
in iotest #030 will fail with a 'frozen link' error. This is because of the
independent parallel stream or commit jobs that insert/remove their filters
asynchronously.

I’m not sure whether that’s a problem with this series specifically.

We can discuss whether we should allow it to disappear, but I think not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.

We can do the same with the base_overlay.

If the backing node turns out to be a filter, the proper backing child will
be set after the filter is removed. So, we shouldn't care.

And what if the user manually added some filter above the base (i.e.
below base_overlay) that they want to keep after the job?


It's automatically kept if we use base_overlay->backing->bs as the final
backing node.

You mean that they want it to be dropped?


So, assuming the following:

top -(backing)-> manually-inserted-filter -(file)-> base

and the user runs stream with base=base and expects the filter to be removed
by the stream job?

Hmm, yes, such a use case is broken with our proposed way...

====

Let me now clarify the problem we'll have with your way.

When stream doesn't have its own filter, we can easily imagine two parallel
stream jobs:

top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base

stream1: top=top, base=mid2
stream2: top=mid2, base=NULL

final picture is obvious:

top (merged with mid1) -(backing)-> mid2 (merged with base)

But we want the stream job to have its own filter, like mirror. So the picture
becomes more complex.

Assume stream2 starts first.

top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2 -(backing)-> base

Now, when we run stream1 with your solution, stream1 will freeze
stream2-filter (the wrong thing: stream2 will fail to remove it if stream2
finishes first), and stream1 will remove stream2-filter on finish (which is
wrong as well: stream2 is not prepared for its filter being removed).

But with our proposed way (freeze only the chain up to base_overlay
inclusive, and use backing(base_overlay) as the final backing), everything
works as expected, and the two parallel jobs can run together.

====

So, these two cases are mutually exclusive. I vote for freezing up to
base_overlay and using backing(base_overlay) as the final backing, because:

1. I can't imagine another way to fix the case of parallel streams with
filters (it's not a problem in current master, but we have a pending series
that will introduce a stream job filter; then the problem will appear and
even break iotest 30)

2. I don't think that removing filters above the base node in the stream job
is an important enough case to break parallel stream jobs in the future:

 - The stream job is not intended to remove filters, but to stream data.
   Filters between base_overlay and base don't contain any data and are
   unrelated to the stream process.
 - I think that filters are "more related" to their children than to their
   parents. So removing filters related to the base node, when we just remove
   all data-containing nodes between top and base (and are not going to remove
   the base node), is questionable at least. On the contrary, removing all
   intermediate data-containing nodes _together_ with their filters is
   absolutely the correct thing to do.

Next, with your way, what about filters inserted above base during the stream
job? They will be between above_base and base, and will not be removed. So
with your way, filters above base that exist before the job starts will be
frozen during the job and removed after it, but filters inserted above base
during the job will be left untouched. With our way, all base-node-related
filters are simply left untouched by the job. That seems a simpler definition
to me, and simpler to document.


If we don’t keep above_base, then we’re basically left guessing as to
what should be the backing file after the stream job.

        BlockdevOnError on_error;
        char *backing_file_str;
        bool bs_read_only;
@@ -53,7 +54,7 @@ static void stream_abort(Job *job)
          if (s->chain_frozen) {
            BlockJob *bjob = &s->common;
-        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->bottom);
+        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
        }
    }

@@ -62,14 +63,15 @@ static int stream_prepare(Job *job)
        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
        BlockJob *bjob = &s->common;
        BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *base = backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
The initial base node may be a top node for a concurrent commit job and
may disappear.
Then it would just be replaced by another node, though, so above_base
keeps a child.  The @base here is not necessarily the initial @base, and
that’s intentional.

Not really. In my example, above_base becomes a dangling pointer because
after the commit job finishes, its filter, which should belong to the commit
job's frozen chain, will be deleted. If we freeze the link to above_base for
this job, iotest #30 will not pass.

So it doesn’t become a dangling pointer, because it’s frozen.

030 passes after this series, so I’m not sure whether I can consider
that problem part of this series.

I think if adding a filter node becomes a problem, we have to consider
relaxing the restrictions when we do that, not now.

base = bdrv_filter_or_cow_bs(s->base_overlay) is more reliable.
But also wrong.  The point of keeping above_base around is to get its
child here to use that child as the new backing child of the top node.

        Error *local_err = NULL;
        int ret = 0;

-    bdrv_unfreeze_backing_chain(bs, s->bottom);
+    bdrv_unfreeze_backing_chain(bs, s->above_base);
        s->chain_frozen = false;

-    if (bs->backing) {
+    if (bdrv_cow_child(unfiltered_bs)) {
            const char *base_id = NULL, *base_fmt = NULL;
            if (base) {
                base_id = s->backing_file_str;
@@ -77,8 +79,8 @@ static int stream_prepare(Job *job)
                    base_fmt = base->drv->format_name;
                }
            }
-        bdrv_set_backing_hd(bs, base, &local_err);
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
+        bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
+        ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
            if (local_err) {
                error_report_err(local_err);
                return -EPERM;
@@ -109,14 +111,15 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
        BlockBackend *blk = s->common.blk;
        BlockDriverState *bs = blk_bs(blk);
-    bool enable_cor = !backing_bs(s->bottom);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    bool enable_cor = !bdrv_cow_child(s->base_overlay);
        int64_t len;
        int64_t offset = 0;
        uint64_t delay_ns = 0;
        int error = 0;
        int64_t n = 0; /* bytes */

-    if (bs == s->bottom) {
+    if (unfiltered_bs == s->base_overlay) {
            /* Nothing to stream */
            return 0;
        }
@@ -150,13 +153,14 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
              copy = false;

-        ret = bdrv_is_allocated(bs, offset, STREAM_CHUNK, &n);
+        ret = bdrv_is_allocated(unfiltered_bs, offset, STREAM_CHUNK, &n);
            if (ret == 1) {
                /* Allocated in the top, no need to copy.  */
            } else if (ret >= 0) {
                /* Copy if allocated in the intermediate images.  Limit to the
                 * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
-            ret = bdrv_is_allocated_above(backing_bs(bs), s->bottom, true,
+            ret = bdrv_is_allocated_above(bdrv_cow_bs(unfiltered_bs),
+                                          s->base_overlay, true,
                                              offset, n, &n);
                /* Finish early if end of backing file has been reached */
                if (ret == 0 && n == 0) {
@@ -223,9 +227,29 @@ void stream_start(const char *job_id, BlockDriverState *bs,
        BlockDriverState *iter;
        bool bs_read_only;
        int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
-    BlockDriverState *bottom = bdrv_find_overlay(bs, base);
+    BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
+    BlockDriverState *above_base;

-    if (bdrv_freeze_backing_chain(bs, bottom, errp) < 0) {
+    if (!base_overlay) {
+        error_setg(errp, "'%s' is not in the backing chain of '%s'",
+                   base->node_name, bs->node_name);
Sorry, the error message is not clear to me.

In this case, there is no intermediate COW node, but the base, if not NULL, is
in the backing chain of bs, isn't it?

+        return;
+    }
+
+    /*
+     * Find the node directly above @base.  @base_overlay is a COW overlay, so
+     * it must have a bdrv_cow_child(), but it is the immediate overlay of
+     * @base, so between the two there can only be filters.
+     */
+    above_base = base_overlay;
+    if (bdrv_cow_bs(above_base) != base) {
+        above_base = bdrv_cow_bs(above_base);
+        while (bdrv_filter_bs(above_base) != base) {
+            above_base = bdrv_filter_bs(above_base);
+        }
+    }
+
+    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
When a concurrent stream job tries to freeze or remove the above_base node,
we will encounter the frozen node error. The above_base node is a part of the
concurrent job's frozen chain.
Correct.

            return;
        }

@@ -255,14 +279,19 @@ void stream_start(const char *job_id, BlockDriverState *bs,
         * and resizes. Reassign the base node pointer because the backing BS of the
-     * bottom node might change after the call to bdrv_reopen_set_read_only()
-     * due to parallel block jobs running.
+     * above_base node might change after the call to
Yes, if not frozen.
+     * bdrv_reopen_set_read_only() due to parallel block jobs running.
         */
-    base = backing_bs(bottom);
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
+    base = bdrv_filter_or_cow_bs(above_base);
+    for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
+         iter = bdrv_filter_or_cow_bs(iter))
+    {
            block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
                               basic_flags, &error_abort);
        }

-    s->bottom = bottom;
+    s->base_overlay = base_overlay;
+    s->above_base = above_base;
Generally, being the filter of a concurrent job, the above_base node may be
deleted at any time, and we will keep a dangling pointer. It may happen even
earlier if above_base is not frozen.

If it is frozen, as it is here, we may get the frozen link error in that case.
I’m not sure what you mean here.  Freezing it was absolutely
intentional.  A dangling pointer would be a problem, but that’s why it’s
frozen, so it stays around and can’t be deleted any time.

Max

The nodes we freeze should belong to the context of the relevant job:

filter->top_node->intermediate_node(s)

We would not include the base or any filter above it in the frozen chain
because they belong to a different job's context.

They aren’t really, because we need to know the backing node of @device
after the job.

Once 'this' job is completed, we set the current backing child of
base_overlay and need not care what kind of node it is. If that is another
job's filter, it will be replaced with the proper node afterwards.

But what if there are filters above the base that the user wants to keep
after the job?

Max



--
Best regards,
Vladimir


