Re: [Qemu-devel] [PATCH v8 4/7] block: introduce backup-top filter driver


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v8 4/7] block: introduce backup-top filter driver
Date: Mon, 17 Jun 2019 16:56:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 17.06.19 12:36, Vladimir Sementsov-Ogievskiy wrote:
> 14.06.2019 23:03, Max Reitz wrote:
>> On 14.06.19 18:22, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.06.2019 15:57, Max Reitz wrote:
>>>> On 14.06.19 11:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 13.06.2019 18:57, Max Reitz wrote:
>>>>>> On 29.05.19 17:46, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> The backup-top filter does the copy-before-write operation. It should be
>>>>>>> inserted above the active disk and have a target node for CBW, like the
>>>>>>> following:
>>>>>>>
>>>>>>>        +-------+
>>>>>>>        | Guest |
>>>>>>>        +-------+
>>>>>>>            |r,w
>>>>>>>            v
>>>>>>>        +------------+  target   +---------------+
>>>>>>>        | backup_top |---------->| target(qcow2) |
>>>>>>>        +------------+   CBW     +---------------+
>>>>>>>            |
>>>>>>> backing |r,w
>>>>>>>            v
>>>>>>>        +-------------+
>>>>>>>        | Active disk |
>>>>>>>        +-------------+
>>>>>>>
>>>>>>> The driver will be used in backup instead of write-notifiers.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>>>>> ---
>>>>>>>     block/backup-top.h  |  64 +++++++++
>>>>>>>     block/backup-top.c  | 322 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>     block/Makefile.objs |   2 +
>>>>>>>     3 files changed, 388 insertions(+)
>>>>>>>     create mode 100644 block/backup-top.h
>>>>>>>     create mode 100644 block/backup-top.c
>>>>>>>
>>>>>>> diff --git a/block/backup-top.h b/block/backup-top.h
>>>>>>> new file mode 100644
>>>>>>> index 0000000000..788e18c358
>>>>>>> --- /dev/null
>>>>>>> +++ b/block/backup-top.h
>>>>
>>>> [...]
>>>>
>>>>>>> +/*
>>>>>>> + * bdrv_backup_top_append
>>>>>>> + *
>>>>>>> + * Append backup_top filter node above @source node. @target node will receive
>>>>>>> + * the data backed up during CBW operations. New filter together with @target
>>>>>>> + * node are attached to @source aio context.
>>>>>>> + *
>>>>>>> + * The resulting filter node is implicit.
>>>>>>
>>>>>> Why?  It’s just as easy for the caller to make it implicit if it
>>>>>> should be.  (And usually the caller should decide that.)
>>>>>
>>>>> Personally, I don't know what the reasons are for filters to be
>>>>> implicit or not, I just made it like the other job-filters.. I can
>>>>> move making-implicit to the caller or drop it altogether (if that
>>>>> will work).
>>>>
>>>> Nodes are implicit if they haven’t been added consciously by the user.
>>>> A node added by a block job can be non-implicit, too, as mirror shows:
>>>> if the user specifies the filter-node-name option, they will know about
>>>> the node, thus it is no longer implicit.
>>>>
>>>> If the user doesn’t know about the node (they didn’t give the
>>>> filter-node-name option), the node is implicit.
>>>>
>>>
>>> Ok, I understand it. But it doesn't show why it should be implicit.
>>> Isn't it less buggy to make all filters explicit? We don't hide
>>> implicit nodes from query-named-block-nodes (currently the only
>>> interface to explore the whole graph) anyway. And we can't completely
>>> hide the side effects of an additional node in the graph.
>>
>> Well, we try, at least.  At least we hide them from query-block.
>>
>>> So, is there any real benefit in supporting the separation into
>>> implicit and explicit filters? It seems to me that it only complicates
>>> things... In other words, what will break if we make all filters
>>> explicit?
>>
>> The theory is that qemu may decide to add nodes at any point, but at
>> least when managing chains etc., they may not be visible to the user.  I
>> don’t think we can get rid of them so easily.
>>
>> One example that isn’t implemented yet is copy-on-read.  In theory,
>> specifying copy-on-read=on for -drive should create an implicit COR node
>> on top.  The user shouldn’t see that node when inspecting the drive or
>> when issuing block jobs on it, etc.  And this node has to stay there
>> when the user does e.g. an active commit somewhere down the chain.
> 
> Why not instead just write in the documentation that yes, a filter is
> created when you enable copy-on-read?

Because the problem is that existing users are not aware of that.

If they were, they could simply create a COR filter already.

I suppose we could interpret the deprecation policy in a way that we
could change the behavior of -drive copy-on-read=on, but as I already
said, what’s the point of supporting -drive copy-on-read=on, when you
can simply explicitly create a COR filter on top?

> And do the same for all operations which may create a filter; we don't
> have a lot of them. And add an optional parameter to set the node-name
> for the created filter.

That optional parameter exists, and if it is given, the user shows that
they are aware of the node.  Hence the node becomes explicit then.
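
For reference, mirror’s handling is roughly the following (abridged from
mirror_start_job() in block/mirror.c; error paths shortened):

    /*
     * Create the filter under the user-supplied node name (a name is
     * autogenerated if filter_node_name is NULL)...
     */
    mirror_top_bs = bdrv_new_open_driver(&bdrv_mirror_top, filter_node_name,
                                         BDRV_O_RDWR, errp);
    if (mirror_top_bs == NULL) {
        return NULL;
    }
    /* ...and mark it implicit only if the user did not name it. */
    if (!filter_node_name) {
        mirror_top_bs->implicit = true;
    }

A conversion of -drive copy-on-read=on would presumably follow the same
pattern with &bdrv_copy_on_read plus a bdrv_append() to insert the node,
with implicit always set, since there is no user-given name in that case.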

The problem is with legacy users that use an existing interface and
suddenly the externally visible interface of qemu changes in a way that
could break them.  I suppose additional entries in
query-named-block-nodes are not breakage.  Maybe they are; that would
be a bug then.

(If nobody has noticed so far, that may be because these legacy
applications didn’t use query-named-block-nodes.  But now maybe libvirt
does, and so if we converted copy-on-read=on to a COR node, it would
notice now and thus break.  So just because our handling of implicit
nodes is broken right now does not mean we can abandon the concept
altogether.)

>> That sounds like a horrible ordeal to implement, so it hasn’t been done
>> yet.  Maybe it never will.  It isn’t that bad for the job filters,
>> because they generally freeze the block graph, so there is no problem
>> with potential modifications.
>>
>> All in all I do think having implicit nodes makes sense.  Maybe not so
>> much now, but in the future (if someone implements converting -drive COR
>> and throttle options to implicit nodes...).
> 
> But do we at least have a strong definition of how implicit nodes
> should behave under graph-modifying operations around them?

No.  Depends on the node.

> Should new implicit/explicit
> filters be created above or below them?

That was always the most difficult question we had when we introduced
filters.

The problem is that we never answered it in our code base.

One day, we just added filters (“let’s see what happens”), and in the
beginning, they were all explicit, so we still didn’t have to answer
this question (in code).  Then job filters were added, and we still
didn’t have to, because they set blockers so the graph couldn’t be
reorganized with them in place anyway.

If we converted copy-on-read=on to a COR node, we would have to answer
that question.

> We should specify this in the documentation for all such operations,
> otherwise the effect of implicit nodes may change unpredictably for
> the user. I'm afraid that implicit filters will just extend my list of
> features which should not be used if we want to be able to maintain
> all this mess..

Yeah, well, that is nothing new at all.  For example, note the “Split
I/O throttling off into own BDS – Requires some care with snapshots
etc.” here:
https://wiki.qemu.org/ToDo/Block#Dynamic_graph_reconfiguration_.28e.g._adding_block_filters.2C_taking_snapshots.2C_etc..29

This was always a problem.  It was always the biggest problem with
filters.  We never answered it because it never became an acute problem,
as I described above.

We just said “For explicit filters, they just stay where they are, and
that’s OK because the user can take care of them.”  We never did add the
implicit filters that were actually going to become a problem (COR and
throttle).

So as I said above, the behavior of implicit nodes needs to be examined
on a case-by-case basis.  Well, actually, this is only about COR and
throttle, really, and both would behave the same: Always stay at the
BlockBackend, because that is the behavior when you currently use
BB-level throttling or COR.

The jobs that now use implicit nodes freeze the graph, so there is no
problem there.
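
That freezing is roughly the following pattern (bdrv_freeze_backing_chain()
is the real helper; the variable names here are from the commit job, and
the surrounding context is abbreviated):

    /*
     * Freeze every backing link between the filter and @base so that the
     * chain cannot be reorganized while the job runs; job creation fails
     * if some link is already frozen.
     */
    if (bdrv_freeze_backing_chain(commit_top_bs, base, errp) < 0) {
        goto fail;
    }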

(Also, by the way, we do not need the option to give COR/throttle nodes
created through BB options a node name.  As I said above, at that point
you can just create the node yourself.)

> 1. If something around filenames doesn't work, use node-names, and better,
> don't use filename-based APIs

I think we agree that filename-based APIs were a mistake.  (Though only
in hindsight.  They were added when there were no node names, so it did
make sense then.)

So now you should not use them, correct.
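
In code terms the contrast is roughly this (both helpers exist in block.c;
the node name and filename here are made up):

    /* Unambiguous: look a node up by its node-name. */
    BlockDriverState *base = bdrv_find_node("base-node-0");

    /*
     * Filename-based: walk the backing chain comparing filenames, which
     * is what the old interfaces do internally, and heuristic at best.
     */
    BlockDriverState *base2 = bdrv_find_backing_image(top_bs, "base.qcow2");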

> 2. If qemu-img can't do what you need, use a paused qemu process, and
> better, don't use qemu-img

There was always a plan to give qemu-img feature parity to qemu proper.
But often the interface is just a problem.  Node names really don’t
make much sense with qemu-img, I think.

(I mean, feel free to add something that automatically names the backing
nodes e.g. backing.0, backing.1, etc., and then add an interface to
qemu-img commit to use node names, but I find that weird.)

So I don’t quite see the problem.  Of course qemu has more functionality
than qemu-img.  qemu-img is fine as long as you have a simple set-up,
but once it gets more complicated, you need management software; and
that software shouldn’t care much whether it uses qemu-img or qemu itself.

> and a new one:
> 
> 3. If something is broken around jobs and filters and any block operations,
> set filter-node-name in parameters to make filter node explicit. And better,
> always set it..

Uh, yeah?  I don’t see the problem here.

This is like saying “If I want to use new features, I should use
-blockdev and not -hda”.  Yes, that is correct.  You should set
filter-node-name if you care about the exact state of the graph.

> So at least, I think we should make it possible for all filters to be
> explicit if the user wants, by setting the filter node name like in mirror.

For all filters that the user has no other way of creating, yes.
(That is, the filters created by the block jobs.)

> (and then, I will not really care about implicit-filter related logic,
> just like I don't really care about filename-API related logic)
> 
> Also, right now implicit filters are actually available for direct
> manipulation by the user, as their node-names are exported in
> query-named-block-nodes and nothing prevents using these names in
> different APIs.

Hm, yes.  Is that a problem?

>>>> [...]
>>>>
>>>>>>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>>>>>>> +{
>>>>>>> +    if (!bs->backing) {
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return bdrv_co_flush(bs->backing->bs);
>>>>>>
>>>>>> Should we flush the target, too?
>>>>>
>>>>> Hm, you've asked it already, on the previous version :)
>>>>
>>>> I wasn’t sure...
>>>>
>>>>> Backup doesn't do it, so I just kept the old behavior. And what is
>>>>> the reason to flush the backup target on any guest flush?
>>>>
>>>> Hm, well, what’s the reason not to do it?
>>>
>>> guest flushes will be slowed down?
>>
>> Hm, the user could specify cache=unsafe if they don’t care.  Which gives
>> me second thoughts... [1]
>>
>>>> Also, there are not only guest flushes.  bdrv_flush_all() exists, which
>>>> is called when the guest is stopped.  So who is going to flush the
>>>> target if not its parent?
>>>>
>>>> [...]
>>>
>>> Backup job? But the filter should not rely on it..
>>
>> Hm.  Backup job actually doesn’t sound that wrong.
>>
>>> But should the filter really do that job, or is it here only to do
>>> CBW? Maybe the target must have another parent which will care about
>>> flushing.
>>>
>>> Ok, I think I'll flush the target here too for safety, and leave a
>>> comment that something smarter would be better.
>>> (or, if we decide to flush all children by default in generic code,
>>> I'll drop this handler)
>>
>> [1] Thinking more about this, for normal backups the target file does
>> not reflect a valid disk state until the backup is done; just like for
>> qemu-img convert.  Just like qemu-img convert, there is therefore no
>> reason to flush the target until the job is done.
>>
>> But there is also the other case of image fleecing.  In this case, the
>> target will have another parent, so bdrv_flush_all() will be done by
>> someone.  Still, it intuitively makes sense to me that in this case, the
>> backup-top filter should flush the target if the guest decides to flush
>> the device (so that the target stays consistent on disk).
>>
>> backup-top currently cannot differentiate between the cases, but maybe
>> that is generally a useful hint to give to it?  (That is, whether the
>> target has a consistent state or not.)
> 
> Hmm, for fleecing we don't need a consistent state of the temporary
> image. If something fails, the whole operation is considered to have
> failed. And anyway, I don't see the relation between consistency of the
> temporary fleecing image and guest flushes; why should we bind them?

Guest flushes are about clearing the cache and keeping the disk state
consistent in case of e.g. power failure.  So if the target should be
consistent, it too should then survive a power failure.

Hm, but then again, if we don’t enforce on the host that the target is
always flushed before the source, that consistency is quickly lost when
the host decides to flush the cache just because it wants to.

Hmmm.  I guess we could say that the target is always inconsistent on
e.g. power failure.

My problem is that the target is also, well, a backup, you know.  So
it’s kind of a pain if we don’t care about it.  The user probably wanted
to save that old data, and now it may be lost just because we didn’t
flush it when the user entered “sync” in the VM.  I don’t know.
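
For what it’s worth, the “flush the target too, and leave a comment” variant
discussed above might look like the sketch below.  It assumes the filter
keeps its target child as s->target in its BDRVBackupTopState, as elsewhere
in this series:

    static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
    {
        BDRVBackupTopState *s = bs->opaque;
        int ret;

        if (!bs->backing) {
            return 0;
        }

        ret = bdrv_co_flush(bs->backing->bs);
        if (ret < 0) {
            return ret;
        }

        /*
         * Tentatively also flush the target so that the data copied so
         * far survives e.g. a host power failure.  Something smarter
         * (only for fleecing-like setups?) may be better.
         */
        if (s->target) {
            ret = bdrv_co_flush(s->target->bs);
        }

        return ret;
    }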

Max
