[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] n ways block filters
From: |
Benoît Canet |
Subject: |
Re: [Qemu-devel] n ways block filters |
Date: |
Thu, 20 Mar 2014 17:06:26 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
The Thursday 20 Mar 2014 à 16:12:34 (+0100), Kevin Wolf wrote :
> Am 20.03.2014 um 15:05 hat Benoît Canet geschrieben:
> > The Tuesday 18 Mar 2014 à 14:27:47 (+0100), Kevin Wolf wrote :
> > > Am 17.03.2014 um 17:02 hat Stefan Hajnoczi geschrieben:
> > > > On Mon, Mar 17, 2014 at 4:12 AM, Fam Zheng <address@hidden> wrote:
> > > > > On Fri, 03/14 16:57, Benoît Canet wrote:
> > > > >> I discussed a bit with Stefan on the list and we came to the
> > > > >> conclusion that the
> > > > >> block filter API need group support.
> > > > >>
> > > > >> filter group:
> > > > >> -------------
> > > > >>
> > > > >> My current plan to implement this is to add the following fields to
> > > > >> the BlockDriver
> > > > >> structure.
> > > > >>
> > > > >> int bdrv_add_filter_group(const char *name, QDict options);
> > > > >> int bdrv_reconfigure_filter_group(const char *name, QDict options);
> > > > >> int bdrv_destroy_filter_group(const char *name);
> > >
> > > Benoît, your mail left me puzzled. You didn't really describe the
> > > problem that you're solving, nor what the QDict options actually
> > > contains or what a filter group even is.
> > >
> > > > >> These three extra method would allow to create, reconfigure or
> > > > >> destroy a block
> > > > >> filter group. A block filter group contain the shared or non shared
> > > > >> state of the
> > > > >> blockfilter. For throttling it would contains the ThrottleState
> > > > >> structure.
> > > > >>
> > > > >> Each block filter driver would contains a linked list of linked list
> > > > >> where the
> > > > >> BDS are registered grouped by filter groups state.
> > > > >
> > > > > Sorry I don't fully understand this. Does a filter group contain
> > > > > multiple block
> > > > > filters, and every block filter has effect on multiple BDSes? Could
> > > > > you give an
> > > > > example?
> > > >
> > > > Just to why a "group" mechanism is useful:
> > > >
> > > > You want to impose a 2000 IOPS limit for the entire VM. Currently
> > > > this is not possible because each drive has its own throttling state.
> > > >
> > > > We need a way to say certain drives are part of a group. All drives
> > > > in a group share the same throttling state and therefore a 2000 IOPS
> > > > limit is shared amongst them.
> > >
> > > Now at least I have an idea what you're all talking about, but it's
> > > still not obvious to me how the three functions from above solve your
> > > problem or how they work in detail.
> > >
> > > The obvious solution, using often discussed blockdev-add concepts, is:
> > > ______________
> > > virtio-blk_A --> | | --> qcow2_A --> raw-posix_A
> > > | throttling |
> > > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B
> >
> > My proposal would be:
> > ______________
> > virtio-blk_A --> | BDS 1 | --> qcow2_A --> raw-posix_A
> > |____________|
> > |
> > _____|________
> > | | The shared state is the state of a BDS
> > group
> > | Shared | It's stored in a static linked list of the
> > | State | block/throttle.c module. It has a name and
> > contains a
> > |____________| throttle state structure.
> > |
> > _____|________
> > | BDS 2 |
> > virtio_blk_B --> |____________| --> qcow2_B --> nbd_B
>
> Okay. I think your proposal might be easier to implement in the short
> run, but it introduces an additional type of nodes to the graph (so far
> we have only one type, BlockDriverStates) with their own set of
> functions, and I assume monitor commands, for management.
>
> This makes the whole graph less uniform and consistent. There may be
> cases where this is necessary or at least tolerable because the fully
> generic alternativ isn't doable. I'm not convinced yet that this is the
> case here.
>
> In contrast, my approach would require considerable infrastructure work
> (you somehow seem to attract that kind of things ;-)), but it's merely a
> generalisation of what we already have and as such fits nicely in the
> graph.
>
> We already have multiple children of BDS nodes. And we take it for
> granted that they don't refer to the same data, but that bs->file and
> bs->backing_hd have actually different semantics.
>
> We have recently introduced refcounts for BDSes so that one BDS can now
> have multiple parents, too, as a first step towards symmetry. The
> logical extension is that these parent get different semantics, just
> like the children have different semantics.
>
> Doing the abstraction in one model right instead of adding hacks that
> don't really fit in but are easy to implement has paid off in the past.
> I'm pretty sure that extending the infrastructure this way will find
> more users than just I/O throttling, and that having different parents
> in different roles is universally useful. With qcow2 exposing the
> snapshots, too, I already named a second potential user of the
> infrastructure.
>
> > The name of the shared state is the throttle group name.
> > The three added methods are used to add, configure and destroy such shared
> > states.
> >
> > The benefit of this aproach is that we don't need to add a special slot
> > mechanism
> > and that removing BDS 2 would be easy.
> > Your approach don't deal with the fact that the throttling group membership
> > can
> > be changed dynamically while the vm is running: for example adding qcow2_C
> > and
> > removing qcow2_B should be made easy.
>
> Yes, this is right. But then, the nice thing about it is that I stayed
> fully within the one uniform graph. We just need a way to modify the
> edges in this graph (and we already need that to insert/delete filters)
> and you get this special case and many others for free.
>
> So, I vote for investing into a uniform infrastructure here instead of
> adding new one-off node types.
Maybe parents BDS could use a generic block function to get a cookie when they
start to use a children BDS.
The parent would to
bs->file_cookie = bdrv_get_cookie(file);
bs->file = file;
when choosing to use file as bs file.
The get cookie method would be
uint64_t bdrv_get_cookie(bs) {
bs->cookie = gen_uuid(bs);
return bs->cookie;
}
gen_uuid would combine a random 64 bit number with a registry to prevent
identical cookie generation.
After this step every BlockDriver method would receive the cookie as second
parameter.
For example bdrv_read(bs, cookie, ...)
So it's easy for a block driver to discriminate based on the cookie and even to
look up which of his own child is associated to this cookie.
Best regards
Benoît
>
> Kevin
>
> > > That is, the I/O throttling BDS is referenced by two devices instead of
> > > just one and it associates one 'input' with one 'output'. Once we have
> > > BlockBackend, we would have two BBs, but still only one throttling
> > > BDS.
> > >
> > > The new thing that you get there is that the throttling driver has
> > > not only multiple parents (that part exists today), but it behaves
> > > differently depending on who called it. So we need to provide some way
> > > for one BDS to expose multiple slots or whatever you want to call them
> > > that users can attach to.
> > >
> > > This is, by the way, the very same thing as would be required for
> > > exposing qcow2 internal snapshots (read-only) while the VM is running.
> > >
> > > Kevin
> > >
- [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/14
- Re: [Qemu-devel] n ways block filters, Fam Zheng, 2014/03/16
- Re: [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/17
- Re: [Qemu-devel] n ways block filters, Stefan Hajnoczi, 2014/03/17
- Re: [Qemu-devel] n ways block filters, Kevin Wolf, 2014/03/18
- Re: [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/20
- Re: [Qemu-devel] n ways block filters, Kevin Wolf, 2014/03/20
- Re: [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/20
- Re: [Qemu-devel] n ways block filters,
Benoît Canet <=
- Re: [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/24
- Re: [Qemu-devel] n ways block filters, Kevin Wolf, 2014/03/24
- Re: [Qemu-devel] n ways block filters, Benoît Canet, 2014/03/27