From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices
Date: Wed, 19 Nov 2014 15:52:51 +0000
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Nov 18, 2014 at 03:12:12AM -0500, Francesco Romani wrote:
> > > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> > > +                                            void *opaque)
> > > +{
> > > +    BdrvTrackedRequest *req = opaque;
> > > +    BlockDriverState *bs = req->bs;
> > > +    int64_t amount = 0;
> > > +
> > > +    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> > > +    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> > 
> > Does the code still make these assumptions or are the asserts left over
> > from previous versions of the patch?
> 
> It's a leftover.
> I understood they don't hurt and add a bit of safety, but if they are
> confusing I'll remove them.

Yes, it made me wonder why.  Probably best to remove them.

> > > +void bdrv_set_usage_threshold(BlockDriverState *bs,
> > > +                              int64_t threshold_bytes)
> > > +{
> > > +    BlockDriverState *target_bs = bs;
> > > +    if (bs->file) {
> > > +        target_bs = bs->file;
> > > +    }
> > 
> > Hmm...I think now I understand why you are trying to use bs->file.  This
> > is an attempt to make image formats work with the threshold.
> > 
> > Unfortunately the BlockDriverState topology can be more complicated than
> > just 1 level.
> 
> I thought so, but I can't yet reproduce more complex topologies.
> Disclosure: I'm testing against the topology I know to be used in
> oVirt, lacking immediate availability of others; suggestions welcome.
> 
> At the risk of being redundant: in oVirt the use case is QCOW2 over an
> LVM block device, and we'd like to be notified about the allocation of
> the LVM block device, which (IIUC) is the last bs->file.
> 
> This is a simple topology (unless I'm missing something), and that's
> the reason why I go down just one level.
> 
> Of course I want a general solution for this change, so...

There is a block driver for error injection called "blkdebug" (see
docs/blkdebug.txt).  Here is an example of the following topology:

  raw_bsd (drive0) -> blkdebug -> raw-posix (test.img)

qemu-system-x86_64 -drive \
    if=virtio,format=raw,file.driver=blkdebug,file.image.filename=test.img

The blkdebug driver is interposing between the raw_bsd (drive0) root and
the raw-posix leaf node.
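For completeness, blkdebug can also inject errors driven by a rules file
(the format is described in docs/blkdebug.txt).  For instance, a config
that injects ENOSPC (errno 28) on the first AIO write would look roughly
like this:

```
[inject-error]
event = "write_aio"
errno = "28"
once = "on"
```

which can be wired up via the blkdebug:<config>:<image> filename syntax.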

> > If we hardcode a strategy to traverse bs->file then it will work in most
> > cases:
> > 
> >   while (bs->file) {
> >       bs = bs->file;
> >   }
> > 
> > But there are cases like VMDK extent files where a BlockDriverState
> > actually has multiple children.
> > 
> > One way to solve this is to require that the management tool tells QEMU
> > which exact BlockDriverState node the threshold applies to.  Then QEMU
> > doesn't need any hardcoded policy.  But I'm not sure how realistic that
> > is at the moment (whether management tools use node names for each
> > node yet), so it may be best to hardcode the bs->file traversal that
> > I've suggested.
> 
> oVirt relies on libvirt[1], and uses device node names (e.g. 'vda').
> 
> BTW, I haven't found a way to inspect the BlockDriverState topology
> from the outside (e.g. via a monitor command). Is there a way I'm
> missing?

You can get the BlockDriverState and ->backing_hd chain using the
query-block QMP command.  I'm not aware of a command that returns the
full graph of BlockDriverState nodes.

The management tool should not need to inspect the graph because the
graph can change at runtime (e.g. if I/O throttling were implemented
as a BlockDriverState node, it could appear/disappear when the
feature is activated/deactivated).  Instead the management tool should
name the nodes it knows about and then use those node names.
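As a standalone illustration of the hardcoded traversal I suggested
(using a mock struct with only the one field that matters here, not
QEMU's real BlockDriverState), the idea is simply:

```c
#include <assert.h>
#include <stddef.h>

/* Mock: QEMU's real BlockDriverState is far larger; only the child
 * pointer matters for this sketch. */
typedef struct BlockDriverState {
    struct BlockDriverState *file;   /* child node, NULL at the leaf */
} BlockDriverState;

/* Walk the bs->file chain down to the leaf, e.g. from a qcow2 node
 * to the raw-posix node backing it.  This breaks down for drivers
 * with multiple children, such as VMDK with extent files. */
static BlockDriverState *bdrv_find_leaf(BlockDriverState *bs)
{
    while (bs->file) {
        bs = bs->file;
    }
    return bs;
}
```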

> Another issue I don't yet have a proper solution for is related to
> this one.
> 
> The management app will have to deal with VMs with more than one block
> device, so we need a way to map each notification to the corresponding
> device.
> 
> If we descend the bs->file chain, AFAIU the easiest mapping, the
> device name, is easily lost, because only the outermost
> BlockDriverState has a device name attached; so when the notification
> triggers, bdrv_get_device_name() will return NULL.

In the worst case a name string can be passed in along with the
threshold values.

> I believe we don't necessarily need a device name in the notification;
> for example, something like an opaque cookie set together with the
> threshold and returned back with the notification would suffice. Of
> course the device name is nicer :)

Agreed.
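A sketch of the cookie idea (hypothetical names and mock types, not the
actual QEMU API): the caller registers an opaque pointer together with
the threshold, and the notification hands it back unchanged, so the
management side can do its own mapping:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical callback type: fired when a write crosses the
 * configured threshold; 'opaque' is whatever the caller registered. */
typedef void (*UsageThresholdCb)(void *opaque, int64_t excess_bytes);

typedef struct {
    int64_t threshold_bytes;   /* 0 means disabled */
    UsageThresholdCb cb;
    void *opaque;              /* cookie, returned in the notification */
} UsageThreshold;

static void usage_threshold_set(UsageThreshold *ut, int64_t threshold,
                                UsageThresholdCb cb, void *opaque)
{
    ut->threshold_bytes = threshold;
    ut->cb = cb;
    ut->opaque = opaque;
}

/* Called from the write path with the end offset of the request.
 * One-shot: the threshold disarms after firing and must be re-armed
 * explicitly, so the client isn't flooded with events. */
static bool usage_threshold_check(UsageThreshold *ut, int64_t write_end)
{
    if (ut->threshold_bytes > 0 && write_end > ut->threshold_bytes) {
        ut->cb(ut->opaque, write_end - ut->threshold_bytes);
        ut->threshold_bytes = 0;
        return true;
    }
    return false;
}

/* Example callback for demonstration only. */
static void *last_cookie;
static int64_t last_excess;
static void example_cb(void *opaque, int64_t excess_bytes)
{
    last_cookie = opaque;
    last_excess = excess_bytes;
}
```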
