From: Francesco Romani
Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices
Date: Tue, 18 Nov 2014 03:12:12 -0500 (EST)

----- Original Message -----
> From: "Stefan Hajnoczi" <address@hidden>
> To: "Francesco Romani" <address@hidden>
> Cc: address@hidden, address@hidden, address@hidden, address@hidden, 
> address@hidden
> Sent: Monday, November 17, 2014 5:49:36 PM
> Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold 
> reporting for block devices
> 
> On Fri, Nov 07, 2014 at 02:12:13PM +0100, Francesco Romani wrote:
> 
> Sorry for the long review delay.  Looks pretty good, just one real issue
> to think about at the bottom.

Hi Stefan, thanks for the review, and no problem about the delay :)

> 
> > +static void usage_threshold_disable(BlockDriverState *bs)
> > +{
> 
> It would be safest to make this idempotent:
> 
> if (!usage_threshold_is_set(bs)) {
>     return;
> }
> 
> That way it can be called multiple times.

Will change.

> 
> > +    notifier_with_return_remove(&bs->wr_usage_threshold_notifier);
> > +    bs->wr_offset_threshold = 0;
> > +}
> > +
> > +static int usage_threshold_is_set(const BlockDriverState *bs)
> > +{
> > +    return !!(bs->wr_offset_threshold > 0);
> > +}
> 
> Please use the bool type instead of an int return value.

Sure, will fix.

> > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> > +                                            void *opaque)
> > +{
> > +    BdrvTrackedRequest *req = opaque;
> > +    BlockDriverState *bs = req->bs;
> > +    int64_t amount = 0;
> > +
> > +    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> > +    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> 
> Does the code still make these assumptions or are the asserts left over
> from previous versions of the patch?

They're leftovers.
My understanding was that they don't hurt and add a bit of safety, but if
they're confusing I'll remove them.

> > +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t threshold_bytes)
> > +{
> > +    BlockDriverState *target_bs = bs;
> > +    if (bs->file) {
> > +        target_bs = bs->file;
> > +    }
> 
> Hmm...I think now I understand why you are trying to use bs->file.  This
> is an attempt to make image formats work with the threshold.
> 
> Unfortunately the BlockDriverState topology can be more complicated than
> just 1 level.

I suspected so, but I haven't yet been able to reproduce more complex
topologies.
Disclosure: I'm testing against the topology I know is used by oVirt, lacking
immediate access to others; suggestions welcome.

At the risk of being redundant: in oVirt the use case is QCOW2 over an LVM
block device, and we'd like to be notified about the allocation of the LVM
block device, which (IIUC) is the last bs->file.

This is a simple topology (unless I'm missing something), and that's
the reason why I descend just one level.

Of course I want a general solution for this change, so...

> If we hardcode a strategy to traverse bs->file then it will work in most
> cases:
> 
>   while (bs->file) {
>       bs = bs->file;
>   }
> 
> But there are cases like VMDK extent files where a BlockDriverState
> actually has multiple children.
> 
> One way to solve this is to require that the management tool tells QEMU
> which exact BlockDriverState node the threshold applies to.  Then QEMU
> doesn't need any hardcoded policy.  But I'm not sure how realistic that
> it at the moment (whether management tools are uses node names for each
> node yet), so it may be best to hardcode the bs->file traversal that
> I've suggested.

oVirt relies on libvirt[1], which uses device names (e.g. 'vda').

BTW, I haven't found a way to inspect the BlockDriverState topology from the
outside (e.g. via a monitor command); is there a way I'm missing?

Another issue for which I don't yet have a proper solution is related to this
one.

The management app will have to deal with VMs with more than one block device,
so we need a way to map each notification to the corresponding device.

If we descend the bs->file chain, AFAIU the most natural mapping key, the
device name, is lost, because only the outermost BlockDriverState has a device
name attached; so when the notification triggers, bdrv_get_device_name() will
return NULL.

I believe we don't necessarily need a device name in the notification;
something like an opaque cookie, set together with the threshold and returned
with the notification, would suffice. Of course the device name is nicer :)

+++

[1] In case it helps to further understand the use case, these are the libvirt
APIs used by oVirt:
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockStatsFlags
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetBlockInfo
Both rely on the output[2] of the 'query-blockstats' monitor command.

[2] AFAIU (but this is just my guesswork!) it also assumes a quite simple
topology, like I did.

Thanks and best regards,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani


