qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds thre


From: Francesco Romani
Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold
Date: Tue, 2 Dec 2014 02:47:45 -0500 (EST)

Thanks for the quick review!

----- Original Message -----
> From: "Eric Blake" <address@hidden>
> To: "Francesco Romani" <address@hidden>, address@hidden
> Cc: address@hidden, address@hidden, address@hidden, address@hidden
> Sent: Monday, December 1, 2014 10:07:38 PM
> Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds  
> threshold
> 
> On 11/28/2014 05:31 AM, Francesco Romani wrote:
> > Managing applications, like oVirt (http://www.ovirt.org), make extensive
> > use of thin-provisioned disk images.
> > To let the guest run smoothly and be not unnecessarily paused, oVirt sets
> > a disk usage threshold (so called 'high water mark') based on the
> > occupation
> > of the device,  and automatically extends the image once the threshold
> > is reached or exceeded.
> > 
> > In order to detect the crossing of the threshold, oVirt has no choice but
> > aggressively polling the QEMU monitor using the query-blockstats command.
> > This lead to unnecessary system load, and is made even worse under scale:
> > deployments with hundreds of VMs are no longer rare.
> > 
> > To fix this, this patch adds:
> > * A new monitor command to set a mark for a given block device.
> > * A new event to report if a block device usage exceeds the threshold.
> > 
> > This will allow the managing application to drop the polling
> > altogether and just wait for a watermark crossing event.
> 
> I like the idea!
> 
> Question - what happens if management misses the event (for example, if
> libvirtd is restarted)?  Does the existing 'query-blockstats' and/or
> 'query-named-block-nodes' still work to query the current threshold and
> whether it has been exceeded, as a poll-once command executed when
> reconnecting to the monitor?

Indeed oVirt wants to keep the existing polling and to use it as fallback,
to make sure no events are missed. oVirt will not rely on the new notification
*alone*.
The plan is to "just" poll *much less* frequently. Today's default poll
rate is every 2 (two) seconds, so there is a lot of room for improvement.

> 
> > 
> > Signed-off-by: Francesco Romani <address@hidden>
> > ---
> 
> No need for a 0/1 cover letter on a 1-patch series; you have the option
> of just putting the side-band information here and sending it as a
> single mail.  But the cover letter approach doesn't hurt either, and I
> can see how it can be easier for some workflows to always send a cover
> letter than to special-case a 1-patch series.

Also, I found that a separate context/introduction/room for comments
would helped, especially on first two revisions of the patch which were
tagged as RFC. I have zero problems in dropping the cover letter now that
consensus is forming and patch is taking shape, just let me know.

[... snip spelling: thanks! will fix]
> > +++ b/qapi/block-core.json
> > @@ -239,6 +239,9 @@
> >  #
> >  # @iops_size: #optional an I/O size in bytes (Since 1.7)
> >  #
> > +# @write-threshold: configured write threshold for the device.
> > +#                   0 if disabled. (Since 2.3)
> > +#
> >  # Since: 0.14.0
> >  #
> >  ##
> > @@ -253,7 +256,7 @@
> >              '*bps_max': 'int', '*bps_rd_max': 'int',
> >              '*bps_wr_max': 'int', '*iops_max': 'int',
> >              '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> > -            '*iops_size': 'int' } }
> > +            '*iops_size': 'int', 'write-threshold': 'uint64' } }
> 
> In QMP specs, 'uint64' and 'int' are practically synonyms.  I can live
> with either spelling, although 'int' is more common.
> 
> Bikeshed on naming: Although we prefer '-' over '_' in new interfaces,
> we also favor consistency, and BlockDeviceInfo is one of those dinosaur
> commands that uses _ everywhere until your addition.  So naming this
> field 'write_threshold' would be more consistent.

Agreed. Will fix.

> > +#
> > +# @node-name: graph node name on which the threshold was exceeded.
> > +#
> > +# @amount-exceeded: amount of data which exceeded the threshold, in bytes.
> > +#
> > +# @offset-threshold: last configured threshold, in bytes.
> > +#
> 
> Might want to mention that this event is one-shot; after it triggers, a
> user must re-register a threshold to get the event again.

Good point. Will fix.

> 
> > +# Since: 2.3
> > +##
> > +{ 'event': 'BLOCK_USAGE_THRESHOLD',
> > +  'data': { 'node-name': 'str',
> > +       'amount-exceeded': 'uint64',
> 
> TAB damage.  Please use spaces.  ./scripts/checkpatch.pl will catch some
> offenders (although I didn't test if it will catch this one).
> 
> However, here you are correct in using '-' for naming :)

Oops. Rebase error (was clean before!) will fix.

> 
> > +       'threshold': 'uint64' } }
> > +
> > +##
> > +# @block-set-threshold
> > +#
> > +# Change usage threshold for a block drive. An event will be delivered
> > +# if a write to this block drive crosses the configured threshold.
> > +# This is useful to transparently resize thin-provisioned drives without
> > +# the guest OS noticing.
> > +#
> > +# @node-name: graph node name on which the threshold must be set.
> > +#
> > +# @write-threshold: configured threshold for the block device, bytes.
> > +#                   Use 0 to disable the threshold.
> > +#
> > +# Returns: Nothing on success
> > +#          If @node name is not found on the block device graph,
> > +#          DeviceNotFound
> > +#
> > +# Since: 2.3
> > +##
> > +{ 'command': 'block-set-threshold',
> > +  'data': { 'node-name': 'str', 'threshold': 'uint64' } }
> 
> Again, 'int' instead of 'uint64' is more typical, but shouldn't hurt
> with either spelling.
> 
> > +SQMP
> > +block-set-threshold
> > +------------
> > +
> > +Change the write threshold for a block drive. The threshold is an offset,
> > +thus must be non-negative. Default is not write threshold.
> 
> s/not/no/
> 
> > +To set the threshold to zero disables it.
> 
> s/To set/Setting/
> 
> Missing the change to the 'query-block' and 'query-named-block-nodes'
> examples to show the new always-present output field.


Sorry, not sure I'm following. Isn't that hunk enough?

diff --git a/block/qapi.c b/block/qapi.c
index a87a34a..c85abb0 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -24,6 +24,7 @@
 
 #include "block/qapi.h"
 #include "block/block_int.h"
+#include "block/usage-threshold.h"
 #include "qmp-commands.h"
 #include "qapi-visit.h"
 #include "qapi/qmp-output-visitor.h"
@@ -82,6 +83,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
         info->iops_size = cfg.op_size;
     }
 
+    info->write_threshold = bdrv_usage_threshold_get(bs);
+
     return info;
 }
 

It produces the output I'd expect on my tests:

Welcome to the QMP low-level shell!
Connected to QEMU 2.1.91

(QEMU) query-block
{   u'return': [   {   u'device': u'drive-virtio-disk0',
                       u'inserted': {   u'backing_file_depth': 0,
                                        u'bps': 0,
                                        u'bps_rd': 0,
                                        u'bps_wr': 0,
                                        u'detect_zeroes': u'off',
                                        u'drv': u'qcow2',
                                        u'encrypted': False,
                                        u'encryption_key_missing': False,
                                        u'file': 
u'/var/lib/libvirt/images/fedora20.qcow2',
                                        u'image': {   u'actual-size': 11804672,
                                                      u'cluster-size': 65536,
                                                      u'dirty-flag': False,
                                                      u'filename': 
u'/var/lib/libvirt/images/fedora20.qcow2',
                                                      u'format': u'qcow2',
                                                      u'format-specific': {   
u'data': {   u'compat': u'1.1',
                                                                                
           u'corrupt': False,
                                                                                
           u'lazy-refcounts': False},
                                                                              
u'type': u'qcow2'},
                                                      u'virtual-size': 
17179869184},
                                        u'iops': 0,
                                        u'iops_rd': 0,
                                        u'iops_wr': 0,
                                        u'node-name': u'foo',
                                        u'ro': False,
                                        u'write-threshold': 0},
                       u'io-status': u'ok',
                       u'locked': False,
                       u'removable': False,
                       u'type': u'unknown'},
                   {   u'device': u'drive-ide0-0-0',
                       u'inserted': {   u'backing_file_depth': 0,
                                        u'bps': 0,
                                        u'bps_rd': 0,
                                        u'bps_wr': 0,
                                        u'detect_zeroes': u'off',
                                        u'drv': u'raw',
                                        u'encrypted': False,
                                        u'encryption_key_missing': False,
                                        u'file': 
u'/var/lib/libvirt/images/Fedora-Live-Desktop-x86_64-20-1.iso',
                                        u'image': {   u'actual-size': 999297024,
                                                      u'dirty-flag': False,
                                                      u'filename': 
u'/var/lib/libvirt/images/Fedora-Live-Desktop-x86_64-20-1.iso',
                                                      u'format': u'raw',
                                                      u'virtual-size': 
999292928},
                                        u'iops': 0,
                                        u'iops_rd': 0,
                                        u'iops_wr': 0,
                                        u'ro': True,
                                        u'write-threshold': 0},
                       u'io-status': u'ok',
                       u'locked': False,
                       u'removable': True,
                       u'tray_open': False,
                       u'type': u'unknown'}]}

(QEMU) query-named-block-nodes
{   u'return': [   {   u'backing_file_depth': 0,
                       u'bps': 0,
                       u'bps_rd': 0,
                       u'bps_wr': 0,
                       u'detect_zeroes': u'off',
                       u'drv': u'qcow2',
                       u'encrypted': False,
                       u'encryption_key_missing': False,
                       u'file': u'/var/lib/libvirt/images/fedora20.qcow2',
                       u'image': {   },
                       u'iops': 0,
                       u'iops_rd': 0,
                       u'iops_wr': 0,
                       u'node-name': u'foo',
                       u'ro': False,
                       u'write-threshold': 0},
                   {   u'backing_file_depth': 0,
                       u'bps': 0,
                       u'bps_rd': 0,
                       u'bps_wr': 0,
                       u'detect_zeroes': u'off',
                       u'drv': u'file',
                       u'encrypted': False,
                       u'encryption_key_missing': False,
                       u'file': u'/var/lib/libvirt/images/fedora20.qcow2',
                       u'image': {   },
                       u'iops': 0,
                       u'iops_rd': 0,
                       u'iops_wr': 0,
                       u'node-name': u'bar',
                       u'ro': False,
                       u'write-threshold': 0}]}


(relevant cmd-line snippet:
-drive 
file=/var/lib/libvirt/images/fedora20.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,node-name=foo,file.node-name=bar
 \
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 \
)

Bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani



reply via email to

[Prev in Thread] Current Thread [Next in Thread]