qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds thre


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold
Date: Mon, 01 Dec 2014 14:07:38 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 11/28/2014 05:31 AM, Francesco Romani wrote:
> Managing applications, like oVirt (http://www.ovirt.org), make extensive
> use of thin-provisioned disk images.
> To let the guest run smoothly and be not unnecessarily paused, oVirt sets
> a disk usage threshold (so called 'high water mark') based on the occupation
> of the device,  and automatically extends the image once the threshold
> is reached or exceeded.
> 
> In order to detect the crossing of the threshold, oVirt has no choice but
> aggressively polling the QEMU monitor using the query-blockstats command.
> This lead to unnecessary system load, and is made even worse under scale:
> deployments with hundreds of VMs are no longer rare.
> 
> To fix this, this patch adds:
> * A new monitor command to set a mark for a given block device.
> * A new event to report if a block device usage exceeds the threshold.
> 
> This will allow the managing application to drop the polling
> altogether and just wait for a watermark crossing event.

I like the idea!

Question - what happens if management misses the event (for example, if
libvirtd is restarted)?  Does the existing 'query-blockstats' and/or
'query-named-block-nodes' still work to query the current threshold and
whether it has been exceeded, as a poll-once command executed when
reconnecting to the monitor?

> 
> Signed-off-by: Francesco Romani <address@hidden>
> ---

No need for a 0/1 cover letter on a 1-patch series; you have the option
of just putting the side-band information here and sending it as a
single mail.  But the cover letter approach doesn't hurt either, and I
can see how it can be easier for some workflows to always send a cover
letter than to special-case a 1-patch series.

> +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> +                                            void *opaque)
> +{
> +    BdrvTrackedRequest *req = opaque;
> +    BlockDriverState *bs = req->bs;
> +    uint64_t amount = 0;
> +
> +    amount = bdrv_usage_threshold_exceeded(bs, req);
> +    if (amount > 0) {
> +        qapi_event_send_block_usage_threshold(
> +            bs->node_name,
> +            amount,
> +            bs->write_threshold_offset,
> +            &error_abort);
> +
> +        /* autodisable to avoid to flood the monitor */

s/to flood/flooding/

> +/*
> + * bdrv_usage_threshold_is_set
> + *
> + * Tell if an usage threshold is set for a given BDS.

s/an usage/a usage/

(in English, the difference between "a" and "an" is whether the leading
sound of the next word is pronounced or not; in this case, "usage" is
pronounced with a hard "yoo-sage".  It may help to remember "an umbrella
for a unicorn")

> +++ b/qapi/block-core.json
> @@ -239,6 +239,9 @@
>  #
>  # @iops_size: #optional an I/O size in bytes (Since 1.7)
>  #
> +# @write-threshold: configured write threshold for the device.
> +#                   0 if disabled. (Since 2.3)
> +#
>  # Since: 0.14.0
>  #
>  ##
> @@ -253,7 +256,7 @@
>              '*bps_max': 'int', '*bps_rd_max': 'int',
>              '*bps_wr_max': 'int', '*iops_max': 'int',
>              '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> -            '*iops_size': 'int' } }
> +            '*iops_size': 'int', 'write-threshold': 'uint64' } }

In QMP specs, 'uint64' and 'int' are practically synonyms.  I can live
with either spelling, although 'int' is more common.

Bikeshed on naming: Although we prefer '-' over '_' in new interfaces,
we also favor consistency, and BlockDeviceInfo is one of those dinosaur
commands that uses _ everywhere until your addition.  So naming this
field 'write_threshold' would be more consistent.

> +##
> +# @BLOCK_USAGE_THRESHOLD
> +#
> +# Emitted when writes on block device reaches or exceeds the
> +# configured threshold. For thin-provisioned devices, this
> +# means the device should be extended to avoid pausing for
> +# disk exaustion.

s/exaustion/exhaustion/

> +#
> +# @node-name: graph node name on which the threshold was exceeded.
> +#
> +# @amount-exceeded: amount of data which exceeded the threshold, in bytes.
> +#
> +# @offset-threshold: last configured threshold, in bytes.
> +#

Might want to mention that this event is one-shot; after it triggers, a
user must re-register a threshold to get the event again.

> +# Since: 2.3
> +##
> +{ 'event': 'BLOCK_USAGE_THRESHOLD',
> +  'data': { 'node-name': 'str',
> +         'amount-exceeded': 'uint64',

TAB damage.  Please use spaces.  ./scripts/checkpatch.pl will catch some
offenders (although I didn't test if it will catch this one).

However, here you are correct in using '-' for naming :)

> +         'threshold': 'uint64' } }
> +
> +##
> +# @block-set-threshold
> +#
> +# Change usage threshold for a block drive. An event will be delivered
> +# if a write to this block drive crosses the configured threshold.
> +# This is useful to transparently resize thin-provisioned drives without
> +# the guest OS noticing.
> +#
> +# @node-name: graph node name on which the threshold must be set.
> +#
> +# @write-threshold: configured threshold for the block device, bytes.
> +#                   Use 0 to disable the threshold.
> +#
> +# Returns: Nothing on success
> +#          If @node name is not found on the block device graph,
> +#          DeviceNotFound
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'block-set-threshold',
> +  'data': { 'node-name': 'str', 'threshold': 'uint64' } }

Again, 'int' instead of 'uint64' is more typical, but shouldn't hurt
with either spelling.

> +SQMP
> +block-set-threshold
> +------------
> +
> +Change the write threshold for a block drive. The threshold is an offset,
> +thus must be non-negative. Default is not write threshold.

s/not/no/

> +To set the threshold to zero disables it.

s/To set/Setting/

Missing the change to the 'query-block' and 'query-named-block-nodes'
examples to show the new always-present output field.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]