[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH] qemu-img: make convert async
From: |
Peter Lieven |
Subject: |
Re: [Qemu-block] [PATCH] qemu-img: make convert async |
Date: |
Fri, 24 Feb 2017 21:09:07 +0100 |
> Am 24.02.2017 um 16:02 schrieb Kevin Wolf <address@hidden>:
>
> Am 21.02.2017 um 13:29 hat Peter Lieven geschrieben:
>> the convert process is currently completely implemented with sync operations.
>> That means it reads one buffer and then writes it. No parallelism and each
>> sync
>> request takes as long as it takes until it is completed.
>>
>> This can be a big performance hit when the convert process reads and writes
>> to devices which do not benefit from kernel readahead or pagecache.
>> In our environment we heavily have the following two use cases when using
>> qemu-img convert.
>>
>> a) reading from NFS and writing to iSCSI for deploying templates
>> b) reading from iSCSI and writing to NFS for backups
>>
>> In both processes we use libiscsi and libnfs so we have no kernel cache.
>>
>> This patch changes the convert process to work with parallel running
>> coroutines
>> which can significantly improve performance for network storage devices:
>>
>> qemu-img (master)
>> nfs -> iscsi 22.8 secs
>> nfs -> ram 11.7 secs
>> ram -> iscsi 12.3 secs
>>
>> qemu-img-async (8 coroutines, in-order write disabled)
>> nfs -> iscsi 11.0 secs
>> nfs -> ram 10.4 secs
>> ram -> iscsi 9.0 secs
>>
>> This patches introduces 2 new cmdline parameters. The -m parameter to specify
>> the number of coroutines running in parallel (defaults to 8). And the -W
>> paremeter to
>> allow qemu-img to write to the target out of order rather than sequential.
>> This improves
>> performance as the writes do not have to wait for each other to complete.
>>
>> Signed-off-by: Peter Lieven <address@hidden>
>
>> @@ -1563,9 +1581,13 @@ static int convert_read(ImgConvertState *s, int64_t
>> sector_num, int nb_sectors,
>> blk = s->src[s->src_cur];
>> bs_sectors = s->src_sectors[s->src_cur];
>>
>> n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
>
> convert_co_read() uses global state here (s->src, s->src_cur,
> s->src_cur_offset) while not running under a lock. Are you sure that
> this is correct when some coroutines are operating on the first source
> image, but others are already working on the second one?
You are right that could cause problems. I have to change that to use local
variables.
>
> The same variables are used in convert_iteration_sectors(), but I think
> there it's okay because we're just starting a new request and are
> holding s->lock.
>
>> @@ -1651,12 +1685,122 @@ static int convert_write(ImgConvertState *s,
>> int64_t sector_num, int nb_sectors,
>> return 0;
>> }
>>
>> -static int convert_do_copy(ImgConvertState *s)
>> +static void coroutine_fn convert_co_do_copy(void *opaque)
>> {
>> + ImgConvertState *s = opaque;
>> uint8_t *buf = NULL;
>> - int64_t sector_num, allocated_done;
>> - int ret;
>> - int n;
>> + int ret, i;
>> + int index = -1;
>> +
>> + for (i = 0; i < s->num_coroutines; i++) {
>> + if (s->co[i] == qemu_coroutine_self()) {
>> + index = i;
>> + break;
>> + }
>> + }
>> + assert(index >= 0);
>> +
>> + s->running_coroutines++;
>> + buf = blk_blockalign(s->target, s->buf_sectors * BDRV_SECTOR_SIZE);
>> +
>> + while (1) {
>> + int n;
>> + int64_t sector_num;
>> + enum ImgConvertBlockStatus status;
>> +
>> + qemu_co_mutex_lock(&s->lock);
>> + if (s->ret != -EINPROGRESS || s->sector_num >= s->total_sectors) {
>> + qemu_co_mutex_unlock(&s->lock);
>> + goto out;
>> + }
>> + n = convert_iteration_sectors(s, s->sector_num);
>> + if (n < 0) {
>> + qemu_co_mutex_unlock(&s->lock);
>> + s->ret = n;
>> + goto out;
>> + }
>> + /* safe current sector and allocation status to local variables */
>
> s/safe/save/
>
>> + sector_num = s->sector_num;
>> + status = s->status;
>> + if (!s->min_sparse && s->status == BLK_ZERO) {
>> + n = MIN(n, s->buf_sectors);
>> + }
>> + /* increment global sector counter so that other coroutines can
>> + * already continue reading beyond this request */
>> + s->sector_num += n;
>> + qemu_co_mutex_unlock(&s->lock);
>
> Here we unlock, so after this point we can only access globally valid
> values from s, but not request-specific ones like s->status or
> s->sector_num...
>
>> + if (status == BLK_DATA || (!s->min_sparse && s->status ==
>> BLK_ZERO)) {
>
> ...but s->status is used here...
>
>> + s->allocated_done += n;
>> + qemu_progress_print(100.0 * s->allocated_done /
>> + s->allocated_sectors, 0);
>> + }
>> +
>> + if (status == BLK_DATA) {
>> + ret = convert_co_read(s, sector_num, n, buf);
>> + if (ret < 0) {
>> + error_report("error while reading sector %" PRId64
>> + ": %s", sector_num, strerror(-ret));
>> + s->ret = ret;
>> + goto out;
>> + }
>> + } else if (!s->min_sparse && s->status == BLK_ZERO) {
>
> ...and here.
Right, but thats easy to fix.
>
>> + status = BLK_DATA;
>> + memset(buf, 0x00, n * BDRV_SECTOR_SIZE);
>> + }
>> +
>> + if (s->wr_in_order) {
>> + /* keep writes in order */
>> + while (s->wr_offs != sector_num) {
>> + if (s->ret != -EINPROGRESS) {
>> + goto out;
>> + }
>> + s->wait_sector_num[index] = sector_num;
>> + qemu_coroutine_yield();
>> + }
>> + s->wait_sector_num[index] = -1;
>> + }
>> +
>> + ret = convert_co_write(s, sector_num, n, buf, status);
>> + if (ret < 0) {
>> + error_report("error while writing sector %" PRId64
>> + ": %s", sector_num, strerror(-ret));
>> + s->ret = ret;
>> + goto out;
>> + }
>> +
>> + if (s->wr_in_order) {
>> + /* reenter the coroutine that might have waited
>> + * for this write to complete */
>> + s->wr_offs = sector_num + n;
>> + for (i = 0; i < s->num_coroutines; i++) {
>> + if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
>> + /*
>> + * A -> B -> A cannot occur because A has
>> + * s->wait_sector_num[i] == -1 during A -> B. Therefore
>> + * B will never enter A during this time window.
>> + */
>> + qemu_coroutine_enter(s->co[i]);
>> + break;
>> + }
>> + }
>> + }
>> + }
>> +
>> +out:
>> + qemu_vfree(buf);
>> + s->co[index] = NULL;
>> + s->running_coroutines--;
>> + if (!s->running_coroutines && s->ret == -EINPROGRESS) {
>> + /* the convert job finished successfully */
>> + s->ret = 0;
>> + }
>> +}
>> diff --git a/qemu-img.texi b/qemu-img.texi
>> index 174aae3..6715ff3 100644
>> --- a/qemu-img.texi
>> +++ b/qemu-img.texi
>> @@ -137,6 +137,12 @@ Parameters to convert subcommand:
>>
>> @item -n
>> Skip the creation of the target volume
>> address@hidden -m
>> +Number of parallel coroutines for the convert process
>> address@hidden -W
>> +Allow to write out of order to the destination. This is option improves
>> performance,
>> +but is only recommened for preallocated devices like host devices or other
>> +raw block devices.
>> @end table
>>
>> Parameters to dd subcommand:
>> @@ -296,7 +302,7 @@ Error on reading data
>>
>> @end table
>>
>> address@hidden convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T
>> @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s
>> @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}]
>> @var{filename} address@hidden [...]] @var{output_filename}
>> address@hidden convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-T
>> @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s
>> @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-m
>> @var{num_coroutines}] [-W] {[-S @var{sparse_size}] @var{filename}
>> address@hidden [...]] @var{output_filename}
>
> The extra { before -S causes 'make qemu-doc.html' to fail (as reported
> by patchew).
Will fix!
Thank you,
Peter