From: Zhi Yong Wu
Subject: Re: [Qemu-devel] [PATCH v5 3/4] block: add block timer and block throttling algorithm
Date: Fri, 12 Aug 2011 13:00:01 +0800
On Wed, Aug 10, 2011 at 7:00 PM, Stefan Hajnoczi <address@hidden> wrote:
> On Wed, Aug 10, 2011 at 7:57 AM, Zhi Yong Wu <address@hidden> wrote:
>> On Tue, Aug 9, 2011 at 11:19 PM, Stefan Hajnoczi <address@hidden> wrote:
>>> On Tue, Aug 9, 2011 at 5:17 AM, Zhi Yong Wu <address@hidden> wrote:
>>>> Note:
>>>> 1.) When bps/iops limits are specified to a small value such as 511
>>>> bytes/s, this VM will hang up. We are considering how to handle this
>>>> scenario.
>>>
>>> If an I/O request is larger than the limit itself then I think we
>>> should let it through with a warning that the limit is too low. This
>> Printing a warning does not seem like a good approach; the
>> throttling function would then have no effect in this scenario.
>
> Setting the limit below the max request size is a misconfiguration.
> It's a problem that the user needs to be aware of and fix, so a
> warning is appropriate. Unfortunately the max request size is not
> easy to determine from the block layer and may vary between guest
> OSes, that's why we need to do something at runtime.
>
> We cannot leave this case unhandled because it results in the guest
> timing out I/O without any information about the problem - that makes
> it hard to troubleshoot for the user. Any other ideas on how to
> handle this case?
Can we constrain the I/O limits specified by the user? When a limit
is smaller than some value such as 1 MB/s, report an error to the
user and point out that the limit is too small.
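The policy Stefan describes could be sketched like this (hypothetical names and a toy check, not the actual patch code): if a single request is larger than the configured bps limit, it can never fit within one slice, so it is dispatched with a warning rather than queued forever.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: a request larger than the per-second bps limit
 * can never be satisfied by throttled dispatch, so let it through once
 * with a warning instead of letting the guest time out the I/O. */
static bool throttle_allow_oversized(uint64_t req_bytes, uint64_t bps_limit)
{
    if (bps_limit > 0 && req_bytes > bps_limit) {
        fprintf(stderr,
                "warning: request of %llu bytes exceeds bps limit of %llu; "
                "dispatching anyway\n",
                (unsigned long long)req_bytes,
                (unsigned long long)bps_limit);
        return true;   /* bypass throttling for this request */
    }
    return false;      /* normal throttling applies */
}
```

With a 511 bytes/s limit, as in the scenario above, any 512-byte sector request would take this bypass path.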
>
>>>> @@ -90,6 +100,68 @@ int is_windows_drive(const char *filename)
>>>> }
>>>> #endif
>>>>
>>>> +/* throttling disk I/O limits */
>>>> +void bdrv_io_limits_disable(BlockDriverState *bs)
>>>> +{
>>>> + bs->io_limits_enabled = false;
>>>> + bs->req_from_queue = false;
>>>> +
>>>> + if (bs->block_queue) {
>>>> + qemu_block_queue_flush(bs->block_queue);
>>>> + qemu_del_block_queue(bs->block_queue);
>>>
>>> When you fix the acb lifecycle in block-queue.c this will no longer be
>>> safe. Requests that are dispatched still have acbs that belong to
>> When the request is dispatched and successfully serviced, its acb
>> will be removed.
>
> Yes, that is how the patches work right now. But the acb lifecycle
> that I explained in response to the other patch changes this.
OK.
>
>>> this queue. It is simplest to keep the queue for the lifetime of the
>>> BlockDriverState - it's just a list so it doesn't take up much memory.
>> I would prefer to remove this queue, even though it is simple and
>> harmless to let it exist.
>
> If the existing acbs do not touch their BlockQueue after this point in
> the code, then it will be safe to delete the queue since it will no
> longer be referenced. If you can guarantee this then it's fine to
> keep this code.
I prefer to remove the queue.
>
>>
>>>
>>>> + }
>>>> +
>>>> + if (bs->block_timer) {
>>>> + qemu_del_timer(bs->block_timer);
>>>> + qemu_free_timer(bs->block_timer);
>>>
>>> To prevent double frees:
>>>
>>> bs->block_timer = NULL;
>> Doesn't qemu_free_timer() ensure that it is set to NULL? Then this wouldn't be necessary. :)
>
> No, qemu_free_timer() is a function. qemu_free_timer() cannot change
> the pointer variable bs->block_timer because in C functions take
> arguments by value.
>
> After qemu_free_timer() bs->block_timer will still hold the old
> address of the (freed) timer. Then when bdrv_close() is called it
> does qemu_free_timer() again because bs->block_timer is not NULL. It
> is illegal to free() a pointer twice and it can cause a crash.
Done.
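Stefan's point about pass-by-value can be demonstrated in isolation (a toy sketch with plain malloc/free standing in for QEMU's timer API): freeing inside a callee cannot clear the caller's pointer, so the caller must NULL it to make a later conditional free safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for BlockDriverState: only the timer pointer matters here. */
typedef struct {
    int *block_timer;
} ToyBDS;

/* Mirrors bdrv_io_limits_disable(): free the timer, then clear the
 * pointer so a later cleanup path cannot free it a second time. */
static void toy_io_limits_disable(ToyBDS *bs)
{
    if (bs->block_timer) {
        free(bs->block_timer);
        bs->block_timer = NULL;   /* the crucial line under discussion */
    }
}

/* Mirrors bdrv_close(): would double-free if the pointer were stale. */
static void toy_close(ToyBDS *bs)
{
    if (bs->block_timer) {        /* skipped because the pointer was cleared */
        free(bs->block_timer);
        bs->block_timer = NULL;
    }
}
```

Without the `bs->block_timer = NULL;` in `toy_io_limits_disable()`, the pointer would still hold the freed timer's address and `toy_close()` would call free() on it again, which is undefined behavior.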
>
>>
>>>
>>>> + }
>>>> +
>>>> + bs->slice_start[0] = 0;
>>>> + bs->slice_start[1] = 0;
>>>> +
>>>> + bs->slice_end[0] = 0;
>>>> + bs->slice_end[1] = 0;
>>>> +}
>>>> +
>>>> +static void bdrv_block_timer(void *opaque)
>>>> +{
>>>> + BlockDriverState *bs = opaque;
>>>> + BlockQueue *queue = bs->block_queue;
>>>> +
>>>> + qemu_block_queue_flush(queue);
>>>> +}
>>>> +
>>>> +void bdrv_io_limits_enable(BlockDriverState *bs)
>>>> +{
>>>> + bs->req_from_queue = false;
>>>> +
>>>> + bs->block_queue = qemu_new_block_queue();
>>>> + bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
>>>> +
>>>> + bs->slice_start[BLOCK_IO_LIMIT_READ] = qemu_get_clock_ns(vm_clock);
>>>> + bs->slice_start[BLOCK_IO_LIMIT_WRITE] = qemu_get_clock_ns(vm_clock);
>>>> +
>>>> + bs->slice_end[BLOCK_IO_LIMIT_READ] =
>>>> + qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
>>>> + bs->slice_end[BLOCK_IO_LIMIT_WRITE] =
>>>> + qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
>>>
>>> The slice times could just be initialized to 0 here. When
>>> bdrv_exceed_io_limits() is called it will start a new slice.
>> The result should be same as current method even though they are set to ZERO.
>
> Yes, the result is the same except we only create new slices in one place.
Done.
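The simplification agreed on above can be sketched as a lazy slice start (hypothetical names; QEMU uses nanosecond clock values here): with `slice_end` initialized to 0, the first throttling check sees an expired slice and stamps the clock itself, so the enable path does not need to.

```c
#include <stdint.h>

#define BLOCK_IO_SLICE_TIME 100000000LL  /* 100 ms in ns, illustrative value */

typedef struct {
    int64_t slice_start;
    int64_t slice_end;
} ToySlice;

/* Called on every throttling check: if the current slice has expired
 * (including the initial all-zero state), start a new one. This is the
 * single place where slices are created. */
static void toy_slice_touch(ToySlice *s, int64_t now_ns)
{
    if (now_ns > s->slice_end) {
        s->slice_start = now_ns;
        s->slice_end   = now_ns + BLOCK_IO_SLICE_TIME;
    }
}
```

Initializing to zero in the enable path and creating slices only in the check keeps the slice lifecycle in one function, which is the point of the suggestion.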
>
>>>> + }
>>>> +
>>>> + return true;
>>>> + }
>>>> + }
>>>> +
>>>> + elapsed_time = real_time - bs->slice_start[is_write];
>>>> + elapsed_time /= (BLOCK_IO_SLICE_TIME * 10.0);
>>>
>>> Why * 10.0?
>> elapsed_time is currently in ns, and it needs to be converted into a
>> floating-point value in minutes.
>
> I think you mean seconds instead of minutes. Then the conversion has
> nothing to do with BLOCK_IO_SLICE_TIME, it's just nanoseconds to
> seconds. It would be clearer to drop BLOCK_IO_SLICE_TIME from the
> expression and divide by a new NANOSECONDS_PER_SECOND constant
> (1000000000.0).
Done
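The agreed-upon conversion is then a plain nanoseconds-to-seconds divide with a named constant, along the lines of:

```c
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000.0

/* Convert an elapsed time in nanoseconds to seconds, replacing the
 * opaque `elapsed_time /= (BLOCK_IO_SLICE_TIME * 10.0)` expression
 * with an explicit unit conversion. */
static double ns_to_seconds(int64_t elapsed_ns)
{
    return elapsed_ns / NANOSECONDS_PER_SECOND;
}
```

The numeric result is the same (BLOCK_IO_SLICE_TIME * 10 happens to equal one second in ns), but the intent is no longer hidden.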
>
>>>
>>>> +
>>>> + bps_ret = bdrv_exceed_bps_limits(bs, nb_sectors,
>>>> + is_write, elapsed_time, &bps_wait);
>>>> + iops_ret = bdrv_exceed_iops_limits(bs, is_write,
>>>> + elapsed_time, &iops_wait);
>>>> + if (bps_ret || iops_ret) {
>>>> + max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
>>>> + if (wait) {
>>>> + *wait = max_wait;
>>>> + }
>>>> +
>>>> + real_time = qemu_get_clock_ns(vm_clock);
>>>> + if (bs->slice_end[is_write] < real_time + max_wait) {
>>>> + bs->slice_end[is_write] = real_time + max_wait;
>>>> + }
>>>
>>> Why is this necessary? We have already extended the slice at the top
>>> of the function.
>> Here it adjusts the end of the slice based on
>> bdrv_exceed_bps_limits()/bdrv_exceed_iops_limits().
>> This is more accurate.
>
> After rereading it I realized that this extends the slice up to the
> estimated dispatch time (max_wait) and not just by
> BLOCK_IO_SLICE_TIME. This code now makes sense to me: we're trying to
> keep slices around while there are queued requests so that
> iops/throughput history is not forgotten when queued requests are
> dispatched.
Right.
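The slice-extension behaviour Stefan describes reduces to this helper (a sketch with hypothetical names, mirroring the quoted hunk): the slice end is pushed out to the estimated dispatch time of the queued request, never pulled in.

```c
#include <stdint.h>

/* Extend the current slice up to the estimated dispatch time of a
 * queued request, so iops/bps history within the slice is not
 * forgotten before the request is dispatched. Returns the new
 * slice end. */
static int64_t toy_extend_slice(int64_t slice_end, int64_t now_ns,
                                int64_t max_wait_ns)
{
    if (slice_end < now_ns + max_wait_ns) {
        slice_end = now_ns + max_wait_ns;
    }
    return slice_end;
}
```

If the slice already ends later than the estimated dispatch time, it is left alone, which matches the `if` in the quoted patch.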
>
>>>
>>>> +
>>>> + return true;
>>>> + }
>>>> +
>>>> + if (wait) {
>>>> + *wait = 0;
>>>> + }
>>>> +
>>>> + return false;
>>>> +}
>>>>
>>>> /**************************************************************/
>>>> /* async I/Os */
>>>> @@ -2121,13 +2398,28 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>>>> {
>>>> BlockDriver *drv = bs->drv;
>>>> BlockDriverAIOCB *ret;
>>>> + uint64_t wait_time = 0;
>>>>
>>>> trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>>>>
>>>> - if (!drv)
>>>> - return NULL;
>>>> - if (bdrv_check_request(bs, sector_num, nb_sectors))
>>>> + if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) {
>>>> + if (bs->io_limits_enabled) {
>>>> + bs->req_from_queue = false;
>>>> + }
>>>> return NULL;
>>>> + }
>>>> +
>>>> + /* throttling disk read I/O */
>>>> + if (bs->io_limits_enabled) {
>>>> + if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
>>>> + ret = qemu_block_queue_enqueue(bs->block_queue, bs, bdrv_aio_readv,
>>>> + sector_num, qiov, nb_sectors, cb, opaque);
>>>> + qemu_mod_timer(bs->block_timer,
>>>> + wait_time + qemu_get_clock_ns(vm_clock));
>>>
>>> If a guest keeps sending I/O requests, say every millisecond, then we
>>> will delay dispatch forever. In practice we hope the storage
>>> interface (virtio-blk, LSI SCSI, etc) has a resource limit that
>>> prevents this. But the point remains that this delays requests that
>>> should be dispatched for too long. Once the timer has been set we
>>> should not set it again.
>> Can you elaborate on this?
>
> Let's wait until the next version of the patch, I've suggested many
> changes and should comment against current code instead of the way I
> think it is going to work :).
OK.
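The re-arming hazard Stefan raises can be sketched with a simple armed flag (hypothetical names, not the patch's actual fix): once the dispatch timer is set, later requests must not push the deadline further out, or a steady stream of guest I/O could postpone the flush indefinitely.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    armed;
    int64_t expire_ns;
} ToyThrottleTimer;

/* Arm the dispatch timer only if it is not already pending; repeated
 * enqueues keep the earliest deadline instead of resetting it, so
 * queued requests cannot be starved by new arrivals. */
static void toy_arm_timer(ToyThrottleTimer *t, int64_t deadline_ns)
{
    if (!t->armed) {
        t->armed = true;
        t->expire_ns = deadline_ns;
    }
    /* already armed: leave the existing deadline in place */
}

/* Called from the timer callback after the queue has been flushed. */
static void toy_timer_fired(ToyThrottleTimer *t)
{
    t->armed = false;
}
```

In contrast, calling qemu_mod_timer() unconditionally on every enqueue, as in the quoted hunk, moves the deadline each time and is exactly the "delay dispatch forever" pattern being criticized.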
>
> Stefan
>
--
Regards,
Zhi Yong Wu