Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
From: Kevin Wolf
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 15:07:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1
On 21.05.2012 13:02, Paolo Bonzini wrote:
> On 21/05/2012 12:32, Kevin Wolf wrote:
>> On 21.05.2012 12:02, Paolo Bonzini wrote:
>>> On 21/05/2012 11:29, Kevin Wolf wrote:
>>>>> * block-stream: I propose adding two options to the existing
>>>>> block-stream command. If this is rejected, only mirroring will be able
>>>>> to use rerror/werror.
>>>>>
>>>>> The new options are of course rerror/werror. They are enum options,
>>>>> with the following possible values:
>>>>
>>>> Do we really need separate werror/rerror? For guest operations they
>>>> really exist only for historical reasons: werror was there first, and
>>>> when we wanted the same functionality, it seemed odd to overload werror
>>>> to include reads as well.
>>>>
>>>> For block jobs, where there is no such option yet, we could go with a
>>>> single error option, unless there is a use case for separate
>>>> werror/rerror options.
>>>
>>> For mirroring rerror=source and werror=target. I'm not sure there is an
>>> actual usecase, but at least it is more interesting than for devices...
>>
>> Hm. What if we add an active mirror? Then we can get some kind of COW,
>> and rerror can happen on the target as well.
>
> Errors during the read part of COW are always reported as werror.
Good point.
Thinking a bit more about it, with an active mirror (i.e. a filter block
driver), things become a bit less clear anyway. The filter would have to
be linked to the job somehow.
Another interesting question is whether we want to restrict ourselves to
one job at a time forever. But once we stop doing that, we'll need new
APIs anyway.
>> If source/target is really the distinction we want to have, should the
>> available options be specific to the job type, so that you could have
>> src_error and dst_error for mirroring?
>
> Yes, that would make sense.
Of course, at the same time it also makes the implementation a bit more
complicated.
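To make the job-type-specific variant concrete, here is a minimal sketch of what such per-side error options might look like on the wire. This is purely illustrative: neither `src_error` nor `dst_error` exists as a drive-mirror argument; they are the hypothetical names from the discussion above.

```python
import json

# Hypothetical QMP command for starting a mirror job with separate
# error policies for the source and the target, as discussed above.
# The src_error/dst_error arguments are assumptions, not real QMP.
def mirror_command(device, target, src_error="report", dst_error="stop"):
    # Each error option would be an enum, e.g. 'report', 'stop', 'ignore'.
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "src_error": src_error,   # policy for errors on the source
            "dst_error": dst_error,   # policy for errors on the target
        },
    }

print(json.dumps(mirror_command("virtio0", "/tmp/target.img"), indent=2))
```

The cost this hints at is exactly the added implementation complexity: each job type would need its own option schema instead of one shared rerror/werror pair.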
>>>>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>>>>> the block device has neither rerror=stop nor werror={stop,enospc}. The
>>>>> error is recorded in the block device's iostatus (which can be examined
>>>>> with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a
>>>>> job.
>>>>>
>>>>> Rationale: stopping all I/O seems to be the best choice in order
>>>>> to limit the number of errors received. However, due to backwards-
>>>>> compatibility with QEMU 1.1 we cannot pause the job when guest-
>>>>> initiated I/O causes an error. We could do that only when the block
>>>>> device has rerror=stop/werror={stop,enospc}, but that adds complexity,
>>>>> so it seems simpler to just never do it.
>>>>
>>>> I don't agree with stopping the VM. Consider a case where the target is
>>>> somewhere on the network and you lose the connection, but the primary
>>>> image is local on the hard disk. You don't want to stop the VM just
>>>> because continuing with the copy isn't possible for the moment.
>>>
>>> I think this is something that management should resolve.
>>
>> Management doesn't necessarily exist.
>
> Even a human sitting at a console is management. (Though I don't plan
> HMP to expose rerror/werror; so you can assume in some sense that
> management exists).
But it's management that cares about good defaults. :-)
Why not expose the options in HMP?
>>> For an error on the source, stopping the VM makes sense. I don't
>>> think management cares about what caused an I/O error on a device.
>>> Does it matter if streaming was active or rather the guest was
>>> executing "dd if=/dev/sda of=/dev/null"?
>>
>> Yes, there's a big difference: If it was a job, the guest can keep
>> running without any problems. If it was a guest operation, we would have
>> to return an error to the guest, which may offline the disk in response.
>
> Ok, this makes sense.
>
>>> Management may want to keep the VM stopped even for an error on the
>>> target, as long as mirroring has finished the initial synchronization
>>> step. The VM can perform large amounts of I/O while the job is paused,
>>> and then completing the job can take a large amount of time.
>>
>> If management wants to limit the impact of this, it can decide to
>> explicitly stop the VM when it receives the error event.
>
> That can be too late.
>
> Eric, is it a problem for libvirt if a pause or target error during
> mirroring causes the job to exit steady state? That means that after a
> target error the offset can go back from 100% to <100%.
"too late" in what respect? With the passive mirror, we already have a
window in which data is on the source, but not copied to the target.
Does it make a big difference if it is a few bytes more or less?
>>>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
>>>> on at all. Do we really keep running the jobs in 1.1? If so, this is a
>>>> bug and should be fixed before the release.
>>>
>>> Yes, we do. Do you think it's a problem for migration (thinking more
>>> about it: ouch, yes, it should be)?
>>
>> I'm pretty sure that it is a problem for migration. And it's likely a
>> problem in more cases.
>
> On the other hand, in other cases it can be desirable (qemu -S, run
> streaming before the VM starts).
We would have to verify that the whole qemu code can deal with it. I'm
pretty sure that today it can't, and we had a related bug in this area
before, though I can't remember the details.
>>> I'd rather make the extension of query-block-jobs more generic, with a
>>> list "devices" instead of a member "target", and making up the device
>>> name in the implementation (so you have "device": "target" for mirroring).
>>
>> Well, my idea for blockdev was something like (represented in a -drive
>> syntax because I don't know what it will look like):
>>
>> (qemu) blockdev_add file=foo.img,id=src
>> (qemu) device_add virtio-blk-pci,drive=src
>> ...
>> (qemu) blockdev_add file=bar.img,id=dst
>> (qemu) blockdev_mirror foo bar
>>
>> Once QOM reaches the block layer, I guess we'll want to make all
>> BlockDriverStates user visible anyway.
>
> I don't disagree, but that's very different from what is done with
> drive-mirror.
Yes. Which isn't a problem per se because drive-mirror will be replaced
by blockdev-mirror. However, things like query-block-jobs are probably
going to stay, so they should be designed for the future.
Things like this are why I don't feel overly comfortable with adding
more and more block layer features before we implement -blockdev.
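For comparison, the HMP-style sequence above could eventually be expressed as QMP commands roughly like the following. This is a sketch under heavy assumptions: a `blockdev-mirror` command did not exist at the time, and the argument names (`node-name`, `driver`, `filename`) are guesses for illustration only.

```python
# Hypothetical QMP equivalents of the HMP-style blockdev sketch above.
# blockdev-mirror and the exact argument names are assumptions made
# for illustration; they are not the QMP interface of QEMU 1.1.
commands = [
    {"execute": "blockdev-add",
     "arguments": {"node-name": "src", "driver": "file",
                   "filename": "foo.img"}},
    {"execute": "device_add",
     "arguments": {"driver": "virtio-blk-pci", "drive": "src"}},
    {"execute": "blockdev-add",
     "arguments": {"node-name": "dst", "driver": "file",
                   "filename": "bar.img"}},
    {"execute": "blockdev-mirror",
     "arguments": {"device": "src", "target": "dst"}},
]

for cmd in commands:
    print(cmd["execute"])
```

The point of the shape: once every BlockDriverState has a user-visible name, the mirror command can refer to source and target by name instead of creating the target implicitly.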
> So for now I'll keep my proposed extension of query-block-jobs; later it
> can be modified so that the target will have a name if you started the
> mirroring with blockdev_mirror instead of drive_mirror.
You mean the same QMP field is a string when the block device was added
with blockdev_add and a dict when it was added with drive_add?
Maintaining this sounds like a nightmare to me.
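To spell out the concern: a client consuming query-block-jobs output would have to handle the same field with two different types depending on how the target was created. Both response shapes below are invented for illustration and are not the actual QMP schema.

```python
# Illustrative only: two possible shapes of a "target" field in
# query-block-jobs output, depending on how the target was created.

# Target created implicitly by drive_mirror: anonymous BDS, described
# inline as a dict.
job_drive = {
    "type": "mirror",
    "device": "virtio0",
    "target": {"file": "bar.img", "format": "qcow2"},  # dict
}

# Target created with blockdev_add: named BDS, referenced by its id.
job_blockdev = {
    "type": "mirror",
    "device": "virtio0",
    "target": "dst",  # plain string naming the block device
}

# Every client would need a type check before each access:
for job in (job_drive, job_blockdev):
    t = job["target"]
    name = t if isinstance(t, str) else t["file"]
    print(name)
```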
Kevin
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2, Stefan Hajnoczi, 2012/05/21
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2, Luiz Capitulino, 2012/05/21