Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2


From: Kevin Wolf
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 11:29:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 18.05.2012 19:08, Paolo Bonzini wrote:
> Hi all,
> 
> the current block job API is designed for streaming; one property of
> streaming is that in case of an error it can be restarted from the point
> where it was left.
> 
> In QEMU 1.2 I would like to add an implementation of mirroring (live
> block copy) based on the block job API and on dirty-block tracking.
> Unlike streaming, this operation is not restartable, because canceling
> the job turns off dirty-block tracking.
> 
> To avoid this problem, my proposal is to add to block jobs options
> similar to rerror/werror.  There are a few more details required to get
> this to work, and the purpose of this email is to summarize these details.

I generally like the proposal. For details, I'll comment inline.

> * block-stream: I propose adding two options to the existing
> block-stream command.  If this is rejected, only mirroring will be able
> to use rerror/werror.
> 
> The new options are of course rerror/werror.  They are enum options,
> with the following possible values:

Do we really need separate werror/rerror options? For guest operations
they exist only for historical reasons: werror was there first, and when
we wanted the same functionality for reads, it seemed odd to overload
werror to cover reads as well.

For block jobs, where there is no such option yet, we could go with a
single error option, unless there is a use case for separate
werror/rerror options.
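
To make the question concrete, here is a minimal Python sketch of the two designs side by side. Everything in it is hypothetical: `JobErrorPolicy`, `policy_for` and the `on_error` option name are illustrative only and are not existing QEMU or QMP API.

```python
from enum import Enum
from typing import Optional


class JobErrorPolicy(Enum):
    """Hypothetical policy values, mirroring the ones proposed in this thread."""
    REPORT = "report"
    IGNORE = "ignore"
    STOP = "stop"


def policy_for(is_write: bool,
               on_error: Optional[JobErrorPolicy] = None,
               rerror: Optional[JobErrorPolicy] = None,
               werror: Optional[JobErrorPolicy] = None) -> JobErrorPolicy:
    """Pick the policy that applies to a failed read or write.

    A single on_error option covers both directions; the split
    rerror/werror pair is only consulted when on_error is not given.
    Unset options fall back to 'report', the 1.1 behavior.
    """
    if on_error is not None:
        return on_error
    chosen = werror if is_write else rerror
    return chosen if chosen is not None else JobErrorPolicy.REPORT
```

With a single option the caller never has to care whether the failing request was a read or a write; the split pair only pays off if someone actually wants different behavior per direction.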

> 'report': The behavior is the same as in 1.1.  An I/O error,
> respectively during a read or a write, will complete the job immediately
> with an error code.
> 
> 'ignore': An I/O error, respectively during a read or a write, will be
> ignored.  For streaming, the job will complete with an error and the
> backing file will be left in place.  For mirroring, the sector will be
> marked again as dirty and re-examined later.

This is not really 'ignore' as used for guest operations. There it means
"no matter what the return value is, the operation has succeeded". For
streaming it would mean that it just goes on with the next cluster (and
if we don't cut the backing file link at the end, it would at least not
corrupt anything).

Just like with guest operations, it's a mostly useless mode. Do we really
need this option?
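
As a rough model of what guest-style 'ignore' would mean for streaming, consider the following Python sketch. The `copy_cluster` callback and the function name are made up for illustration; this is not how the streaming job is implemented, only the control flow described above.

```python
from typing import Callable, List, Tuple


def stream_ignoring_errors(num_clusters: int,
                           copy_cluster: Callable[[int], bool]
                           ) -> Tuple[List[int], bool]:
    """Model of streaming with guest-style 'ignore' semantics.

    copy_cluster(i) is a hypothetical callback that returns True when
    cluster i was copied and False on an I/O error. Returns the list of
    failed clusters and whether the backing file link may be cut.
    """
    failed = []
    for cluster in range(num_clusters):
        if not copy_cluster(cluster):
            # 'ignore': record the failure and go on with the next cluster.
            failed.append(cluster)
    # Only drop the backing file if every cluster made it into the image;
    # otherwise the uncopied clusters must still be read from the backing file.
    may_cut_backing_link = not failed
    return failed, may_cut_backing_link
```

The job runs to the end regardless of errors, and nothing is corrupted as long as the backing file link is kept whenever a cluster failed.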

> 'stop': The VM *and* the job will be paused---the VM is stopped even if
> the block device has neither rerror=stop nor werror={stop,enospc}.  The
> error is recorded in the block device's iostatus (which can be examined
> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
> job.
> 
>   Rationale: stopping all I/O seems to be the best choice in order
>   to limit the number of errors received.  However, due to backwards-
>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>   initiated I/O causes an error.  We could do that if the block
>   device has rerror=stop/werror={stop,enospc}, but it seems less
>   complicated to just never do it.

I don't agree with stopping the VM. Consider a case where the target is
somewhere on the network and you lose the connection, but the primary
image is local on the hard disk. You don't want to stop the VM just
because continuing with the copy isn't possible for the moment.

Of course, this means that you can't reuse the block device's io_status;
you need a separate job_iostatus instead.

If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
on at all. Do we really keep running the jobs in 1.1? If so, this is a
bug and should be fixed before the release.
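
The behavior argued for here can be modeled in a few lines of Python. The classes and function names are invented for this sketch (QEMU itself is written in C and has no such objects); the point is only which pieces of state change on a job error and which do not.

```python
class Device:
    def __init__(self) -> None:
        self.iostatus = "ok"      # the block device's own status


class Job:
    def __init__(self) -> None:
        self.iostatus = "ok"      # hypothetical separate per-job status
        self.paused = False


class VM:
    def __init__(self) -> None:
        self.running = True


def on_job_io_error(vm: VM, dev: Device, job: Job) -> None:
    """Job-side error with a 'stop' policy: pause the job only."""
    job.iostatus = "failed"
    job.paused = True
    # vm.running and dev.iostatus are deliberately left alone.


def job_may_issue_io(vm: VM, job: Job) -> bool:
    """A job should issue no I/O while the VM is stopped or the job is paused."""
    return vm.running and not job.paused


def resume_job(job: Job) -> None:
    """Resuming the job resets only the job's own status."""
    job.iostatus = "ok"
    job.paused = False
```

In the network-target example, a lost connection would leave the guest running on its local image while the job waits to be resumed.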

> * query-block-jobs: The returned JSON object will grow an additional
> member, "target".  The target field is a dictionary with two fields,
> "info" and "stats" (resembling the output of query-block and
> query-blockstat but for the mirroring target).  Member "device" of the
> BlockInfo structure will be made optional.
> 
>   Rationale: this allows libvirt to observe the high watermark of qcow2
>   mirroring targets, and avoids putting a bad iostatus on a working
>   migration source.

The mirroring target should be present in query-block instead. It is a
user-visible BlockDriverState, so let's treat it like one. We just need
to give it a name.
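
A sketch of how a management tool might then look up the target, written in Python. The reply below is a heavily trimmed, hypothetical query-block result: the name "mirror-target0" is made up, and whether the target would carry exactly these fields is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical, trimmed query-block reply: the mirroring target appears as
# one more named entry next to the guest-visible device.
example_reply: List[Dict[str, str]] = [
    {"device": "virtio0", "io-status": "ok"},
    {"device": "mirror-target0", "io-status": "failed"},
]


def find_block_info(reply: List[Dict[str, str]],
                    name: str) -> Optional[Dict[str, str]]:
    """Return the entry whose 'device' matches name, or None if absent."""
    for info in reply:
        if info.get("device") == name:
            return info
    return None
```

This keeps the source device's status clean while the target's state stays observable through the same command, without a new "target" member in query-block-jobs.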

> * cont: even though cont does _not_ restart the block job that reported
> an error, the iostatus is reset for all block devices that are attached
> to a block job (like the mirroring target).
> 
>   Rationale: cont anyway resets the iostatus for the streaming target
>   or mirroring source, because there is a single iostatus for the
>   device and the job.  It is simpler to do the same also for the
>   mirroring target.

See above, better have a separate iostatus for the job.

> 
> * block-job-resume also resets the iostatus on the mirroring target.
> 
> * block-job-complete: new command specific to mirroring (switches the
> device to the target), not related to the rest of the proposal.

What semantics will block-job-cancel have then for mirroring? Will it be
incompatible with RHEL 6?

Kevin


