Re: [Qemu-block] [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Ad
From: Wen Congyang
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
Date: Wed, 23 Dec 2015 09:24:17 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 12/19/2015 06:02 PM, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
>
> zhanghailiang <address@hidden> writes:
>
>> If some errors happen during VM's COLO FT stage, it's important to notify
>> the users of this event. Together with 'colo_lost_heartbeat', users can
>> intervene in COLO's failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <address@hidden>
>> Cc: Michael Roth <address@hidden>
>> Signed-off-by: zhanghailiang <address@hidden>
>> Signed-off-by: Li Zhijian <address@hidden>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <address@hidden>
>> ---
>> docs/qmp-events.txt | 17 +++++++++++++++++
>> migration/colo.c | 11 +++++++++++
>> qapi-schema.json | 16 ++++++++++++++++
>> qapi/event.json | 17 +++++++++++++++++
>> 4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>> Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>> event.
>>
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
>
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
>
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason": the exit reason, internal error or external request (json-string)
>> + - "error": error message (json-string, optional)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
>
> Pardon my ignorance again... Does "VM finishes COLO mode" mean we have
> some kind of COLO background job, and it just finished for whatever
> reason?
>
> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
>
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet. But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs. Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.
COLO is not a block job. If live migration is a background job, COLO
is also a background job.
>
> Quick overview of the block job QMP interface:
>
> * Commands to create a job: block-commit, block-stream, drive-mirror,
> drive-backup.
>
> * Get information on jobs: query-block-jobs
>
> * Pause a job: block-job-pause
>
> * Resume a job: block-job-resume
>
> * Cancel a job: block-job-cancel
>
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
>
> * Block job error event: BLOCK_JOB_ERROR
>
> * Block job synchronous completion: event BLOCK_JOB_READY and command
> block-job-complete
What is the background job infrastructure? Do you mean implementing all of
the above interfaces for each background job?
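
For reference, here is a rough sketch (not part of the patch) of what the block job commands listed above look like on the wire. The command names are the ones from the overview; the helper name qmp_command and the job id "drive0" are only assumptions for illustration.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command; 'arguments' is omitted when empty."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)


# "drive0" is a hypothetical job/device id, used only for illustration.
JOB = "drive0"

# One possible life cycle of an already-created block job.
lifecycle = [
    qmp_command("query-block-jobs"),
    qmp_command("block-job-pause", device=JOB),
    qmp_command("block-job-resume", device=JOB),
    qmp_command("block-job-cancel", device=JOB),
]

for line in lifecycle:
    print(line)
```

A generic background job interface would presumably need equivalents of these for migration and COLO too, which is what I am asking about.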
Thanks
Wen Congyang
>
>> DEVICE_DELETED
>> --------------
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>> #include "qemu/error-report.h"
>> #include "qemu/sockets.h"
>> #include "migration/failover.h"
>> +#include "qapi-event.h"
>>
>> /* colo buffer */
>> #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>> out:
>> if (ret < 0) {
>> error_report("%s: %s", __func__, strerror(-ret));
>> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> + true, strerror(-ret), NULL);
>> + } else {
>> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>> + false, NULL, NULL);
>> }
>>
>> qsb_free(buffer);
>> @@ -516,6 +522,11 @@ out:
>> if (ret < 0) {
>> error_report("colo incoming thread will exit, detect error: %s",
>> strerror(-ret));
>> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> + true, strerror(-ret), NULL);
>> + } else {
>> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>> + false, NULL, NULL);
>> }
>>
>> if (fb) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index feb7d53..f6ecb88 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>> 'data': [ 'unknown', 'primary', 'secondary'] }
>>
>> ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
>
> How can @unknown happen?
>
>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOExitReason',
>> + 'data': [ 'unknown', 'request', 'error'] }
>> +
>> +##
>> # @x-colo-lost-heartbeat
>> #
>> # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> diff --git a/qapi/event.json b/qapi/event.json
>> index f0cef01..f63d456 100644
>> --- a/qapi/event.json
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>> 'data': {'status': 'MigrationStatus'}}
>>
>> ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# at the request of users.
>> +#
>> +# @mode: which COLO mode the VM was in when it exited.
>
> Can we get 'unknown' here?
>
>> +#
>> +# @reason: describes the reason for the COLO exit.
>
> Can we get 'unknown' here?
>
>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'event': 'COLO_EXIT',
>> + 'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>> +
>> +##
>> # @ACPI_DEVICE_OST
>> #
>> # Emitted when guest executes ACPI _OST method.
>
>
>
> .
>
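
On the question of how a recipient would tell "due to error" from "at the user's request": going only by the event layout proposed in this patch, a management client could look at the "reason" field. Below is a minimal sketch under that assumption; the function name describe_colo_exit and the fallback strings are made up for illustration, and the event text is the example from docs/qmp-events.txt above.

```python
import json


def describe_colo_exit(raw_event):
    """Classify a COLO_EXIT event by its 'reason' field."""
    msg = json.loads(raw_event)
    if msg.get("event") != "COLO_EXIT":
        raise ValueError("not a COLO_EXIT event")
    data = msg["data"]
    mode = data["mode"]
    reason = data["reason"]
    if reason == "error":
        # '*error' is optional in the proposed schema, so use a placeholder
        # when it is absent.
        detail = data.get("error", "no error message supplied")
        return f"{mode}: COLO exited on internal error ({detail})"
    if reason == "request":
        return f"{mode}: COLO exited at the user's request"
    # The proposed enum also contains 'unknown'.
    return f"{mode}: COLO exited for an unknown reason"


example = (
    '{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, '
    '"event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request"}}'
)
print(describe_colo_exit(example))
```

The last branch is there only because the proposed COLOExitReason enum includes 'unknown'; if that value is dropped from the schema, the branch can go too.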