From: Wen Congyang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
Date: Wed, 23 Dec 2015 09:24:17 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 12/19/2015 06:02 PM, Markus Armbruster wrote:
> Copying qemu-block because this seems related to generalising block jobs
> to background jobs.
> 
> zhanghailiang <address@hidden> writes:
> 
>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>> failover work immediately.
>> If users don't want to get involved in COLO's failover verdict,
>> it is still necessary to notify users that we exited COLO mode.
>>
>> Cc: Markus Armbruster <address@hidden>
>> Cc: Michael Roth <address@hidden>
>> Signed-off-by: zhanghailiang <address@hidden>
>> Signed-off-by: Li Zhijian <address@hidden>
>> ---
>> v11:
>> - Fix several typos found by Eric
>>
>> Signed-off-by: zhanghailiang <address@hidden>
>> ---
>>  docs/qmp-events.txt | 17 +++++++++++++++++
>>  migration/colo.c    | 11 +++++++++++
>>  qapi-schema.json    | 16 ++++++++++++++++
>>  qapi/event.json     | 17 +++++++++++++++++
>>  4 files changed, 61 insertions(+)
>>
>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>> index d2f1ce4..19f68fc 100644
>> --- a/docs/qmp-events.txt
>> +++ b/docs/qmp-events.txt
>> @@ -184,6 +184,23 @@ Example:
>>  Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>  event.
>>  
>> +COLO_EXIT
>> +---------
>> +
>> +Emitted when VM finishes COLO mode due to some errors happening or
>> +at the request of users.
> 
> How would the event's recipient distinguish between "due to error" and
> "at the user's request"?
> 
>> +
>> +Data:
>> +
>> + - "mode": COLO mode, primary or secondary side (json-string)
>> + - "reason":  the exit reason, internal error or external request. (json-string)
>> + - "error": error message (json-string, optional)
>> +
>> +Example:
>> +
>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>> +
> 
> Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
> some kind of COLO background job, and it just finished for whatever
> reason?
> 
> If yes, this COLO job could be an instance of the general background job
> concept we're trying to grow from the existing block job concept.
> 
> I'm not asking you to rebase your work onto the background job
> infrastructure, not least for the simple reason that it doesn't exist,
> yet.  But I think it would be fruitful to compare your COLO job
> management QMP interface with the one we have for block jobs.  Not only
> may that avoid unnecessary inconsistency, it could also help shape the
> general background job interface.

COLO is not a block job. If live migration is a background job, COLO
is also a background job.

> 
> Quick overview of the block job QMP interface:
> 
> * Commands to create a job: block-commit, block-stream, drive-mirror,
>   drive-backup.
> 
> * Get information on jobs: query-block-jobs
> 
> * Pause a job: block-job-pause
> 
> * Resume a job: block-job-resume
> 
> * Cancel a job: block-job-cancel
> 
> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
> 
> * Block job error event: BLOCK_JOB_ERROR
> 
> * Block job synchronous completion: event BLOCK_JOB_READY and command
>   block-job-complete

What is the background job infrastructure? Do you mean implementing all of
the above interfaces for each background job?

Thanks
Wen Congyang

> 
>>  DEVICE_DELETED
>>  --------------
>>  
>> diff --git a/migration/colo.c b/migration/colo.c
>> index d1dd4e1..d06c14f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -18,6 +18,7 @@
>>  #include "qemu/error-report.h"
>>  #include "qemu/sockets.h"
>>  #include "migration/failover.h"
>> +#include "qapi-event.h"
>>  
>>  /* colo buffer */
>>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>  out:
>>      if (ret < 0) {
>>          error_report("%s: %s", __func__, strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      qsb_free(buffer);
>> @@ -516,6 +522,11 @@ out:
>>      if (ret < 0) {
>>          error_report("colo incoming thread will exit, detect error: %s",
>>                       strerror(-ret));
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>> +                                  true, strerror(-ret), NULL);
>> +    } else {
>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>> +                                  false, NULL, NULL);
>>      }
>>  
>>      if (fb) {
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index feb7d53..f6ecb88 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -778,6 +778,22 @@
>>    'data': [ 'unknown', 'primary', 'secondary'] }
>>  
>>  ##
>> +# @COLOExitReason
>> +#
>> +# The reason for a COLO exit
>> +#
>> +# @unknown: unknown reason
> 
> How can @unknown happen?
> 
>> +#
>> +# @request: COLO exit is due to an external request
>> +#
>> +# @error: COLO exit is due to an internal error
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum': 'COLOExitReason',
>> +  'data': [ 'unknown', 'request', 'error'] }
>> +
>> +##
>>  # @x-colo-lost-heartbeat
>>  #
>>  # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>> diff --git a/qapi/event.json b/qapi/event.json
>> index f0cef01..f63d456 100644
>> --- a/qapi/event.json
>> +++ b/qapi/event.json
>> @@ -255,6 +255,23 @@
>>    'data': {'status': 'MigrationStatus'}}
>>  
>>  ##
>> +# @COLO_EXIT
>> +#
>> +# Emitted when VM finishes COLO mode due to some errors happening or
>> +# at the request of users.
>> +#
>> +# @mode: which COLO mode the VM was in when it exited.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @reason: describes the reason for the COLO exit.
> 
> Can we get 'unknown' here?
> 
>> +#
>> +# @error: #optional, error message. Only present on error happening.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'event': 'COLO_EXIT',
>>   'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>> +
>> +##
>>  # @ACPI_DEVICE_OST
>>  #
>>  # Emitted when guest executes ACPI _OST method.
> 
> 
> 
> .
> 
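
The two "out:" paths in the quoted migration/colo.c hunks follow one rule:
a negative ret yields reason 'error' plus an error string, and anything else
yields reason 'request' with no error string. Below is a minimal standalone
sketch of that rule. It also shows that an event recipient could tell the two
cases apart from the 'reason' field alone. ColoExitInfo and colo_exit_info()
are hypothetical names used only for illustration; they are not part of the
patch or of QEMU, and the enums are re-declared by hand here rather than taken
from the generated QAPI headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hand-written mirrors of the enums in the quoted qapi-schema.json. */
typedef enum {
    COLO_MODE_UNKNOWN,
    COLO_MODE_PRIMARY,
    COLO_MODE_SECONDARY
} COLOMode;

typedef enum {
    COLO_EXIT_REASON_UNKNOWN,
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR
} COLOExitReason;

/* Hypothetical container for the COLO_EXIT payload (assumption). */
typedef struct {
    COLOMode mode;
    COLOExitReason reason;
    bool has_error;      /* true only when 'error' is present */
    const char *error;   /* strerror() text, or NULL */
} ColoExitInfo;

/*
 * Derive the event payload from the thread's return value, the same way
 * both quoted hunks do: ret < 0 means an internal error, otherwise the
 * exit is treated as an external request.
 */
ColoExitInfo colo_exit_info(COLOMode mode, int ret)
{
    ColoExitInfo info = { mode, COLO_EXIT_REASON_REQUEST, false, NULL };

    if (ret < 0) {
        info.reason = COLO_EXIT_REASON_ERROR;
        info.has_error = true;
        info.error = strerror(-ret);
    }
    return info;
}
```

With this sketch, colo_exit_info(COLO_MODE_PRIMARY, 0) gives reason 'request'
with no error string, while colo_exit_info(COLO_MODE_SECONDARY, -EIO) gives
reason 'error' with has_error set. Neither quoted hunk ever passes 'unknown'
for the mode or the reason.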





