[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown re
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover |
Date: |
Thu, 10 Dec 2015 20:03:44 +0000 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
* zhanghailiang (address@hidden) wrote:
> If the net connection between COLO's two sides is broken while colo/colo
> incoming
> thread is blocked in 'read'/'write' socket fd. It will not detect this error
> until
> connect timeout. It will be a long time.
>
> Here we shutdown all the related socket file descriptors to wake up the
> blocking
> operation in failover BH. Besides, we should close the corresponding file
> descriptors
> after failvoer BH shutdown them, or there will be an error.
>
> Signed-off-by: zhanghailiang <address@hidden>
> Signed-off-by: Li Zhijian <address@hidden>
> ---
> v11:
> - Only shutdown fd for once
> ---
> migration/colo.c | 31 +++++++++++++++++++++++++++++--
> 1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 4cd7b00..994b80d 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -68,6 +68,14 @@ static void secondary_vm_do_failover(void)
> /* recover runstate to normal migration finish state */
> autostart = true;
> }
> + /*
> + * Make sure colo incoming thread not block in recv,
> + * mis->from_src_file and mis->to_src_file use the same fd,
> + * so here we only need to shutdown it for once.
> + */
That assumption is only valid for the current socket transport;
for example I have a (partially-working) RDMA transport where the
forward and reverse paths are quite separate.
> + if (mis->from_src_file) {
> + qemu_file_shutdown(mis->from_src_file);
> + }
>
> old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> FAILOVER_STATUS_COMPLETED);
> @@ -92,6 +100,15 @@ static void primary_vm_do_failover(void)
> MIGRATION_STATUS_COMPLETED);
> }
>
> + /*
> + * Make sure colo thread no block in recv,
> + * Besides, s->rp_state.from_dst_file and s->to_dst_file use the
> + * same fd, so here we only need to shutdown it for once.
> + */
> + if (s->to_dst_file) {
> + qemu_file_shutdown(s->to_dst_file);
> + }
> +
> old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
> FAILOVER_STATUS_COMPLETED);
> if (old_state != FAILOVER_STATUS_HANDLING) {
> @@ -333,7 +350,7 @@ static void colo_process_checkpoint(MigrationState *s)
>
> out:
> current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> - if (ret < 0) {
> + if (ret < 0 || (!ret && !failover_request_is_active())) {
Why is this needed - when would you get a 0 return that is actually an error?
> error_report("%s: %s", __func__, strerror(-ret));
> qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
> true, strerror(-ret), NULL);
> @@ -362,6 +379,11 @@ out:
> qsb_free(buffer);
> buffer = NULL;
>
> + /* Hope this not to be too long to loop here */
> + while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> + ;
> + }
> + /* Must be called after failover BH is completed */
Is this the right patch for this?
Dave
> if (s->rp_state.from_dst_file) {
> qemu_fclose(s->rp_state.from_dst_file);
> }
> @@ -534,7 +556,7 @@ void *colo_process_incoming_thread(void *opaque)
>
> out:
> current_time = error_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> - if (ret < 0) {
> + if (ret < 0 || (!ret && !failover_request_is_active())) {
> error_report("colo incoming thread will exit, detect error: %s",
> strerror(-ret));
> qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
> COLO_EXIT_REASON_ERROR,
> @@ -573,6 +595,11 @@ out:
> */
> colo_release_ram_cache();
>
> + /* Hope this not to be too long to loop here */
> + while (failover_get_state() != FAILOVER_STATUS_COMPLETED) {
> + ;
> + }
> + /* Must be called after failover BH is completed */
> if (mis->to_src_file) {
> qemu_fclose(mis->to_src_file);
> }
> --
> 1.8.3.1
>
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
- Re: [Qemu-devel] [PATCH COLO-Frame v11 27/39] COLO failover: Shutdown related socket fd when do failover,
Dr. David Alan Gilbert <=