From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH V8 11/17] qapi: Add new command to query colo status
Date: Wed, 13 Jun 2018 17:50:32 +0100
User-agent: Mutt/1.10.0 (2018-05-17)

* Zhang Chen (address@hidden) wrote:
> On Mon, Jun 11, 2018 at 2:48 PM, Markus Armbruster <address@hidden>
> wrote:
>
> > Zhang Chen <address@hidden> writes:
> >
> > > On Thu, Jun 7, 2018 at 8:59 PM, Markus Armbruster <address@hidden>
> > wrote:
> > >
> > >> Zhang Chen <address@hidden> writes:
> > >>
> > >> > Libvirt or other high-level software can use this command to query
> > >> > COLO status.
> > >> > You can test this command like this:
> > >> > {'execute':'query-colo-status'}
> > >> >
> > >> > Signed-off-by: Zhang Chen <address@hidden>
> > >> > ---
> > >> > migration/colo.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >> > qapi/migration.json | 34 ++++++++++++++++++++++++++++++++++
> > >> > 2 files changed, 73 insertions(+)
> > >> >
> > >> > diff --git a/migration/colo.c b/migration/colo.c
> > >> > index bedb677788..8c6b8e9a4e 100644
> > >> > --- a/migration/colo.c
> > >> > +++ b/migration/colo.c
> > >> > @@ -29,6 +29,7 @@
> > >> > #include "net/colo.h"
> > >> > #include "block/block.h"
> > >> > #include "qapi/qapi-events-migration.h"
> > >> > +#include "qapi/qmp/qerror.h"
> > >> >
> > >> > static bool vmstate_loading;
> > >> > static Notifier packets_compare_notifier;
> > >> > @@ -237,6 +238,44 @@ void qmp_xen_colo_do_checkpoint(Error **errp)
> > >> > #endif
> > >> > }
> > >> >
> > >> > +COLOStatus *qmp_query_colo_status(Error **errp)
> > >> > +{
> > >> > +    int state;
> > >> > +    COLOStatus *s = g_new0(COLOStatus, 1);
> > >> > +
> > >> > +    s->mode = get_colo_mode();
> > >> > +
> > >> > +    switch (s->mode) {
> > >> > +    case COLO_MODE_UNKNOWN:
> > >> > +        error_setg(errp, "COLO is disabled");
> > >> > +        state = MIGRATION_STATUS_NONE;
> > >> > +        break;
> > >> > +    case COLO_MODE_PRIMARY:
> > >> > +        state = migrate_get_current()->state;
> > >> > +        break;
> > >> > +    case COLO_MODE_SECONDARY:
> > >> > +        state = migration_incoming_get_current()->state;
> > >> > +        break;
> > >> > +    default:
> > >> > +        abort();
> > >> > +    }
> > >> > +
> > >> > +    s->colo_running = state == MIGRATION_STATUS_COLO;
> > >> > +
> > >> > +    switch (failover_get_state()) {
> > >> > +    case FAILOVER_STATUS_NONE:
> > >> > +        s->reason = COLO_EXIT_REASON_NONE;
> > >> > +        break;
> > >> > +    case FAILOVER_STATUS_REQUIRE:
> > >> > +        s->reason = COLO_EXIT_REASON_REQUEST;
> > >> > +        break;
> > >> > +    default:
> > >> > +        s->reason = COLO_EXIT_REASON_ERROR;
> > >> > +    }
> > >> > +
> > >> > +    return s;
> > >> > +}
> > >> > +
> > >> > static void colo_send_message(QEMUFile *f, COLOMessage msg,
> > >> > Error **errp)
> > >> > {
> > >> > diff --git a/qapi/migration.json b/qapi/migration.json
> > >> > index 93136ce5a0..356a370949 100644
> > >> > --- a/qapi/migration.json
> > >> > +++ b/qapi/migration.json
> > >> > @@ -1231,6 +1231,40 @@
> > >> > ##
> > >> > { 'command': 'xen-colo-do-checkpoint' }
> > >> >
> > >> > +##
> > >> > +# @COLOStatus:
> > >> > +#
> > >> > +# The result format for 'query-colo-status'.
> > >> > +#
> > >> > +# @mode: COLO running mode. If COLO is running, this field will return
> > >> > +# 'primary' or 'secondary'.
> > >> > +#
> > >> > +# @colo-running: true if COLO is running.
> > >> > +#
> > >> > +# @reason: describes the reason for the COLO exit.
> > >>
> > >> What's the value of @reason before a "COLO exit"?
> > >>
> > >
> > > Before a "COLO exit", we just return 'none' in this field.
> >
> > Please add that to the documentation.
> >
>
> OK.
>
>
> >
> > Please excuse my ignorance on COLO... I'm still not sure I fully
> > understand how the three members are related, or even how the COLO state
> > machine works and how it's related to / embedded in RunState. I searched
> > docs/ for a state diagram, but couldn't find one.
> >
> > According to runstate_transitions_def[], the part of the RunState state
> > machine that's directly connected to state "colo" looks like this:
> >
> >     inmigrate -+
> >                |
> >     paused ----+
> >                |
> >     migrate ---+-> colo <------> running
> >                |
> >     suspended -+
> >                |
> >     watchdog --+
> >
> > For each of the seven state transitions: how is the state transition
> > triggered (e.g. by QMP command, spontaneously when a certain condition
> > is detected, ...), and what events (if any) are emitted then?
> >
> >
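For readers who prefer a table to a picture, the seven transitions in the diagram above can be written down as data. This is only a sketch that mirrors the diagram: the enum and function names are invented for illustration, and it does not reproduce QEMU's actual RunState enum or runstate_transitions_def[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Labels copied from the diagram; illustrative names, not QEMU identifiers. */
enum diagram_state {
    DS_INMIGRATE,
    DS_PAUSED,
    DS_MIGRATE,
    DS_SUSPENDED,
    DS_WATCHDOG,
    DS_COLO,
    DS_RUNNING,
};

struct diagram_edge {
    enum diagram_state from;
    enum diagram_state to;
};

/* The seven edges that touch "colo" in the diagram. */
static const struct diagram_edge colo_edges[] = {
    { DS_INMIGRATE, DS_COLO },
    { DS_PAUSED,    DS_COLO },
    { DS_MIGRATE,   DS_COLO },
    { DS_SUSPENDED, DS_COLO },
    { DS_WATCHDOG,  DS_COLO },
    { DS_RUNNING,   DS_COLO },
    { DS_COLO,      DS_RUNNING },
};

static_assert(sizeof(colo_edges) / sizeof(colo_edges[0]) == 7,
              "the diagram shows seven transitions");

/* Return true if the diagram contains an edge from -> to. */
static bool colo_edge_exists(enum diagram_state from, enum diagram_state to)
{
    for (size_t i = 0; i < sizeof(colo_edges) / sizeof(colo_edges[0]); i++) {
        if (colo_edges[i].from == from && colo_edges[i].to == to) {
            return true;
        }
    }
    return false;
}
```

With this table, colo_edge_exists(DS_PAUSED, DS_COLO) is true while colo_edge_exists(DS_COLO, DS_PAUSED) is false, which matches the single two-way arrow between "colo" and "running".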
> Once COLO has started, the VM stays in "MIGRATION_STATUS_COLO" until
> failover occurs.
> In terms of the flow diagram, you can think of COLO as always running in
> the migrate state, because once in COLO mode we control the VM state in
> the COLO code itself. For example:
> When we start COLO, it does the first migration as a normal live
> migration. After that we enter the COLO process; at that point COLO
> considers the primary VM state to be the same as the secondary VM's
> (the first checkpoint), so we use vm_start() to start the primary VM
> (unlike normal migration) and the secondary VM.
> From then on the primary VM and secondary VM run in parallel, and if
> COLO finds that the two VM states are not the same, it triggers a
> checkpoint (like another migration). Finally, if some fault occurs it
> triggers failover, after which the primary VM may return to normal
> running mode (secondary dead).
> So, if we look only at the primary VM state, it may have left the
> RunState state machine, or it may still be in the migrate state.
>
>
>
>
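The lifecycle described above (first migration, parallel running, checkpoint on divergence, failover on a fault) can be read as a small state machine. Below is a minimal sketch in C. The phase and input names are invented for illustration and are not QEMU identifiers, and the sketch assumes that a fault in any phase leads to failover, which the description does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phase names for the lifecycle described above. */
enum colo_phase {
    PHASE_INITIAL_MIGRATION, /* first migration, like a normal live migration */
    PHASE_PARALLEL_RUN,      /* primary and secondary VM both running */
    PHASE_CHECKPOINT,        /* states diverged, resynchronising */
    PHASE_FAILED_OVER,       /* failover happened, COLO is over */
};

/* Hypothetical inputs that move the lifecycle forward. */
enum colo_input {
    INPUT_MIGRATION_DONE,
    INPUT_STATE_DIVERGED,
    INPUT_CHECKPOINT_DONE,
    INPUT_FAULT,
};

/* Compute the next phase; inputs that do not apply leave the phase unchanged. */
static enum colo_phase colo_phase_next(enum colo_phase phase,
                                       enum colo_input input)
{
    /* Assumption: a fault triggers failover regardless of the phase. */
    if (input == INPUT_FAULT) {
        return PHASE_FAILED_OVER;
    }
    switch (phase) {
    case PHASE_INITIAL_MIGRATION:
        return input == INPUT_MIGRATION_DONE ? PHASE_PARALLEL_RUN : phase;
    case PHASE_PARALLEL_RUN:
        return input == INPUT_STATE_DIVERGED ? PHASE_CHECKPOINT : phase;
    case PHASE_CHECKPOINT:
        return input == INPUT_CHECKPOINT_DONE ? PHASE_PARALLEL_RUN : phase;
    case PHASE_FAILED_OVER:
        return phase; /* terminal in this sketch */
    }
    return phase;
}

/* Feed a sequence of inputs, starting from the initial migration. */
static enum colo_phase colo_phase_run(const enum colo_input *inputs,
                                      size_t count)
{
    enum colo_phase phase = PHASE_INITIAL_MIGRATION;

    assert(inputs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        phase = colo_phase_next(phase, inputs[i]);
    }
    return phase;
}
```

Note that none of these phases is a RunState: in this reading, the whole loop between parallel running and checkpointing happens while the migration status remains MIGRATION_STATUS_COLO.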
> > How is @colo-running related to the run state?
> >
>
> Not related, as I said above.

Right; this is a different type of 'running' - it might be better to say
'active' rather than 'running'.

COLO has a pair of VMs kept in sync by a constant stream of migrations
between them.
The 'mode' is whether it's the source (primary) or destination (secondary)
VM (also sometimes written PVM/SVM).
If COLO fails for some reason (e.g. the secondary host fails) then I think
this is saying that 'colo-running' would be false.
Some monitoring tool would be watching this to make sure you really do
have a redundant pair of VMs, and if one of them failed you'd want to
know and alert.

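As a rough sketch of that monitoring idea: a tool that has already parsed the reply to query-colo-status could classify the three fields as below. The field names follow the patch; the struct, the enums, and the alerting policy are made up for illustration and are not part of QEMU or libvirt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the three COLOStatus fields in the patch. */
enum view_mode { VIEW_MODE_UNKNOWN, VIEW_MODE_PRIMARY, VIEW_MODE_SECONDARY };
enum view_reason { VIEW_REASON_NONE, VIEW_REASON_REQUEST, VIEW_REASON_ERROR };

struct colo_status_view {
    enum view_mode mode;
    bool colo_running;
    enum view_reason reason;
};

/* Hypothetical health categories a monitoring tool might use. */
enum pair_health {
    PAIR_ACTIVE,            /* colo-running is true: redundant pair in place */
    PAIR_INACTIVE,          /* not running and no exit reason recorded */
    PAIR_EXITED_ON_REQUEST, /* COLO ended because failover was requested */
    PAIR_EXITED_ON_ERROR,   /* COLO ended because of an error */
};

/* Classify a parsed status; colo-running takes precedence over reason. */
static enum pair_health colo_pair_health(const struct colo_status_view *s)
{
    assert(s != NULL);
    if (s->colo_running) {
        return PAIR_ACTIVE;
    }
    switch (s->reason) {
    case VIEW_REASON_REQUEST:
        return PAIR_EXITED_ON_REQUEST;
    case VIEW_REASON_ERROR:
        return PAIR_EXITED_ON_ERROR;
    case VIEW_REASON_NONE:
        break;
    }
    return PAIR_INACTIVE;
}

/* Assumed policy: alert only when a previously active pair has exited. */
static bool colo_pair_needs_alert(enum pair_health health)
{
    return health == PAIR_EXITED_ON_REQUEST || health == PAIR_EXITED_ON_ERROR;
}
```

Whether an inactive pair with reason 'none' should also raise an alert is a policy choice; a stricter tool could alert on anything other than PAIR_ACTIVE.
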
Dave
> > Which run states are considered to be "before a COLO exit"? If "before
> > a COLO exit" doesn't map to run states, the state machine is too coarse
> > to fully describe COLO, and I'd like to see a suitably refined one.
> >
> >
> COLO is just a special case. Is it worth refining the state machine for it?
> CC: "Dr. David Alan Gilbert" <address@hidden>
> Any comments?
>
>
>
> > If @colo-running is true, then @mode is either "primary" or "secondary".
> > What are the possible values when @colo-running is false?
> >
>
> The @mode will be in the "unknown" state.
>
>
> Thanks
> Zhang Chen
>
>
>
> >
> > [...]
> >
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK