From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v5 17/45] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
Date: Sat, 28 Mar 2015 16:58:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0


On 27/03/2015 05:13, David Gibson wrote:
>> It's a state variable tested and changed by at least two threads
>> and it's got to go through a correct sequence of states. So
>> generally you're doing a 'I expect to be in .... now change to
>> ....' so the exchange works well for that.
> 
> So.. in this case.  It seems what might make sense as a basic 
> atomicity option here is a cmpxchg.  So your state_set function
> takes old_state and new_state.  It will atomically update old_state
> to new_state, returning an error if somebody else changed the state
> away from old_state in the meantime.
> 
> Then you can have a blurb next to the state_set helper explaining
> why you need this atomic op for updates to the state variable from 
> multiple threads.

I agree.

My humble suggestion is:

- document the transitions, see this existing example in thread-pool.c:

    /* Moving state out of THREAD_QUEUED is protected by lock.  After
     * that, only the worker thread can write to it.  Reads and writes
     * of state and ret are ordered with memory barriers.
     */
    enum ThreadState state;

The outgoing migration state fails to document this.  My fault.

- perhaps you can use a pattern similar to what is used for the
outgoing migration state.  There one thread does the work, and the
second (could be "the others", e.g. all VCPU threads) affects the first
just by triggering changes in the migration state.  Transitions in the
first thread use cmpxchg without a loop---if it fails it means the
second thread intervened, and the thread moves on to do whatever the
second thread requested.  Transitions in the second thread use cmpxchg
within a loop, in case there was a concurrent change.
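
To make that concrete, here is a rough sketch of the two patterns, using
standard C11 atomics rather than QEMU's atomic_cmpxchg() wrappers; the
state names and helpers below are invented for illustration and are not
taken from the patch:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef enum {
        STATE_NONE,
        STATE_ADVISE,
        STATE_LISTENING,
        STATE_CANCELLING,
    } State;

    static _Atomic State state = STATE_NONE;

    /* Worker thread: one cmpxchg, no loop.  If it fails, another thread
     * changed the state (e.g. asked for cancellation), so give up on this
     * transition and act on the value that was found instead. */
    static bool worker_try_transition(State *expected, State next)
    {
        /* On failure '*expected' is updated to the state installed by the
         * other thread, so the caller can switch to handling that. */
        return atomic_compare_exchange_strong(&state, expected, next);
    }

    /* Other threads: cmpxchg in a loop, because the worker may change the
     * state between our read and our update. */
    static void request_cancel(void)
    {
        State old;
        do {
            old = atomic_load(&state);
            if (old != STATE_ADVISE && old != STATE_LISTENING) {
                return;   /* nothing to cancel in this state */
            }
        } while (!atomic_compare_exchange_weak(&state, &old, STATE_CANCELLING));
    }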

Paolo

>>>> + *  n x
>>>> + *      be64   Page addresses for start of an invalidation range
>>>> + *      be32   mask of 32 pages, '1' to discard'
>>> 
>>> Is the extra compactness from this semi-sparse bitmap encoding 
>>> actually worth it?  A simple list of page addresses, or address
>>> ranges to discard would be substantially simpler to get one's
>>> head around, and also seems like it might be more robust
>>> against future implementation changes as a wire format.
>> 
>> As previously discussed I really think it is;  what I'm tending
>> to see when I've been looking at these in debug is something
>> that's sparse but tends to be blobby with sets of pages discarded
> nearby.  However you do this, you're going to have to walk this 
>> bitmap and format out some sort of set of messages.
> 
> I'd really need to see some numbers to be convinced.  The
> semi-sparse bitmap introduces a huge amount of code and complexity
> at both ends.
> 
> Remember, using ranges, you can still coalesce adjacent discarded 
> pages.  Even using some compact differential encodings of the range 
> endpoints might work out simpler than the bitmap fragments.
> 
> Plus, ranges handle the host versus target page size distinctions
> very naturally.
> 
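
(For what it's worth, a toy sketch of the range-based alternative being
argued for here, coalescing a discard bitmap into (start, length) runs
before they go on the wire; the helper names are made up and this is not
the format used by the patch:)

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in bit test, not QEMU's bitmap API. */
    static bool page_discarded(const uint64_t *map, uint64_t nr)
    {
        return map[nr / 64] & (1ULL << (nr % 64));
    }

    static void send_discard_ranges(const uint64_t *map, uint64_t npages)
    {
        uint64_t i = 0;
        while (i < npages) {
            if (!page_discarded(map, i)) {
                i++;
                continue;
            }
            uint64_t start = i;
            while (i < npages && page_discarded(map, i)) {
                i++;
            }
            /* Adjacent discarded pages collapse into one range. */
            printf("discard: start=%" PRIu64 " len=%" PRIu64 "\n",
                   start, i - start);
        }
    }
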
> [snip]
>>> DISCARD will typically immediately precede LISTEN, won't it?
>>> Is there a reason not to put the discard data into the LISTEN
>>> command?
>> 
>> Discard data can be quite large, so I potentially send multiple
>> discard commands. (Also as you can tell generally I've got a
>> preference for one message does one thing, and thus I have tried
>> to keep them separate).
> 
> So, I like the idea of one message per action in principle - but
> only if that action really is well-defined without reference to
> what operations come before and after it.  If there are hidden
> dependencies about what actions have to come in what order, I'd
> rather bake that into the command structure.
> 
>>>> +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>>>> +                                         uint64_t remote_hps,
>>>> +                                         uint64_t remote_tps)
>>>> +{
>>>> +    PostcopyState ps = postcopy_state_get(mis);
>>>> +    trace_loadvm_postcopy_handle_advise();
>>>> +    if (ps != POSTCOPY_INCOMING_NONE) {
>>>> +        error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%d)", ps);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    if (remote_hps != getpagesize())  {
>>>> +        /*
>>>> +         * Some combinations of mismatch are probably possible but it gets
>>>> +         * a bit more complicated.  In particular we need to place whole
>>>> +         * host pages on the dest at once, and we need to ensure that we
>>>> +         * handle dirtying to make sure we never end up sending part of
>>>> +         * a hostpage on it's own.
>>>> +         */
>>>> +        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
>>>> +                     (int)remote_hps, getpagesize());
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    if (remote_tps != (1ul << qemu_target_page_bits())) {
>>>> +        /*
>>>> +         * Again, some differences could be dealt with, but for now keep it
>>>> +         * simple.
>>>> +         */
>>>> +        error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
>>>> +                     (int)remote_tps, 1 << qemu_target_page_bits());
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
>>> 
>>> Should you be checking the return value here to make sure it's
>>> still POSTCOPY_INCOMING_NONE?  Atomic xchgs seem overkill if
>>> you still have a race between the fetch at the top and the set
>>> here.
>>> 
>>> Or, in fact, should you be just doing an atomic
>>> exchange-then-check at the top, rather than checking at the
>>> top, then changing at the bottom.
>> 
>> There's no race at this point yet; going from None->advise we
>> still only have one thread.  The check at the top is a check
>> against protocol violations (e.g. getting two advise or something
>> like that).
> 
> Yeah.. and having to have intimate knowledge of the thread
> structure at each point in order to analyze the atomic operation
> correctness is exactly what bothers me about it.
> 
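
As an aside, the "exchange-then-check at the top" shape being suggested
would look roughly like this with C11 atomics; the enum and the names are
invented for illustration and this is not what the patch currently does:

    #include <stdatomic.h>
    #include <stdio.h>

    typedef enum { INCOMING_NONE, INCOMING_ADVISE } PostcopyState;

    static _Atomic PostcopyState postcopy_state = INCOMING_NONE;

    static int handle_advise(void)
    {
        /* Install ADVISE and inspect the previous state in one atomic
         * step, so the check and the transition cannot race. */
        PostcopyState old = atomic_exchange(&postcopy_state, INCOMING_ADVISE);
        if (old != INCOMING_NONE) {
            fprintf(stderr, "ADVISE in wrong postcopy state (%d)\n", (int)old);
            return -1;
        }
        return 0;
    }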


