From: David Gibson
Subject: Re: [Qemu-devel] [PATCH for-2.8] migration: Fix return code of ram_save_iterate()
Date: Thu, 10 Nov 2016 00:08:55 +1100
User-agent: Mutt/1.7.1 (2016-10-04)

On Wed, Nov 09, 2016 at 08:46:34AM +0100, Thomas Huth wrote:
> On 09.11.2016 08:18, Amit Shah wrote:
> > On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> >> qemu_savevm_state_iterate() expects the iterators to return 1
> >> when they are done, and 0 if there is still something left to do.
> >> However, ram_save_iterate() does not obey this rule and returns
> >> the number of saved pages instead. This causes a fatal hang with
> >> ppc64 guests when you run QEMU like this (also works with TCG):
> >
> > "works with" -- does that mean reproduces with?
>
> Yes, that's what I meant: you can reproduce it with TCG (e.g. running
> on an x86 system), too; there's no need for a real POWER machine with KVM
> here.
>
> >> qemu-img create -f qcow2 /tmp/test.qcow2 1M
> >> qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >> -hda /tmp/test.qcow2 -serial mon:stdio
> >>
> >> ... then switch to the monitor by pressing CTRL-a c and try to
> >> save a snapshot with "savevm test1" for example.
> >>
> >> After the first iteration, ram_save_iterate() always returns 0 here,
> >> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> >> can only "kill -9" the QEMU process.
> >> Fix it by using proper return values in ram_save_iterate().
> >>
> >> Signed-off-by: Thomas Huth <address@hidden>
> >> ---
> >> migration/ram.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index fb9252d..a1c8089 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >> int ret;
> >> int i;
> >> int64_t t0;
> >> - int pages_sent = 0;
> >> + int done = 0;
> >>
> >> rcu_read_lock();
> >> if (ram_list.version != last_version) {
> >> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >> pages = ram_find_and_save_block(f, false, &bytes_transferred);
> >> /* no more pages to sent */
> >> if (pages == 0) {
> >> + done = 1;
> >> break;
> >> }
> >> - pages_sent += pages;
> >> acct_info.iterations++;
> >>
> >> /* we want to check in the 1st loop, just in case it was the 1st time
> >> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >> return ret;
> >> }
> >>
> >> - return pages_sent;
> >> + return done;
> >> }
> >
> > I agree with David, we can just remove the return value. The first
> > patch of the series can do that; and this one could become the 2nd
> > patch. Should be OK for the soft freeze.
>
> Sorry, I still did not quite get it - if I'd change the return type of
> ram_save_iterate() and the other iterate functions to "void", how is
> qemu_savevm_state_iterate() supposed to know whether all iterators are
> done or not?

It doesn't, but that hardly matters: its return value is, in turn,
mostly ignored by the caller.
On the migration path we already determine whether to proceed or not
based purely on the separate state_pending callbacks.
For the savevm path, we don't really need the iteration phase at all -
we can jump straight to the completion phase, since downtime is not an
issue.
> And other iterators also use negative return values to
> signal errors

Ah, that's a good point. Possibly we should leave in the negative
codes for errors and just remove all positive return values.

> - should that then be handled via an "Error **" parameter
> instead? ... my gut feeling still says that such a bigger rework (we've
> got to touch all iterators for this!) should rather not be done right in
> the middle of the freeze period...

Yeah, the errors could (and probably should) be handled with Error **
instead of return codes, but I also wonder whether that's too much for
the soft freeze. I guess that's the migration maintainers' call.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson