[Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant


From: Anthony Liguori
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
Date: Tue, 30 Nov 2010 08:02:56 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10

On 11/30/2010 05:56 AM, Juan Quintela wrote:
> No, I benchmarked against two workloads:
> a- idle guest (because it was faster to test)
> b- busy guest (each test takes forever; that is the reason I tested it
>    last).
>
> So, I don't agree with that.

But in both cases, it's a large memory guest where the RSS size is <<< the allocated memory size. This is simply an unrealistic scenario. Migrating immediately after launch may be common among developers, but users typically run useful workloads in their guests and run migration after the guest has been running for quite some time.

So the scenario I'm much more interested in is trying to migrate a 400GB guest after the guest has been doing useful work and has brought in most of its RSS. Also, with a box that big, you're going to be on 10gbit.

That's going to introduce a whole other set of problems that this series is potentially just going to exacerbate even more.

>> There are three fundamental problems: 1) kvm.ko dirty bit tracking
>> doesn't scale
> Fully agree, but this patch doesn't touch that.

>> 2) we lose flow control information because of the
>> multiple levels of buffering which means we move more data than we
>> should move
> Fully agree here, but it is a "massive change" to fix it correctly.

It's really not massive.

>> 3) migration prevents a guest from executing the device
>> model because of qemu_mutex.
> This is a different problem.

No, this is really the fundamental one.

>> Those are the problems to fix.
> This still doesn't fix the stalls in the main_loop.
>
> So, you are telling me that there is this list of problems that need to
> be fixed.  They are not enough to fix the problem, and they imply
> massive changes.
>
> In the meantime, nobody on stable and 0.14 is going to be able to use
> migration with guests bigger than 2GB/4GB.

That's simply not true. You started this series with a statement that migration is broken. It's not; it works perfectly fine. Migration with 8GB guests works perfectly fine.

You've identified a corner case for which we have suboptimal behavior, and are now declaring that migration is "totally broken".

>> Sprinkling the code with returns in
>> semi-random places because it benchmarked well for one particular test
>> case is something we'll deeply regret down the road.
> This was mean :(

It wasn't intended to be mean but it is the truth. We need to approach this sort of problem in a more systematic way. Systematic means identifying what the fundamental problems are and fixing them in a proper way.

Throwing a magic number in the iteration path after which we let the main loop run for a little bit is simply not a solution. You're still going to get main loop starvation.

> There are two returns and one heuristic.
>
> - return a) we try to migrate when we know that there is no space;
>   an obvious optimization/bug (depends on how you look at it).
>
> - return b) we don't need to handle the TLB bitmap code for kvm.  I
>   fully agree that we need to split the bitmaps into something more
>   sensible, but that change is quite invasive, and the simple fix
>   works for now.
>
> - heuristic: if you really think that an io_handler should be able to
>   stall the main loop for almost 4 seconds, sorry, I don't agree.
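
Stripped of the QEMU specifics, the heuristic being argued about amounts
to bounding how long one pass of the RAM save loop may run before
yielding back to the main loop. A minimal sketch of that idea, with
illustrative names and an assumed 50ms budget rather than the patch's
actual code:

/* Sketch only: bound how long one iteration of the RAM save loop may
 * hog the main loop.  Names and the 50ms budget are illustrative. */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define ITERATION_BUDGET_NS (50LL * 1000 * 1000)   /* assumed ~50ms */

static int64_t clock_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for ram_save_block(): sends one dirty page, returns false
 * when no dirty page was found in this pass. */
bool send_one_dirty_page(void);

void ram_save_iterate_sketch(void)
{
    int64_t start = clock_ns();

    while (send_one_dirty_page()) {
        /* Yield back to the main loop once the budget is spent instead
         * of walking the whole dirty bitmap in one go. */
        if (clock_ns() - start > ITERATION_BUDGET_NS) {
            break;
        }
    }
}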

But fundamentally, why would an iteration of migration take 4 seconds? This is really my fundamental objection: the migration loop should, at most, take as much time as it takes to fill up an empty socket buffer or to hit the bandwidth limit.

The bandwidth limit offers a fine-grain control over exactly how long it should take.

If we're burning excess CPU walking a 100MB bitmap, then let's fix that problem. Stopping every 1MB worth of the bitmap to do other work just papers over the real problem (that we're walking 100MB bitmap).
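
If the expensive part really is the bitmap walk, the walk itself can be
made much cheaper, for instance by skipping clean words with a single
comparison instead of testing every page bit. A rough sketch of that
idea (illustrative code, not the actual kvm.ko or QEMU bitmap walker):

/* Sketch: scan a dirty bitmap a word at a time, skipping clean words,
 * instead of testing every bit individually. */
#include <stddef.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Called for every dirty page found; 'page' is the page index. */
typedef void (*dirty_page_fn)(size_t page);

void scan_dirty_bitmap(const unsigned long *bitmap, size_t nr_pages,
                       dirty_page_fn cb)
{
    size_t nr_words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    for (size_t i = 0; i < nr_words; i++) {
        unsigned long word = bitmap[i];

        while (word) {
            int bit = __builtin_ctzl(word);     /* lowest set bit */
            cb(i * BITS_PER_WORD + bit);
            word &= word - 1;                   /* clear that bit */
        }
    }
}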

> Again, we can fix this much better, but it needs lots of changes:
>  * I asked you if there is a "maximum" time that one io_handler can
>    hog the main loop.  Answer: dunno/depends.
>  * Long term: migration should be its own thread.  Have I mentioned
>    threads?  This is qemu, no threads.
>  * qemu_mutex: this is the holy grail.  If we drop qemu_mutex in
>    ram_save_live(), we let the guest CPUs continue running.  But, and
>    this is a big but, we still stall the main_loop for 4 seconds, so
>    this solution is only partial.

It really shouldn't be that hard at all to add a multi-level tree to kvm.ko and it fixes a bunch of problems at once. Likewise, accounting zero pages is very simple and it makes bandwidth limiting more effective.
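
To make the zero-page point concrete: if a blank page is charged to the
rate limiter even though only a tiny header actually crosses the wire,
the bandwidth cap throttles the memory scan as well. A small sketch of
that accounting (the names, the marker size, and the charge-the-full-page
policy are assumptions, not QEMU's actual code):

/* Sketch: charge zero pages against the bandwidth limit so the rate
 * limiter sees realistic progress even when little hits the wire. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct migration_accounting {
    uint64_t bytes_charged;     /* what the rate limiter is told     */
    uint64_t bytes_on_wire;     /* what actually crossed the socket  */
};

static bool page_is_zero(const uint8_t *page)
{
    return page[0] == 0 && memcmp(page, page + 1, PAGE_SIZE - 1) == 0;
}

void account_page(struct migration_accounting *acct, const uint8_t *page)
{
    if (page_is_zero(page)) {
        /* Only a small marker goes over the wire, but charging the
         * full page keeps the bandwidth cap meaningful. */
        acct->bytes_on_wire += 8;               /* assumed marker size */
        acct->bytes_charged += PAGE_SIZE;
    } else {
        acct->bytes_on_wire += PAGE_SIZE;
        acct->bytes_charged += PAGE_SIZE;
    }
}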

> And that is it.  To improve things, I receive complaints left and right
> that exporting buffered_file_period/timeout is BAD, very BAD, because
> it exposes buffered_file.c internal state.
>
> But I am given the suggestion that I should just create a
> buffered_file_write_nop() that just increases the number of bytes
> transferred for normal pages.  I agree that this "could" also work, but
> it is actually worse in the sense that we are exposing yet more
> internals of buffered_file.c.  In the first case, we are only exposing
> the period of a timer; in the second, we are hijacking how
> qemu_file_rate_limit() works + how we account for written memory + how
> it calculates the amount of stuff that it has sent, and with this we
> "knowingly" lie about how much stuff we have written.
>
> How this can be "cleaner" than a timeout of 50ms is beyond me.

If the migration loop is running for 50ms, then we've already failed if the user requests a 30ms downtime for migration. The 50ms timeout means that a VCPU can be blocked for 50ms at a time.

Regards,

Anthony Liguori

> On the other hand, this is the typical case where you need a lot of
> testing for each change that you make.  I thought several times that I
> had found the "guilty" bit for the stalls, and no luck, there was yet
> another problem.
>
> I also thought at some point, ok, I can now drop the previous attempts,
> and no, that didn't work either.  This was the minimal set of patches
> able to fix the stalls.
>
> This is just very depressing.
>
> Later, Juan.



