[Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant


From: Anthony Liguori
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
Date: Tue, 30 Nov 2010 08:02:56 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10

On 11/30/2010 05:56 AM, Juan Quintela wrote:
> No, I benchmarked against two workloads:
> a- idle guest (because it was faster to test)
> b- busy guest (each test takes forever; that is the reason I tested it
>    last).
>
> So, I don't agree with that.

But in both cases, it's a large memory guest where the RSS size is <<< the allocated memory size. This is simply an unrealistic scenario. Migrating immediately after launch may be common among developers, but users typically run useful workloads in their guests and run migration after the guest has been running for quite some time.

So the scenario I'm much more interested in is trying to migrate a 400GB guest after the guest has been doing useful work and has brought in most of its RSS. Also, with a box that big, you're going to be on 10gbit.

That's going to introduce a whole other set of problems that this series is potentially just going to exacerbate even more.

>> There are three fundamental problems: 1) kvm.ko dirty bit tracking
>> doesn't scale
> Fully agree, but this patch doesn't touch that.

>> 2) we lose flow control information because of the
>> multiple levels of buffering which means we move more data than we
>> should move
> Fully agree here, but it is a "massive change" to fix it correctly.

It's really not massive.

>> 3) migration prevents a guest from executing the device
>> model because of qemu_mutex.
> This is a different problem.

No, this is really the fundamental one.

>> Those are the problems to fix.
> This still doesn't fix the stalls in the main_loop.
>
> So, you are telling me that there is this list of problems that need to
> be fixed.  They are not enough to fix the problem, and they imply
> massive changes.
>
> In the meantime, nobody on stable and 0.14 is going to be able to use
> migration with guests bigger than 2GB/4GB.

That's simply not true. You started this series with a statement that migration is broken. It's not; it works perfectly fine. Migration with 8GB guests works perfectly fine.

You've identified a corner case for which we have suboptimal behavior, and are now declaring that migration is "totally broken".

>> Sprinkling the code with returns in
>> semi-random places because it benchmarked well for one particular test
>> case is something we'll deeply regret down the road.
> This was mean :(

It wasn't intended to be mean but it is the truth. We need to approach this sort of problem in a more systematic way. Systematic means identifying what the fundamental problems are and fixing them in a proper way.

Throwing a magic number in the iteration path after which we let the main loop run for a little bit is simply not a solution. You're still going to get main loop starvation.

> There are two returns and one heuristic.
>
> - return a) we try to migrate when we know that there is no space;
>   an obvious optimization/bug (depends on how you look at it).
>
> - return b) we don't need to handle the TLB bitmap code for kvm.  I
>   fully agree that we need to split the bitmaps into something more
>   sensible, but that change is quite invasive, and the simple fix
>   works for now.
>
> - heuristic: if you really think that an io_handler should be able to
>   stall the main loop for almost 4 seconds, sorry, I don't agree.
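
Stripped of the QEMU specifics, the heuristic being argued about amounts
to bounding how long one pass of the RAM save loop may run before
yielding back to the main loop. A minimal sketch of that idea, with
illustrative names and an assumed 50ms budget rather than the patch's
actual code:

/* Sketch only: bound how long one iteration of the RAM save loop may
 * hog the main loop.  Names and the 50ms budget are illustrative. */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define ITERATION_BUDGET_NS (50LL * 1000 * 1000)   /* assumed ~50ms */

static int64_t clock_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for ram_save_block(): sends one dirty page, returns false
 * when no dirty page was found in this pass. */
bool send_one_dirty_page(void);

void ram_save_iterate_sketch(void)
{
    int64_t start = clock_ns();

    while (send_one_dirty_page()) {
        /* Yield back to the main loop once the budget is spent instead
         * of walking the whole dirty bitmap in one go. */
        if (clock_ns() - start > ITERATION_BUDGET_NS) {
            break;
        }
    }
}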

But fundamentally, why would an iteration of migration take 4 seconds? This is really my fundamental objection: the migration loop should, at most, take as much time as it takes to fill up an empty socket buffer or to hit the bandwidth limit.

The bandwidth limit offers a fine-grain control over exactly how long it should take.

If we're burning excess CPU walking a 100MB bitmap, then let's fix that problem. Stopping every 1MB worth of the bitmap to do other work just papers over the real problem (that we're walking 100MB bitmap).
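
If the expensive part really is the bitmap walk, the walk itself can be
made much cheaper, for instance by skipping clean words with a single
comparison instead of testing every page bit. A rough sketch of that
idea (illustrative code, not the actual kvm.ko or QEMU bitmap walker):

/* Sketch: scan a dirty bitmap a word at a time, skipping clean words,
 * instead of testing every bit individually. */
#include <stddef.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Called for every dirty page found; 'page' is the page index. */
typedef void (*dirty_page_fn)(size_t page);

void scan_dirty_bitmap(const unsigned long *bitmap, size_t nr_pages,
                       dirty_page_fn cb)
{
    size_t nr_words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    for (size_t i = 0; i < nr_words; i++) {
        unsigned long word = bitmap[i];

        while (word) {
            int bit = __builtin_ctzl(word);     /* lowest set bit */
            cb(i * BITS_PER_WORD + bit);
            word &= word - 1;                   /* clear that bit */
        }
    }
}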

> Again, we can fix this much better, but it needs lots of changes:
>  * I asked you if there is a "maximum" time that one io_handler can
>    hog the main loop.  Answer: dunno/depends.
>  * Long term: migration should be its own thread.  Have I mentioned
>    threads?  This is qemu, no threads.
>  * qemu_mutex: this is the holy grail.  If we drop qemu_mutex in
>    ram_save_live(), we let the guest CPUs continue running.  But, and
>    this is a big but, we still stall the main_loop for 4 seconds, so
>    this solution is only partial.

It really shouldn't be that hard at all to add a multi-level tree to kvm.ko and it fixes a bunch of problems at once. Likewise, accounting zero pages is very simple and it makes bandwidth limiting more effective.
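
To make the zero-page point concrete: if a blank page is charged to the
rate limiter even though only a tiny header actually crosses the wire,
the bandwidth cap throttles the memory scan as well. A small sketch of
that accounting (the names, the marker size, and the charge-the-full-page
policy are assumptions, not QEMU's actual code):

/* Sketch: charge zero pages against the bandwidth limit so the rate
 * limiter sees realistic progress even when little hits the wire. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct migration_accounting {
    uint64_t bytes_charged;     /* what the rate limiter is told     */
    uint64_t bytes_on_wire;     /* what actually crossed the socket  */
};

static bool page_is_zero(const uint8_t *page)
{
    return page[0] == 0 && memcmp(page, page + 1, PAGE_SIZE - 1) == 0;
}

void account_page(struct migration_accounting *acct, const uint8_t *page)
{
    if (page_is_zero(page)) {
        /* Only a small marker goes over the wire, but charging the
         * full page keeps the bandwidth cap meaningful. */
        acct->bytes_on_wire += 8;               /* assumed marker size */
        acct->bytes_charged += PAGE_SIZE;
    } else {
        acct->bytes_on_wire += PAGE_SIZE;
        acct->bytes_charged += PAGE_SIZE;
    }
}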

> And that is it.  To improve things, I receive complaints left and right
> that exporting buffered_file_period/timeout is BAD, very BAD, because
> it exposes buffered_file.c internal state.
>
> But I am given the suggestion that I should just create a
> buffered_file_write_nop() that just increases the number of bytes
> transferred for normal pages.  I agree that this "could" also work, but
> it is actually worse in the sense that we are exposing yet more
> internals of buffered_file.c.  In the first case, we are only exposing
> the period of a timer; in the second, we are hijacking how
> qemu_file_rate_limit() works + how we account for written memory + how
> it calculates the amount of stuff that it has sent, and with this we
> "knowingly" lie about how much stuff we have written.
>
> How this can be "cleaner" than a timeout of 50ms is beyond me.

If the migration loop is running for 50ms, then we've already failed if the user requests a 30ms downtime for migration. The 50ms timeout means that a VCPU can be blocked for 50ms at a time.

Regards,

Anthony Liguori

> On the other hand, this is the typical case where you need a lot of
> testing for each change that you make.  I thought several times that I
> had found the "guilty" bit for the stalls, and no luck, there was yet
> another problem.
>
> I also thought at some point, ok, I can now drop the previous attempts,
> and no, that didn't work either.  This was the minimal set of patches
> able to fix the stalls.
>
> This is just very depressing.
>
> Later, Juan.



