From: Juan Quintela
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
Date: Tue, 30 Nov 2010 16:40:41 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Anthony Liguori <address@hidden> wrote:
> On 11/30/2010 05:56 AM, Juan Quintela wrote:
>> No, I benchmarked against two workloads:
>> a- idle guest (because it was faster to test)
>> b- busy guest (each test takes forever, that is the reason that I tested
>> last).
>>
>> So, I don't agree with that.
>>    
>
> But in both cases, it's a large memory guest where the RSS size is <<<
> than the allocated memory size.  This is simply an unrealistic
> scenario.  Migrating immediately after launch may be common among
> developers but users typically run useful workloads in their guests
> and run migration after the guest has been running for quite some
> time.
>
> So the scenario I'm much more interested in, is trying to migrate a
> 400GB guest after the guest has been doing useful work and has brought
> in most of its RSS.  Also, with a box that big, you're going to be on
> 10gbit.
>
> That's going to introduce a whole other set of problems that this
> series is potentially just going to exacerbate even more.
>
>>> There are three fundamental problems: 1) kvm.ko dirty bit tracking
>>> doesn't scale
>>>      
>> Fully agree, but this patch doesn't touch that.
>>
>>    
>>> 2) we lose flow control information because of the
>>> multiple levels of buffering which means we move more data than we
>>> should move
>>>      
>> Fully agree here, but this is a "massive change" to fix it correctly.
>>    
>
> It's really not massive.
>
>>> 3) migration prevents a guest from executing the device
>>> model because of qemu_mutex.
>>>      
>> This is a different problem.
>>    
>
> No, this is really the fundamental one.

We have more problems in the main_loop.

>>> Those are the problems to fix.
>>>      
>> This still doesn't fix the stalls in the main_loop.
>>
>> So, you are telling me that there is this list of problems that need
>> to be fixed, that they are not enough to fix the problem, and that
>> they imply massive changes.
>>
>> In the meantime, everybody on stable and 0.14 is not going to be able
>> to use migration with more than a 2GB/4GB guest.
>>    
>
> That's simply not true.  You started this series with a statement that
> migration is broken.

Well, it depends on who you ask.

>   It's not, it works perfectly fine.  Migration
> with 8GB guests works perfectly fine.

It doesn't work perfectly fine.  If you have a migrate_maximum_downtime of
30ms and we get stuck for 1s several times, it is broken in my book.
And this was with an idle guest.

Now move to a loaded guest (one dirtying more memory than we can
migrate), and we get stalls of almost a minute where nothing happens.

That is a "very strange" definition of it works perfectly.  In my book
it is nearer the "it is completely broken".

> You've identified a corner case for which we have suboptimal behavior,
> and are now declaring that migration is "totally broken".

I think it is not a corner case, but that depends on what your "normal"
case is.

>>>   Sprinkling the code with returns in
>>> semi-random places because it benchmarked well for one particular test
>>> case is something we'll deeply regret down the road.
>>>      
>> This was mean :(
>>    
>
> It wasn't intended to be mean but it is the truth.  We need to
> approach these sorts of problems in a more systematic way.  Systematic
> means identifying what the fundamental problems are and fixing them in
> a proper way.

It is not a corner case, it is "always" that we have enough memory.
Basically our bitmap handling code is "exponential" on memory size, so
the bigger the amount of memory the bigger the problems appear.
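To make that concrete, here is a rough sketch of the scaling problem
(illustrative only, not the actual QEMU code; the one-byte-per-page
layout and the function name are assumptions):

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Illustrative: one dirty byte per page.  The walk is linear in guest
 * RAM size even when almost nothing is dirty: a 400GB guest means
 * ~100M entries (~100MB) scanned on every single iteration. */
static size_t count_dirty_pages(const uint8_t *dirty, size_t ram_bytes)
{
    size_t npages = ram_bytes / PAGE_SIZE;
    size_t found = 0;

    for (size_t i = 0; i < npages; i++) {
        if (dirty[i]) {
            found++;
        }
    }
    return found;
}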

> Throwing a magic number in the iteration path after which we let the
> main loop run for a little bit is simply not a solution.  You're still
> going to get main loop starvation.

This depends on how you approach the problem.  We have io_handlers
that take too much time (for various definitions of "too much").
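For illustration, the heuristic amounts to something like this (a
sketch with invented names, not the actual patch): bound the amount of
work done per handler invocation and give control back to the main
loop, so the monitor, VNC and timers are not starved for seconds.

#include <stdbool.h>

struct migration_state;                       /* stand-in for the real state */
bool send_next_dirty_page(struct migration_state *s);  /* hypothetical helper */

enum { MAX_PAGES_PER_CALL = 1024 };           /* arbitrary bound, the "magic number" */

static void ram_save_bounded(struct migration_state *s)
{
    int pages = 0;

    /* Stop after a bounded amount of work; the main loop will call us
     * again for the next slice. */
    while (pages < MAX_PAGES_PER_CALL && send_next_dirty_page(s)) {
        pages++;
    }
}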

>> There are two returns and one heuristic.
>>
>> - return a) we try to migrate when we know that there is no space;
>>    an obvious optimization/bug (depends on how you look at it).
>>
>> - return b) we don't need to handle TLB bitmap code for kvm.  I fully
>>    agree that we need to split the bitmaps into something more
>>    sensible, but that change is quite invasive, and the simple fix
>>    works for now.
>>
>> - heuristic: if you really think that an io_handler should be able to
>>    stall the main loop for almost 4 seconds, sorry, I don't agree.
>>    
>
> But fundamentally, why would an iteration of migration take 4 seconds?

That is a good question.  It shouldn't.

> This is really my fundamental objection: the migration loop should, at
> most, take as much time as it takes to fill up an empty socket buffer
> or until it hits the bandwidth limit.

The bandwidth limit is not hit with zero pages.

> The bandwidth limit offers a fine-grain control over exactly how long
> it should take.

Not if we are not sending data.
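To illustrate (invented names, not the actual code): the rate limiter
counts bytes written to the socket, and an all-zero page goes out as a
few bytes instead of 4KB, so we can burn a full CPU scanning pages
while the byte counter, and therefore the bandwidth limit, barely
moves.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i]) {
            return false;
        }
    }
    return true;
}

/* Rough wire-cost accounting: a zero page is a header plus a fill
 * byte; a normal page is a header plus the full payload.  A guest full
 * of zero pages puts ~400x fewer bytes on the wire per page scanned,
 * so the limiter almost never triggers. */
static size_t wire_bytes_for_page(const uint8_t *page)
{
    const size_t header = 9;                 /* address + flags, roughly */

    return page_is_zero(page) ? header + 1 : header + PAGE_SIZE;
}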

> If we're burning excess CPU walking a 100MB bitmap, then let's fix
> that problem.  Stopping every 1MB worth of the bitmap to do other work
> just papers over the real problem (that we're walking 100MB bitmap).

Agreed.

OK.  I am going to state this another way.  We have at least these
problems:
- qemu/kvm bitmap handling
- io_handlers that take too much time
- the interface with kvm
- bandwidth calculation
- zero page optimization
- qemu_mutex handling

If I send a patch that fixes only "some" of those, I am going to be
asked for numbers (because the problem would still be there).

Are you telling me that if I send a patch that fixes one of the
individual problems, but the user still sees the same problem (because
of the other ones), it is going to be accepted?

I tried to get the minimal set of patches that fixes the "real"
problem, i.e. the stalls.  Half of them are obvious and you have agreed
to accept them.

But even after accepting them, we haven't fixed the problem.  So, what
is the next step?
- Fix the problems one by one: would those patches get integrated?
  Notice that the stalls/numbers will not improve until the last
  problem is fixed.
- Or does nothing get integrated until everything is fixed?

My idea was to go the other way around: get the minimal fixes that are
needed to fix the real problem, and then change the implementation of
the things that I listed above.  Why?  Because this way we know that
the problem is fixed (not with the most elegant solution, but fixed),
and we can check with each change that we are not bringing the problem
back.

Testing each change in isolation doesn't allow us to verify that we
are not re-introducing the problem.

Later, Juan.


