qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parall


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
Date: Thu, 8 Mar 2018 14:30:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0

Am 08.03.2018 um 13:50 schrieb Juan Quintela:
Peter Lieven <address@hidden> wrote:
the current implementation submits up to 512 I/O requests in parallel
which is much to high especially for a background task.
This patch adds a maximum limit of 16 I/O requests that can
be submitted in parallel to avoid monopolizing the I/O device.

Signed-off-by: Peter Lieven <address@hidden>
This code is already a mess (TM).  It is difficult to understand what we
are doing, so not easy to see if your changes are better/worse than what
we have.

I am not sure that your solution help to improve things here.  Let's see
what I understand.

We have three fields (without a single comment):

- submitted: this is the number of blocks that we have asked the block
              device to read asynchronously to main memory, and that
              haven't yet read.  I.e. "blocks_read_pending" would be a
              better name?

- read_done: this is the number of blocks that we have finished read
              asynchronously from this block device.  When we finish, we
              do a submitted -- and a read_done++.  blocks_read_finished
              name?

Correct. The names should be adjusted.


- transferred: This is the number of blocks that we have transferred
                since the beginning of migration.  At this point, we do a
                read_done-- and a transferred++

Note also that we do malloc()/free() for each block

Yes, but that is a different story.


So, now that we have defined what our fields mean, we need to know what
is our test.  block_save_pending():

     get_remaining_dirty() + (submitted + read_done) * BLOCK_SIZE

Looks good.

But let's us see what test we do in block_save_iterate() (Yes, I have
been very liberal with reformatting and removal of struct names):

(submitted + read_done) * BLOCK_SIZE <  qemu_file_get_rate_limit(f) &&
(submitted + read_done) < MAX_INFLIGHT_IO

The idea of that test is to make sure that we _don't send_ through the
QEMUFile more than qemu_file_get_rate_limit(f).  But there are several
things here:
- we have already issued a flush_blks() call before we enter the loop
   And it is inside possibility that we have already sent too much data at
   this point, but we enter the while loop anyways.

   Notice that flush_blks() does the right thing and test in each loop if
   qemu_file_rate_limit() has been reached and stops sending more data if
   it returns true;

- At this point, we *could* have sent all that can be sent for this
   round, but we enter the loop anyways.  And we test two things:
      - that we haven't read from block devices more than
        qemu_file_get_rate_limit() bytes (notice that this is the maximum
        that we could put through the migration channel, not really
        related  with what we read from block devices).

      - that we haven't read in this round more than MAX_INFLIGHT_IO
        blocks.  That is 512 blocks, at BLOCK_SIZE bytes is 512MB. Why?
        Who knows.

The idea behind this was only to limit the maximum memory that is allocated.
That was not meant as a rate limit for the storage backend.


- Now we exit the while loop, and we have pending blocks to send, the
     minimum between:
        - qemu_file_get_rate_limit()/BLOCK_SIZE, and
        - MAX_INFLIGHT_IO

But not all of them are ready to send, only "read_done" blocks are
ready, the "submitted" ones are still waiting for read completion.

Thats what I tried to address in patch 5.


And we call back flush_blks() that will try to send all the "read_done"
blocks through the migration channel until we hit
qemu_file_rate_limit().

So, looking at the problem from far away:

- how many read requests are we have to have in flight at any moment, is
   that 16 from this series the right number? Notice that each request
   are 1MB, so this is 16MB (I have no clue what is the right value).

I choosed that value because it helped to improved the stalls in the
guest that I have been seeing. Stefan also said that 16 would be a good
value for a background task. 512 is definetly too much.


- how many blocks should we get on each round.  Notice that the reason
   for the 1st patch on this series is because the block layer is not
   sending enough blocks to prevent ram migration to start.  If there are
   enough dirty memory sent from the block layer, we shouldn't ever enter
   the ram stage.
   Notice how we register them in vl.c:

     blk_mig_init();
     ram_mig_init();

The idea of the 1st patch was to skip ram migration until we have completed
the first round of block migration (aka bulk phase) as this will take a long 
time.
After that we are only doing incremental updates. You are right this might still
be too early, but it to start after the bulk phase was an easy choice without
introducing another heuristic.


So, after so long mail, do I have some suggestion?

- should we make the MAX_PARALLEL_IO autotunig?  i.e. if we are not
   being able to get qemu_file_get_rate_limit()/BLOCK_SIZE read_blocks by
   iteration, we should increase MAX_PARALLEL_IO limit?

We should not, if the bandwidth of the migration stream is high we would again
monipolize the I/O device and get stalls in guest I/O.


- should we "take" into account how many blocks have transferred the 1st
   call to flush_blks() and only wait for "read_blocks" until
   flush_blks() instead of for the whole set?

We still have to avoid that we have too much parallel I/O. But I was also 
thinking that
we need a time limit (as the ram migration has). We might sit in the while loop 
until
512MB have read if we have a fast source storage and a high migration bandwidth
so that we never reach the rate limit.

Peter




reply via email to

[Prev in Thread] Current Thread [Next in Thread]