From: Yoshiaki Tamura
Subject: Re: [Qemu-devel] Re: [PATCH 05/21] virtio: modify save/load handler to handle inuse variable.
Date: Sun, 26 Dec 2010 19:50:55 +0900

2010/12/26 Michael S. Tsirkin <address@hidden>:
> On Sun, Dec 26, 2010 at 07:14:44PM +0900, Yoshiaki Tamura wrote:
>> 2010/12/26 Michael S. Tsirkin <address@hidden>:
>> > On Fri, Dec 24, 2010 at 08:42:19PM +0900, Yoshiaki Tamura wrote:
>> >> >> If qemu_aio_flush() is responsible for flushing the outstanding
>> >> >> virtio-net requests, I'm wondering why it's a problem for Kemari.
>> >> >> As I described in the previous message, Kemari queues the
>> >> >> requests first.  So in your example above, it should start with
>> >> >>
>> >> >> virtio-net: last_avail_idx 0 inuse 2
>> >> >> event-tap: {A,B}
>> >> >>
>> >> >> As you know, the requests are still in order because the net
>> >> >> layer initiates them in order.  Completion order is a separate
>> >> >> matter.
>> >> >>
>> >> >> In the first synchronization, the status above is transferred.  In
>> >> >> the next synchronization, the status will be as following.
>> >> >>
>> >> >> virtio-net: last_avail_idx 1 inuse 1
>> >> >> event-tap: {B}
>> >> >
>> >> > OK, this answers the ordering question.
>> >>
>> >> Glad to hear that!
>> >>
>> >> > Another question: at this point we transfer this status: both
>> >> > event-tap and virtio ring have the command B,
>> >> > so the remote will have:
>> >> >
>> >> > virtio-net: inuse 0
>> >> > event-tap: {B}
>> >> >
>> >> > Is this right?  This already seems to be a problem, as when B
>> >> > completes, inuse will go negative?
>> >>
>> >> I think the state above is wrong.  inuse 0 means there shouldn't be
>> >> any requests in event-tap.  Note that the callback is called only
>> >> when event-tap flushes the requests.
>> >>
>> >> > Next it seems that the remote virtio will resubmit B to event-tap. The
>> >> > remote will then have:
>> >> >
>> >> > virtio-net: inuse 1
>> >> > event-tap: {B, B}
>> >> >
>> >> > This looks kind of wrong ... will two packets go out?
>> >>
>> >> No.  Currently, we're just replaying the requests with pio/mmio.
>> >> In the situation above, it should be,
>> >>
>> >> virtio-net: inuse 1
>> >> event-tap: {B}
>> >>
>> >> >> Why?  Because Kemari flushes the first virtio-net request using
>> >> >> qemu_aio_flush() before each synchronization.  If
>> >> >> qemu_aio_flush() doesn't guarantee the order, what you pointed
>> >> >> out would be problematic.  So in the final synchronization, the
>> >> >> state should be,
>> >> >>
>> >> >> virtio-net: last_avail_idx 2 inuse 0
>> >> >> event-tap: {}
>> >> >>
>> >> >> where A,B were completed in order.
>> >> >>
>> >> >> Yoshi
>> >> >
>> >> >
>> >> > It might be better to discuss block because that's where
>> >> > requests can complete out of order.
>> >>
>> >> It's the same as net.  We queue requests and call bdrv_flush each
>> >> time we release a request to the block layer.  So there shouldn't
>> >> be any inversion.
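>> >>
>> >> Roughly, per request, something like this (the event_tap_* names
>> >> are made up for illustration; only the bdrv_* calls are real):
>> >>
>> >>   /* Release the head of the event-tap queue to the real block
>> >>    * backend, then flush so it completes before the next one
>> >>    * is released. */
>> >>   static void event_tap_release_one(EventTapState *s)
>> >>   {
>> >>       EventTapReq *req = QTAILQ_FIRST(&s->queue);
>> >>       QTAILQ_REMOVE(&s->queue, req, entry);
>> >>       bdrv_aio_writev(req->bs, req->sector_num, req->qiov,
>> >>                       req->nb_sectors, req->cb, req->opaque);
>> >>       bdrv_flush(req->bs);   /* keeps completions in submit order */
>> >>   }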
>> >>
>> >> > So let me see if I understand:
>> >> > - each command passed to event tap is queued by it,
>> >> >  it is not passed directly to the backend
>> >> > - later requests are passed to the backend,
>> >> >  always in the same order that they were submitted
>> >> > - each synchronization point flushes all requests
>> >> >  passed to the backend so far
>> >> > - each synchronization transfers all requests not yet passed to the
>> >> >  backend to the remote, where they are replayed
>> >>
>> >> Correct.
>> >>
>> >> > Now to analyse this for correctness I am looking at the original patch,
>> >> > because it is smaller and so easier to analyse, and I think it is
>> >> > functionally equivalent; correct me if I am wrong about this.
>> >>
>> >> So you think decreasing last_avail_idx upon save is better than
>> >> updating it in the callback?
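>> >>
>> >> Just to make sure we mean the same thing, a minimal sketch of that
>> >> approach (simplified; field names as in hw/virtio.c, not the
>> >> actual patch):
>> >>
>> >>   /* Save a last_avail_idx rolled back over the in-flight
>> >>    * requests, so that the loader re-pops and replays them. */
>> >>   static uint16_t virtio_save_avail_idx(VirtQueue *vq)
>> >>   {
>> >>       /* uint16_t arithmetic wraps with the ring index */
>> >>       return vq->last_avail_idx - vq->inuse;
>> >>   }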
>> >>
>> >> > So the reason there's no out-of-order issue is this
>> >> > (and it might be a good thing to put in the commit log
>> >> > or a comment somewhere):
>> >>
>> >> I've added some comments in the latest patch.  Please point it out
>> >> if they aren't enough.
>> >>
>> >> > At the point of the save callback, event-tap has already flushed
>> >> > the commands it passed to the backend.  Thus, at the point of
>> >> > the save callback, if a command has completed, then
>> >> > all previous commands have been flushed and completed.
>> >> >
>> >> >
>> >> > Therefore inuse is
>> >> > in fact the # of requests passed to event-tap but not yet
>> >> > passed to the backend (in the non-event-tap case all commands are
>> >> > passed to the backend immediately, and because of this
>> >> > inuse is 0), and these are the last inuse commands submitted.
>> >> >
>> >> >
>> >> > Right?
>> >>
>> >> Yep.
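>> >>
>> >> In code terms, the invariant at the save callback would be
>> >> (event_tap_queue_len() is a made-up helper):
>> >>
>> >>   /* Everything handed to the backend has completed, so what is
>> >>    * still in flight is exactly the event-tap queue. */
>> >>   assert(vq->inuse == event_tap_queue_len(event_tap));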
>> >>
>> >> > Now a question:
>> >> >
>> >> > When we pass last_avail_idx - inuse to the remote,
>> >> > the remote virtio will resubmit the request.
>> >> > Since the request is also passed by event-tap, we get
>> >> > the request twice; why is this not a problem?
>> >>
>> >> It's not a problem because event-tap currently replays with
>> >> pio/mmio only, as I mentioned above.  Although event-tap receives
>> >> information about the queued requests, it won't pass it to the
>> >> backend.  The reason is the difficulty of restoring the callbacks,
>> >> which are specific to each device, on the secondary.  These are
>> >> pointers, and even worse, usually static functions, which
>> >> event-tap has no way to restore upon failover.  I do want to
>> >> change event-tap to replay the requests themselves in the future;
>> >> pio/mmio replay is what's implemented for now.
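>> >>
>> >> For reference, the current replay is conceptually just this
>> >> (simplified; EventTapLog here is a stand-in for what event-tap
>> >> actually records):
>> >>
>> >>   typedef struct EventTapLog {
>> >>       int is_mmio;
>> >>       target_phys_addr_t addr;
>> >>       uint32_t value;
>> >>   } EventTapLog;
>> >>
>> >>   /* Re-issue the recorded guest access instead of calling the
>> >>    * device callback (a function pointer we cannot migrate).
>> >>    * The device model then re-initiates the request itself. */
>> >>   static void event_tap_replay(const EventTapLog *log)
>> >>   {
>> >>       if (log->is_mmio) {
>> >>           stl_phys(log->addr, log->value);    /* MMIO write */
>> >>       } else {
>> >>           cpu_outl(log->addr, log->value);    /* PIO write */
>> >>       }
>> >>   }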
>> >>
>> >> Thanks,
>> >>
>> >> Yoshi
>> >>
>> >
>> > Then I am still confused, sorry.  inuse != 0 means that some requests
>> > were passed to the backend but did not complete.  I think that if you do
>> > a flush, this waits until all requests passed to the backend have
>> > completed.  Why doesn't this guarantee inuse = 0 on the origin at the
>> > synchronization point?
>>
>> The synchronization is done before event-tap releases requests to
>> the backend, so there are two types of flush: event-tap and
>> backend block/net.  I assume you're confused by the fact that
>> flushing the backend with qemu_aio_flush/bdrv_flush doesn't
>> necessarily decrease inuse if event-tap has queued requests, because
>> then there are no requests passed to the backend.  Let me do a case
>> study again.
>>
>> virtio: inuse 4
>> event-tap: {A,B,C}
>> backend: {D}
>>
>
>
> There are two event-tap devices, right?
> The PIO one is above virtio, the AIO one is between virtio and the
> backend (e.g. bdrv)?  Which one is meant here?

Right.  I'm referring to the latter, between virtio and the
backend.  Note that the event-tap hook in the pio/mmio path doesn't
queue requests but just records what initiated them.

Yoshi

>
>
>> synchronization starts.  backend gets flushed.
>>
>> virtio: inuse 3
>> event-tap: {A,B,C}
>> backend: {}
>>
>> synchronization gets done.
>> # secondary is virtio: inuse 3
>>
>> event-tap flushes one request.
>>
>> virtio: inuse 2
>> event-tap: {B,C}
>> backend: {}
>>
>> This repeats, and finally it should be,
>>
>> virtio: inuse 0
>> event-tap: {}
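>>
>> In code terms, each checkpoint is roughly (the kemari_*/event_tap_*
>> names are made up; qemu_aio_flush/bdrv_flush_all are real):
>>
>>   qemu_aio_flush();           /* 1. drain requests already in the backend */
>>   bdrv_flush_all();
>>   kemari_send_checkpoint();   /* 2. transfer vmstate (incl. inuse) */
>>   event_tap_flush_one();      /* 3. only now release the next queued request */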
>>
>> Hope this helps.
>>
>> Yoshi
>>
>> >
>> > --
>> > MST
>> >
>> >
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to address@hidden
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


