Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
Date: Thu, 6 Jan 2011 11:36:12 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
> 2011/1/4 Michael S. Tsirkin <address@hidden>:
> > On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
> >> 2011/1/4 Michael S. Tsirkin <address@hidden>:
> >> > On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
> >> >> 2011/1/4 Michael S. Tsirkin <address@hidden>:
> >> >> > On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
> >> >> >> 2010/11/29 Stefan Hajnoczi <address@hidden>:
> >> >> >> > On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
> >> >> >> > <address@hidden> wrote:
> >> >> >> >> event-tap controls when to start an FT transaction, and provides
> >> >> >> >> proxy functions to be called from net/block devices.  During an FT
> >> >> >> >> transaction, it queues up net/block requests and flushes them when
> >> >> >> >> the transaction completes.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Yoshiaki Tamura <address@hidden>
> >> >> >> >> Signed-off-by: OHMURA Kei <address@hidden>
> >> >> >> >> ---
> >> >> >> >>  Makefile.target |    1 +
> >> >> >> >>  block.h         |    9 +
> >> >> >> >>  event-tap.c     |  794 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> >> >>  event-tap.h     |   34 +++
> >> >> >> >>  net.h           |    4 +
> >> >> >> >>  net/queue.c     |    1 +
> >> >> >> >>  6 files changed, 843 insertions(+), 0 deletions(-)
> >> >> >> >>  create mode 100644 event-tap.c
> >> >> >> >>  create mode 100644 event-tap.h
> >> >> >> >
> >> >> >> > event_tap_state is checked at the beginning of several functions.  If
> >> >> >> > there is an unexpected state the function silently returns.  Should
> >> >> >> > these checks really be assert() so there is an abort and backtrace if
> >> >> >> > the program ever reaches this state?
> >> >> >> >
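To make that trade-off concrete, here is a minimal, self-contained sketch (not the patch's code; only the state names are borrowed for illustration) contrasting the silent early return with an assert():

/* Illustrative only -- not event-tap itself.  Shows why a silent early
 * return can hide an unexpected state while assert() aborts loudly. */
#include <assert.h>
#include <stdio.h>

enum { EVENT_TAP_OFF, EVENT_TAP_ON, EVENT_TAP_LOAD, EVENT_TAP_REPLAY };
static int event_tap_state = EVENT_TAP_OFF;

static void flush_silent(void)
{
    if (event_tap_state != EVENT_TAP_ON) {
        return;                      /* unexpected state goes unnoticed */
    }
    printf("flushing queued requests\n");
}

static void flush_assert(void)
{
    assert(event_tap_state == EVENT_TAP_ON);   /* abort + backtrace instead */
    printf("flushing queued requests\n");
}

int main(void)
{
    flush_silent();   /* silently does nothing */
    flush_assert();   /* aborts here: state is EVENT_TAP_OFF */
    return 0;
}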
> >> >> >> >> +typedef struct EventTapBlkReq {
> >> >> >> >> +    char *device_name;
> >> >> >> >> +    int num_reqs;
> >> >> >> >> +    int num_cbs;
> >> >> >> >> +    bool is_multiwrite;
> >> >> >> >
> >> >> >> > Is multiwrite logging necessary?  If event tap is called from within
> >> >> >> > the block layer then multiwrite is turned into one or more
> >> >> >> > bdrv_aio_writev() calls.
> >> >> >> >
> >> >> >> >> +static void event_tap_replay(void *opaque, int running, int reason)
> >> >> >> >> +{
> >> >> >> >> +    EventTapLog *log, *next;
> >> >> >> >> +
> >> >> >> >> +    if (!running) {
> >> >> >> >> +        return;
> >> >> >> >> +    }
> >> >> >> >> +
> >> >> >> >> +    if (event_tap_state != EVENT_TAP_LOAD) {
> >> >> >> >> +        return;
> >> >> >> >> +    }
> >> >> >> >> +
> >> >> >> >> +    event_tap_state = EVENT_TAP_REPLAY;
> >> >> >> >> +
> >> >> >> >> +    QTAILQ_FOREACH(log, &event_list, node) {
> >> >> >> >> +        EventTapBlkReq *blk_req;
> >> >> >> >> +
> >> >> >> >> +        /* event resume */
> >> >> >> >> +        switch (log->mode & ~EVENT_TAP_TYPE_MASK) {
> >> >> >> >> +        case EVENT_TAP_NET:
> >> >> >> >> +            event_tap_net_flush(&log->net_req);
> >> >> >> >> +            break;
> >> >> >> >> +        case EVENT_TAP_BLK:
> >> >> >> >> +            blk_req = &log->blk_req;
> >> >> >> >> +            if ((log->mode & EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
> >> >> >> >> +                switch (log->ioport.index) {
> >> >> >> >> +                case 0:
> >> >> >> >> +                    cpu_outb(log->ioport.address, log->ioport.data);
> >> >> >> >> +                    break;
> >> >> >> >> +                case 1:
> >> >> >> >> +                    cpu_outw(log->ioport.address, log->ioport.data);
> >> >> >> >> +                    break;
> >> >> >> >> +                case 2:
> >> >> >> >> +                    cpu_outl(log->ioport.address, log->ioport.data);
> >> >> >> >> +                    break;
> >> >> >> >> +                }
> >> >> >> >> +            } else {
> >> >> >> >> +                /* EVENT_TAP_MMIO */
> >> >> >> >> +                cpu_physical_memory_rw(log->mmio.address,
> >> >> >> >> +                                       log->mmio.buf,
> >> >> >> >> +                                       log->mmio.len, 1);
> >> >> >> >> +            }
> >> >> >> >> +            break;
> >> >> >> >
> >> >> >> > Why are net tx packets replayed at the net level but blk requests are
> >> >> >> > replayed at the pio/mmio level?
> >> >> >> >
> >> >> >> > I expected everything to replay either as pio/mmio or as net/block.
> >> >> >>
> >> >> >> Stefan,
> >> >> >>
> >> >> >> After doing some heavy load tests, I realized that we have to
> >> >> >> take a hybrid approach to replay for now.  This is because the
> >> >> >> point at which a device moves to the next state (e.g. when virtio
> >> >> >> decreases inuse) differs between net and block.  For example,
> >> >> >> virtio-net decreases inuse upon returning from the net layer,
> >> >> >> but virtio-blk does that inside the callback.
> >> >> >
> >> >> > For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
> >> >> > For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
> >> >> > Both are invoked from a callback.
> >> >> >
> >> >> >> If we only use pio/mmio
> >> >> >> replay, some net requests get lost even though event-tap tries to
> >> >> >> replay them, because the device state has already moved on.
> >> >> >
> >> >> > It seems that all you need to do to avoid this is to
> >> >> > delay the callback?
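As a rough, self-contained sketch of that suggestion (nothing here is QEMU code; the names are invented), the idea is to park completions while the FT transaction is in flight and deliver them only when it commits:

/* Illustrative only: a toy model of deferring device completion
 * callbacks until the fault-tolerance transaction commits. */
#include <stdio.h>
#include <stdlib.h>

typedef void (*completion_cb)(void *opaque, int ret);

typedef struct PendingCompletion {
    completion_cb cb;
    void *opaque;
    int ret;
    struct PendingCompletion *next;
} PendingCompletion;

static PendingCompletion *pending_list;
static int in_ft_transaction = 1;    /* pretend a transaction is in flight */

/* Where the device would normally complete a request. */
static void complete_request(completion_cb cb, void *opaque, int ret)
{
    if (!in_ft_transaction) {
        cb(opaque, ret);                        /* deliver immediately */
        return;
    }
    PendingCompletion *p = malloc(sizeof(*p));  /* park until commit */
    p->cb = cb;
    p->opaque = opaque;
    p->ret = ret;
    p->next = pending_list;                     /* LIFO for brevity; a real
                                                 * queue would keep FIFO order */
    pending_list = p;
}

/* Called once the FT transaction (synchronization) has committed. */
static void ft_transaction_commit(void)
{
    in_ft_transaction = 0;
    while (pending_list) {
        PendingCompletion *p = pending_list;
        pending_list = p->next;
        p->cb(p->opaque, p->ret);
        free(p);
    }
}

static void blk_write_done(void *opaque, int ret)
{
    printf("write '%s' completed, ret=%d\n", (const char *)opaque, ret);
}

int main(void)
{
    complete_request(blk_write_done, (void *)"req-1", 0);  /* deferred */
    printf("guest still sees the request in flight\n");
    ft_transaction_commit();                     /* callback fires here */
    return 0;
}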
> >> >>
> >> >> Yeah, if it's possible.  But if you take a look at virtio-net,
> >> >> you'll see that virtqueue_push is called immediately after calling
> >> >> qemu_sendv_packet, while virtio-blk does that in the callback.
> >> >
> >> > This is only if the packet was sent immediately.
> >> > I was referring to the case where the packet is queued.
> >>
> >> I see.  I usually don't see packets get queued in the net layer.
> >> What would be the effect on devices?  Would it restrain them from
> >> sending packets?
> >
> > Yes.
> >
> >> >
> >> >> >
> >> >> >> This doesn't
> >> >> >> happen with block, because the state is still old enough to
> >> >> >> replay.  Note that using a hybrid approach won't cause duplicated
> >> >> >> requests on the secondary.
> >> >> >
> >> >> > An assumption devices make is that a buffer is unused once
> >> >> > the completion callback has been invoked. Does this violate that assumption?
> >> >>
> >> >> No, it shouldn't.  In the case of net with net-layer replay, we copy
> >> >> the content of the requests, and in the case of block, because we
> >> >> haven't called the callback yet, the requests remain fresh.
> >> >>
> >> >> Yoshi
> >> >>
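A small, self-contained sketch of what "copy the content of the requests" could look like for net-level replay (the structures are assumptions, not the actual event-tap code): the packet's iovec is flattened into a buffer owned by the log entry, so the device may reuse its own buffers as soon as its callback runs:

/* Illustrative only: copying a packet's iovec into a log-owned buffer. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

typedef struct NetLogEntry {
    uint8_t *buf;   /* log-owned copy of the packet payload */
    size_t len;
} NetLogEntry;

/* Flatten the device's iovec into memory owned by the log entry. */
static NetLogEntry *log_net_request(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }

    NetLogEntry *e = malloc(sizeof(*e));
    e->buf = malloc(total);
    e->len = total;

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(e->buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return e;   /* replay can later resend e->buf/e->len on the secondary */
}

int main(void)
{
    char hdr[] = "hdr", payload[] = "payload";
    struct iovec iov[2] = {
        { .iov_base = hdr, .iov_len = sizeof(hdr) - 1 },
        { .iov_base = payload, .iov_len = sizeof(payload) - 1 },
    };
    NetLogEntry *e = log_net_request(iov, 2);
    printf("logged %zu bytes\n", e->len);
    /* the device may now reuse hdr/payload; the log keeps its own copy */
    free(e->buf);
    free(e);
    return 0;
}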
> >> >
> >> > Yes, as long as you copy it should be fine.  Maybe it's a good idea for
> >> > event-tap to queue all packets to avoid the copy and avoid the need to
> >> > replay at the net level.
> >>
> >> If queuing works fine for the devices, it seems to be a good
> >> idea.  I don't think the ordering issue arises in that case either.
> >>
> >> Yoshi
> >
> > If you replay at both the net and pio level, it becomes complex.
> > Maybe it's ok, but certainly harder to reason about.
> 
> Michael,
> 
> It seems queuing in event-tap, like the net layer does, works for
> devices that use qemu_send_packet_async, as you suggested.  But for
> those that use qemu_send_packet, we still need to copy the contents
> just like net-layer queuing does, and net-level replay should be
> kept to handle it.
> Thanks,
> 
> Yoshi
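A compressed, self-contained sketch of the distinction drawn above (qemu_send_packet and qemu_send_packet_async are the real QEMU entry points, but everything else here is invented for illustration): packets sent through the async path can be queued by reference, because the device defers reuse of the buffer until completion; packets sent through the synchronous path must be copied, because the device may reuse the buffer as soon as the call returns:

/* Illustrative only: how event-tap might treat the two send paths. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct LoggedPacket {
    const uint8_t *ref;   /* async path: reference only, no copy */
    uint8_t *copy;        /* sync path: log-owned copy */
    size_t len;
} LoggedPacket;

/* Async path (device used something like qemu_send_packet_async):
 * the device will not reuse the buffer until we complete it,
 * so keeping a reference is enough. */
static LoggedPacket log_async_packet(const uint8_t *buf, size_t len)
{
    LoggedPacket p = { .ref = buf, .copy = NULL, .len = len };
    return p;
}

/* Sync path (device used something like qemu_send_packet):
 * the device may reuse the buffer as soon as the call returns,
 * so the contents must be copied for later net-level replay. */
static LoggedPacket log_sync_packet(const uint8_t *buf, size_t len)
{
    LoggedPacket p = { .ref = NULL, .copy = malloc(len), .len = len };
    memcpy(p.copy, buf, len);
    return p;
}

int main(void)
{
    uint8_t frame[] = { 0xde, 0xad, 0xbe, 0xef };

    LoggedPacket a = log_async_packet(frame, sizeof(frame));
    LoggedPacket s = log_sync_packet(frame, sizeof(frame));

    printf("async log keeps a reference (%zu bytes), sync log keeps a copy (%zu bytes)\n",
           a.len, s.len);
    free(s.copy);
    return 0;
}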

Right. And I think it's fine. What I found confusing was the case
where both virtio (because the avail idx is moved back) and
the net layer replay the packet.
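For completeness, a toy, self-contained model of that confusing case (none of this is QEMU code): if the virtio ring is effectively rewound by pio/mmio replay and the net layer also resends its logged copies, the same TX packet goes out twice:

/* Illustrative only: why replaying the same TX packet at both levels
 * would duplicate it.  "Rewinding last_idx" stands in for pio/mmio-level
 * replay of the virtio ring; the extra prints stand in for net-level
 * replay of logged copies. */
#include <stdio.h>

static const char *ring[] = { "pkt-A", "pkt-B" };
static int avail_idx = 2;   /* guest made two buffers available */
static int last_idx;        /* how far the device has processed */

static void device_process_ring(void)
{
    while (last_idx < avail_idx) {
        printf("send %s\n", ring[last_idx++]);
    }
}

int main(void)
{
    device_process_ring();      /* primary sends pkt-A, pkt-B */

    /* Failover: pio/mmio replay effectively moves the device back,
     * so it walks the ring again... */
    last_idx = 0;
    device_process_ring();      /* sends pkt-A, pkt-B a second time */

    /* ...and if the net layer also replays its logged copies: */
    printf("send pkt-A (net-level replay)\n");
    printf("send pkt-B (net-level replay)\n");

    /* Each packet has now gone out twice after failover, which is why a
     * given request should be replayed at one level only. */
    return 0;
}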

