Re: [Qemu-devel] Linux kernel polling for QEMU


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
Date: Wed, 30 Nov 2016 15:10:43 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Wed, Nov 30, 2016 at 06:50:09PM +0800, Fam Zheng wrote:
> On Wed, 11/30 09:38, Stefan Hajnoczi wrote:
> > On Wed, Nov 30, 2016 at 01:42:14PM +0800, Fam Zheng wrote:
> > > On Tue, 11/29 20:43, Stefan Hajnoczi wrote:
> > > > On Tue, Nov 29, 2016 at 1:24 PM, Fam Zheng <address@hidden> wrote:
> > > > > On Tue, 11/29 12:17, Paolo Bonzini wrote:
> > > > >> On 29/11/2016 11:32, Fam Zheng wrote:
> > > > >> * it still needs a system call before polling is entered.  Ideally,
> > > > >> QEMU could run without any system call while in polling mode.
> > > > >>
> > > > >> Another possibility is to add a system call for single_task_running().
> > > > >> It should be simple enough that you can implement it in the vDSO and
> > > > >> avoid a context switch.  There are convenient hooking points in
> > > > >> add_nr_running and sub_nr_running.
> > > > >
> > > > > That sounds good!
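As a rough illustration of this idea (single_task_running() is only a proposal
here; now_ns(), POLL_MAX_NS, virtqueue_has_work() and cpu_relax() stand in for
QEMU internals), the polling loop could look something like:

    /* Sketch: poll guest memory only while this thread is the sole runnable
     * task, then fall back to blocking in the kernel.  No syscall is needed
     * on the happy path if single_task_running() is served from the vDSO. */
    static void poll_then_wait(VirtQueue *vq, struct pollfd *fds, nfds_t nfds)
    {
        int64_t deadline = now_ns() + POLL_MAX_NS;

        while (single_task_running() && now_ns() < deadline) {
            if (virtqueue_has_work(vq)) {
                return;                 /* found work without any syscall */
            }
            cpu_relax();                /* brief pause between ring checks */
        }

        ppoll(fds, nfds, NULL, NULL);   /* nothing found: block in the kernel */
    }
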
> > > > 
> > > > With this solution QEMU can either poll virtqueues or the host kernel
> > > > can poll NIC and storage controller descriptor rings, but not both at
> > > > the same time in one thread.  This is one of the reasons why I think
> > > > exploring polling in the kernel makes more sense.
> > > 
> > > That's true. I have one question though: controller rings are in a
> > > different layer in the kernel, so I wonder what the syscall interface
> > > looks like to ask the kernel to poll both hardware rings and memory
> > > locations in the same loop. It's not obvious to me after reading your
> > > eventfd patch.
> > 
> > Descriptor ring polling in select(2)/poll(2) is currently supported for
> > network sockets.  Take a look at the POLL_BUSY_LOOP flag in
> > fs/select.c:do_poll().  If the .poll() callback sets the flag then it
> > indicates that the fd supports busy loop polling.
> > 
> > The way this is implemented for network sockets is that the socket looks
> > up the napi index and is able to use the NIC driver to poll the rx ring.
> > Then it checks whether the socket's receive queue contains data after
> > the rx ring was processed.
> > 
> > The virtio_net.ko driver supports this interface, for example.  See
> > drivers/net/virtio_net.c:virtnet_busy_poll().
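For reference, the userspace side of this is opt-in: the net.core.busy_poll /
net.core.busy_read sysctls and the SO_BUSY_POLL socket option control how long
the kernel may spin.  A minimal sketch of a socket opting in (error handling
omitted):

    #include <poll.h>
    #include <sys/socket.h>

    /* Requires Linux 3.11+ and net.core.busy_poll > 0 for poll()/select()
     * to busy loop on the NIC rx ring via POLL_BUSY_LOOP. */
    static int wait_for_data(int sockfd)
    {
        unsigned int busy_usec = 50;    /* per-socket busy-poll budget */
        struct pollfd pfd = { .fd = sockfd, .events = POLLIN };

        setsockopt(sockfd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_usec, sizeof(busy_usec));

        /* With busy polling enabled the kernel can spin on the device's
         * rx ring here before putting the task to sleep. */
        return poll(&pfd, 1, -1);
    }
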
> > 
> > Busy loop polling isn't supported for block I/O yet.  There is currently
> > a completely independent code path for O_DIRECT synchronous I/O where
> > NVMe can poll for request completion.  But it doesn't work together with
> > asynchronous I/O (e.g. Linux AIO using eventfd with select(2)/poll(2)).
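For comparison, the asynchronous path looks roughly like this today:
completions are signalled through an eventfd which the event loop waits on
with poll(), and nothing in that chain can busy poll the NVMe queue.  A
sketch using libaio (error handling omitted):

    #include <libaio.h>
    #include <poll.h>
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    static void read_async(int fd, void *buf, size_t len)
    {
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        uint64_t count;
        int efd = eventfd(0, 0);

        io_setup(16, &ctx);
        io_prep_pread(&cb, fd, buf, len, 0);
        io_set_eventfd(&cb, efd);           /* completion pings the eventfd */
        io_submit(ctx, 1, cbs);

        struct pollfd pfd = { .fd = efd, .events = POLLIN };
        poll(&pfd, 1, -1);                  /* sleeps, no POLL_BUSY_LOOP here */
        read(efd, &count, sizeof(count));   /* consume the eventfd counter */
        io_getevents(ctx, 1, 1, &ev, NULL); /* reap the completion */

        io_destroy(ctx);
        close(efd);
    }
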
> 
> This makes perfect sense now, thanks for the pointers!
> 
> > 
> > > > The disadvantage of the kernel approach is that you must make the
> > > > ppoll(2)/epoll_wait(2) syscall even for polling, and you probably need
> > > > to do eventfd reads afterwards, so the minimum event loop iteration
> > > > latency is higher than with polling in userspace.
> > > 
> > > And userspace drivers powered by dpdk or vfio will still want to do
> > > polling in userspace anyway; we may want to take that into account as
> > > well.
> > 
> > vfio supports interrupts so it can definitely be integrated with
> > adaptive kernel polling (i.e. poll for a little while and then wait for
> > an interrupt if there was no event).
> > 
> > Does dpdk ever use interrupts?
> 
> Yes, interrupt mode is supported there. For example, see the intx/msix init
> code in drivers/net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_start().

If the application wants to poll 100% of the time then userspace polling
is the right solution.  Userspace polling also makes sense when all
event sources can be polled from userspace - it's faster than using
kernel polling.

But for adaptive polling + wait use cases, or when there is a mixture of
userspace and kernel event sources to poll, I think it makes sense to use
kernel polling.

In QEMU we have the latter, so I think we need to contribute to kernel
polling.
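
To make that concrete, the mixed loop today has roughly this shape
(aio_poll_handlers(), now_ns(), cpu_relax() and poll_max_ns are illustrative,
not real QEMU APIs): spin on memory-based sources for a bounded time, then
make a syscall for the fd-based ones; kernel polling would let that final
syscall keep polling instead of sleeping immediately.

    static void event_loop_iteration(EventSources *src, int64_t poll_max_ns)
    {
        int64_t deadline = now_ns() + poll_max_ns;

        /* Phase 1: userspace polling of memory-based sources (virtqueue
         * rings), no syscalls at all. */
        while (now_ns() < deadline) {
            if (aio_poll_handlers(src)) {
                return;                 /* made progress, skip the syscall */
            }
            cpu_relax();
        }

        /* Phase 2: fall back to the kernel for the fd-based sources.  With
         * kernel busy-poll support this call could spin on NIC/NVMe rings
         * before putting the thread to sleep. */
        ppoll(src->fds, src->nfds, NULL, NULL);
    }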

Stefan
