qemu-devel

Re: [Qemu-devel] Linux kernel polling for QEMU


From: Andrew Jones
Subject: Re: [Qemu-devel] Linux kernel polling for QEMU
Date: Tue, 29 Nov 2016 17:01:23 +0100
User-agent: Mutt/1.6.0.1 (2016-04-01)

On Tue, Nov 29, 2016 at 11:39:44PM +0800, Fam Zheng wrote:
> On Tue, 11/29 16:24, Andrew Jones wrote:
> > On Tue, Nov 29, 2016 at 10:17:46PM +0800, Fam Zheng wrote:
> > > On Tue, 11/29 14:27, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 29/11/2016 14:24, Fam Zheng wrote:
> > > > > On Tue, 11/29 12:17, Paolo Bonzini wrote:
> > > > >>
> > > > >>
> > > > >> On 29/11/2016 11:32, Fam Zheng wrote:
> > > > >>>
> > > > > >>> The kernel change will be a new prctl operation (should it be a
> > > > > >>> different syscall to extend?) to register a new type of eventfd
> > > > > >>> called "idle eventfd":
> > > > >>>
> > > > >>>     prctl(PR_ADD_IDLE_EVENTFD, int eventfd);
> > > > >>>     prctl(PR_DEL_IDLE_EVENTFD, int eventfd);
> > > > >>>
> > > > > >>> It will be notified by the kernel each time the thread's local
> > > > > >>> core has no runnable threads (i.e., entering idle state).
> > > > >>>
> > > > > >>> QEMU can then add this eventfd to its event loop when it has events
> > > > > >>> to poll, and watch virtqueue/linux-aio memory from userspace in the
> > > > > >>> fd handlers.  Effectively, if a ppoll() would have blocked because
> > > > > >>> there are no new events, it could now return immediately because of
> > > > > >>> idle_eventfd events, and do the idle polling.
> > > > >>
> > > > >> This has two issues:
> > > > >>
> > > > > >> * it only reports the leading edge of single_task_running().  Is it
> > > > > >> also useful to stop polling on the trailing edge?
> > > > > 
> > > > > > QEMU can clear the eventfd right after event firing so I don't think
> > > > > > it is necessary.
> > > > 
> > > > Yes, but how would QEMU know that the eventfd has fired?  It would be
> > > > very expensive to read the eventfd on each iteration of polling.
> > > 
> > > The idea is to ppoll() the eventfd together with other fds (ioeventfd and
> > > linux-aio etc.), and in the handler, call event_notifier_test_and_clear()
> > > followed by a polling loop for some period.
> > > 
> > > Fam
> > > 
> > > > 
> > > > Paolo
> > > > 
> > > > > >> * it still needs a system call before polling is entered.  Ideally,
> > > > > >> QEMU could run without any system call while in polling mode.
> > > > >>
> > > > > >> Another possibility is to add a system call for
> > > > > >> single_task_running().  It should be simple enough that you can
> > > > > >> implement it in the vDSO and avoid a context switch.  There are
> > > > > >> convenient hooking points in add_nr_running and sub_nr_running.
> > > > > 
> > > > > That sounds good!
> > > > > 
> > > > > Fam
> > > > > 
> > >
> > 
> > While we have a ppoll audience, another issue with the current polling
> > is that we can block with an infinite timeout set (-1), and it can
> > actually end up being infinite, i.e. vcpus will never run again. I'm
> > able to exhibit this with kvm-unit-tests.
> 
> I don't understand, why does ppoll() block vcpus? They are in different
> threads.  Could you elaborate on the kvm-unit-tests case?

OK, it may be due to scheduling then. Below is the test case (for AArch64).
Also, I forgot to mention before that I can only see this with TCG, not
KVM. If ppoll is allowed to time out, then the test will complete. If not,
then, as can be seen with strace, the iothread is stuck in ppoll, and the
test never completes.

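 #include <asm/smp.h>

 /* Set by the secondary CPU, polled by the primary. */
 volatile int ready;

 /* Runs on CPU 1: signal the primary, then spin forever. */
 void set_ready(void) {
     ready = 1;
     while (1);
 }

 int main(void) {
     smp_boot_secondary(1, set_ready);
     /* Busy-wait on CPU 0 until CPU 1 has run set_ready(). */
     while (!ready);
     return 0;
 }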
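
To make the default-timeout question concrete, here is a minimal host-side
sketch of what capping an infinite wait would mean. It is not QEMU code:
clamp_timeout_ms(), wait_readable(), demo_idle_pipe() and DEFAULT_TIMEOUT_MS
are hypothetical names, and it uses POSIX poll() instead of ppoll() so that
it builds without _GNU_SOURCE. A timeout of -1 passed to poll() blocks
indefinitely, like a NULL timespec passed to ppoll().

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical default; 10 ms is only the number floated in this thread. */
#define DEFAULT_TIMEOUT_MS 10

/* Replace an infinite (-1) timeout with the default; keep finite ones. */
int clamp_timeout_ms(int timeout_ms)
{
    return timeout_ms < 0 ? DEFAULT_TIMEOUT_MS : timeout_ms;
}

/*
 * Wait for fd to become readable.  Returns poll()'s result:
 * 0 on timeout, 1 if the fd is ready, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    return poll(&pfd, 1, clamp_timeout_ms(timeout_ms));
}

/*
 * Demo: nothing is ever written to the pipe, so an unclamped -1 timeout
 * would block forever.  With the clamp the wait ends as a timeout.
 */
int demo_idle_pipe(void)
{
    int fds[2];
    int r;

    if (pipe(fds) != 0) {
        return -1;
    }
    r = wait_readable(fds[0], -1);
    assert(r <= 1);          /* only one fd was polled */
    close(fds[0]);
    close(fds[1]);
    return r;
}
```

Calling demo_idle_pipe() from a main() returns after the default timeout
instead of hanging. Whether such a cap is desirable is a separate question:
it turns a hang into a periodic wakeup without explaining the hang.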

Thanks,
drew

> 
> > For these rare cases where
> > no other timeout has been selected, shouldn't we have a default timeout?
> > Anyone want to pick a number? I have a baseless compulsion to use 10 ms...
> 
> Timeout set as -1 means there is absolutely no event to expect, waking up
> after 10 ms is a waste. It's a bug if guest hangs in this case.
> 
> Fam
> 
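
For reference, the handler pattern Fam describes earlier in the thread (wait
on the eventfd together with the other fds, then test-and-clear it and poll
for a bounded period) might look roughly like the sketch below. Assumptions:
PR_ADD_IDLE_EVENTFD is only a proposal in this thread, so an ordinary
non-blocking eventfd written from userspace stands in for the kernel's idle
notification; test_and_clear(), poll_flag(), run_once() and signal_idle()
are hypothetical names, with test_and_clear() playing the role of QEMU's
event_notifier_test_and_clear(); poll() is used in place of ppoll(), and a
spin count stands in for "some period".

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Read and reset the eventfd counter.  Returns 1 if it had fired, else 0.
 * The fd must be created with EFD_NONBLOCK so an unfired fd does not block.
 */
int test_and_clear(int efd)
{
    uint64_t count;

    return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count);
}

/* Busy-poll a shared flag for at most max_spins reads; 1 if it was seen. */
int poll_flag(volatile int *flag, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (*flag) {
            return 1;
        }
    }
    return 0;
}

/*
 * One event-loop iteration: wait in poll() on the eventfd; if it fired,
 * clear it and busy-poll the flag for a bounded number of spins.
 * Returns 1 if polling found work, 0 if it did not, -1 if the eventfd
 * never fired (timeout or error).
 */
int run_once(int efd, volatile int *flag, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0 || !(pfd.revents & POLLIN)) {
        return -1;
    }
    if (!test_and_clear(efd)) {
        return -1;
    }
    return poll_flag(flag, 1000000L);
}

/* Stand-in for the kernel signalling "this core just went idle". */
int signal_idle(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

Note that the sketch still enters the kernel (poll() plus read()) before
each polling phase, which is the cost Paolo points out above.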


