qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] QEMU question: is eventfd not thread safe?


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] QEMU question: is eventfd not thread safe?
Date: Sun, 01 Jul 2012 22:57:29 +1000
User-agent: Mozilla/5.0 (X11; Linux i686; rv:12.0) Gecko/20120428 Thunderbird/12.0.1

On 01/07/12 22:43, Michael S. Tsirkin wrote:
> On Sun, Jul 01, 2012 at 09:06:20PM +1000, Alexey Kardashevskiy wrote:
>> QEMU from qemu.org/master (not qemu-kvm), linux host 3.4, ppc64.
>>
>> Shortly the problem is in the host kernel: closing a file in one thread does 
>> not interrupt select() waiting on the same file description in another 
>> thread.
>>
>> Longer story is:
>> I'll use VFIO as an example as I hit this when I was debugging it but VFIO 
>> itself has nothing to do with the issue.
>>
>> The bug looks like: I start the guest with MSI-capable PCI card passed via 
>> VFIO. The guest does dhclient from .bashrc and this dhclient does not 
>> receive anything till I press any key.
>> I did not see it for a while as I always start the guest and then typed 
>> "dhclient" manually in the guest console.
>>
>> What happens:
>> VFIO initializes INTx eventfd via event_notifier_init() (returns fd=XX) and 
>> qemu_set_fd_handler() before the guest starts. Then QEMU starts the guest 
>> and enters a loop is os_host_main_loop_wait() which stays in select() until 
>> something happens.
>>
>> >From the host kernel prospective, the XX fd was allocated, struct file* 
>> >(P1) with eventfd specific private_data allocated and initialized. select() 
>> >added a file refcounter (called get_file() in __pollwait()) and started 
>> >waiting on file* P1 but not on fd's number XX (what is mmm weird but ok).
>>
>> The guest starts and tries to initialize MSI for the PCI card passed 
>> through. The guest does PCI config write and this write creates a second 
>> thread in QEMU.
> 
> Why does this create a thread btw?


It is the thread where the guest is running I guess (?). This is the backtrace 
of this second thread:

(gdb) bt
#0  vfio_enable_msi (vdev=0x110200f0) at 
/home/aik/qemu-impreza/hw/ppc/../vfio_pci.c:699
#1  0x000000001036c948 in vfio_pci_write_config (pdev=0x110200f0, addr=0xd2, 
val=0x81, len=0x2) at /home/aik/qemu-impreza/hw/ppc/../vfio_pci.c:979
#2  0x00000000101b1388 in pci_host_config_write_common (pci_dev=0x110200f0, 
addr=0xd2, limit=0x100, val=0x81, len=0x2)
    at /home/aik/qemu-impreza/hw/pci_host.c:54
#3  0x000000001035d70c in finish_write_pci_config (spapr=0x10efa860, buid=0x2, 
addr=0xd2, size=0x2, val=0x81, rets=0xd64758)
    at /home/aik/qemu-impreza/hw/ppc/../spapr_pci.c:170
#4  0x000000001035d874 in rtas_ibm_write_pci_config (spapr=0x10efa860, 
token=0x2010, nargs=0x5, args=0xd64744, nret=0x1, rets=0xd64758)
    at /home/aik/qemu-impreza/hw/ppc/../spapr_pci.c:194
#5  0x0000000010360bcc in spapr_rtas_call (spapr=0x10efa860, token=0x2010, 
nargs=0x5, args=0xd64744, nret=0x1, rets=0xd64758)
    at /home/aik/qemu-impreza/hw/ppc/../spapr_rtas.c:218
#6  0x0000000010358150 in h_rtas (env=0xfffb733f040, spapr=0x10efa860, 
opcode=0xf000, args=0xfffb7fea030)
    at /home/aik/qemu-impreza/hw/ppc/../spapr_hcall.c:560
#7  0x0000000010358c4c in spapr_hypercall (env=0xfffb733f040, opcode=0xf000, 
args=0xfffb7fea030)
    at /home/aik/qemu-impreza/hw/ppc/../spapr_hcall.c:734
#8  0x00000000103dab2c in kvm_arch_handle_exit (env=0xfffb733f040, 
run=0xfffb7fea000) at /home/aik/qemu-impreza/target-ppc/kvm.c:769
#9  0x00000000103a8a94 in kvm_cpu_exec (env=0xfffb733f040) at 
/home/aik/qemu-impreza/kvm-all.c:1536
#10 0x00000000102ede24 in qemu_kvm_cpu_thread_fn (arg=0xfffb733f040) at 
/home/aik/qemu-impreza/cpus.c:752
#11 0x00000fffb7ab19f0 in .start_thread () from /lib64/libpthread.so.0
#12 0x00000fffb7706774 in .__clone () from /lib64/libc.so.6



> 
>> In this thread QEMU-VFIO closes fd XX which makes the host kernel release 
>> fd=XX, release the corresponding file* P1 (fput() in filp_close()) but this 
>> does not free this P1 as there is select() waiting on it.
> 
> Doesn't qemu remove an fd handler before closing the fd?
> If not that's the bug right there.


QEMU does not literally remove it but it marks the event as deleted and 
actually removes it much later from qemu_iohandler_poll() which is called after 
os_host_main_loop_wait(). Removal is done by calling qemu_set_fd_handler(fd, 
NULL, NULL, vdev);

But even if it did remove it, what would it change?


> 
>> eventfd_release() could wake it up but it is called when its refcounter 
>> becomes 0 and this won't happen till select() interrupts.
>>
>> Doing MSI init stuff, QEMU-VFIO calls the same event_notifier_init() 
>> (returns recycled fd=XX what is correct but confuses) and 
>> qemu_set_fd_handler() which adds a handler but select() does not pick it up.
>> The eventfd() (called by event_notifier_init()) allocates new struct file 
>> *P2 in the kernel, QEMU-VFIO passes this fd=XX to the kernel's VFIO. When 
>> MSI interrupt comes to the host kernel, the VFIO interrupt handler calls 
>> eventfd_signal() on the correct file* P2. However do_select() in the kernel 
>> does not interrupt to deliver eventfd event as it is still waiting on P1 - 
>> nobody interrupted select(). It can only interrupt on stdin events (like 
>> typing keys) and INTx interrupt (which does not happen as MSI is enabled).
>>
>> So we need to sync eventfd()/close() calls in one thread with select() in 
>> another. Below is the patch which simply sends SIGUSR2 to the main thread 
>> (the sample patch is below). Another solution could be adding new IO handler 
>> to wake select() up. Or to do something with the kernel but I am not sure 
>> what - implementing file_operations::flush for eventfd to do wakeup did not 
>> help and I did not dig any further.
>>
>>
>> Does it make sense? What do I miss? What would be the right solution?
>>
>>
>> ---
>>  iohandler.c |    1 +
>>  main-loop.c |   17 +++++++++++++++++
>>  2 files changed, 18 insertions(+)
>>
>> diff --git a/iohandler.c b/iohandler.c
>> index 3c74de6..54f4c48 100644
>> --- a/iohandler.c
>> +++ b/iohandler.c
>> @@ -77,6 +77,7 @@ int qemu_set_fd_handler2(int fd,
>>          ioh->fd_write = fd_write;
>>          ioh->opaque = opaque;
>>          ioh->deleted = 0;
>> +        kill(getpid(), SIGUSR2);
>>      }
>>      return 0;
>>  }
>> diff --git a/main-loop.c b/main-loop.c
>> index eb3b6e6..f65e048 100644
>> --- a/main-loop.c
>> +++ b/main-loop.c
>> @@ -199,10 +199,27 @@ static int qemu_signal_init(void)
>>  }
>>  #endif
>>  
>> +static void sigusr2_print(int signal)
>> +{
>> +}
>> +
>> +static void sigusr2_init(void)
>> +{
>> +    struct sigaction action;
>> +
>> +    memset(&action, 0, sizeof(action));
>> +    sigfillset(&action.sa_mask);
>> +    action.sa_handler = sigusr2_print;
>> +    action.sa_flags = 0;
>> +    sigaction(SIGUSR2, &action, NULL);
>> +}
>> +
>>  int main_loop_init(void)
>>  {
>>      int ret;
>>  
>> +    sigusr2_init();
>> +
>>      qemu_mutex_lock_iothread();
>>      ret = qemu_signal_init();
>>      if (ret) {
>> -- 
>> 1.7.10


-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]