
Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap


From: liu ping fan
Subject: Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap
Date: Mon, 17 Sep 2012 10:24:40 +0800

On Thu, Sep 13, 2012 at 4:19 PM, Avi Kivity <address@hidden> wrote:
> On 09/13/2012 09:55 AM, liu ping fan wrote:
>> On Tue, Sep 11, 2012 at 8:41 PM, Avi Kivity <address@hidden> wrote:
>>> On 09/11/2012 03:24 PM, Avi Kivity wrote:
>>>> On 09/11/2012 12:57 PM, Jan Kiszka wrote:
>>>>> On 2012-09-11 11:44, liu ping fan wrote:
>>>>>> On Tue, Sep 11, 2012 at 4:35 PM, Avi Kivity <address@hidden> wrote:
>>>>>>> On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
>>>>>>>> From: Liu Ping Fan <address@hidden>
>>>>>>>>
>>>>>>>> The function call chain can end up recursively holding
>>>>>>>> qemu_mutex_lock_iothread(). We introduce a lockmap to record the
>>>>>>>> lock depth.
>>>>>>>
>>>>>>> What is the root cause?  io handlers initiating I/O?
>>>>>>>
>>>>>> cpu_physical_memory_rw() can be called in a nested fashion, and when it
>>>>>> is called, it can be protected by no lock, the device lock, or the big lock.
>>>>>> I think that even without the big lock, the io-dispatcher faces the same
>>>>>> issue. As for the main loop, I have not considered it carefully, but at
>>>>>> least dma-helpers will call cpu_physical_memory_rw() with the big lock held.
>>>>>
>>>>> That is our core problem: inconsistent invocation of existing services
>>>>> wrt locking. For portio, I was lucky that there is no nesting and I was
>>>>> able to drop the big lock around all (x86) call sites. But MMIO is way
>>>>> more tricky due to DMA nesting.
>>>>
>>>> Maybe we need to switch to a continuation style.  Instead of expecting
>>>> cpu_physical_memory_rw() to complete synchronously, it becomes an
>>>> asynchronous call and you provide it with a completion.  That means
>>>> devices which use it are forced to drop the lock in between.  Block and
>>>> network clients will be easy to convert since they already use APIs that
>>>> drop the lock (except for accessing the descriptors).
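>>>>
>>>> As a rough sketch, such an API could look like the following (the
>>>> names and types here are invented for illustration, not existing
>>>> QEMU interfaces):
>>>>
>>>>   /* invoked once the access has finished */
>>>>   typedef void CPMRWCompletionFunc(void *opaque, int ret);
>>>>
>>>>   /*
>>>>    * Asynchronous variant of cpu_physical_memory_rw(): the caller
>>>>    * drops its locks, the access is carried out, and the completion
>>>>    * callback runs when it is done.
>>>>    */
>>>>   void cpu_physical_memory_rw_async(target_phys_addr_t addr,
>>>>                                     uint8_t *buf, int len,
>>>>                                     int is_write,
>>>>                                     CPMRWCompletionFunc *cb,
>>>>                                     void *opaque);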
>>>>
>>>>> We could try to introduce a different version of cpu_physical_memory_rw,
>>>>> cpu_physical_memory_rw_unlocked. But the problem remains that an MMIO
>>>>> request can trigger the very same access in a nested fashion, and we
>>>>> will have to detect this to avoid locking up QEMU (locking up the guest
>>>>> might be OK).
>>>>
>>>> An async version of c_p_m_rw() will just cause a completion to bounce
>>>> around, consuming cpu but not deadlocking anything.  If we can keep a
>>>> count of the bounces, we might be able to stall it indefinitely or at
>>>> least ratelimit it.
>>>>
>>>
>>> Another option is to require all users of c_p_m_rw() and related to use
>>> a coroutine or thread.  That makes the programming easier (but still
>>> requires a revalidation after the dropped lock).
>>>
>> For the nested cpu_physical_memory_rw(), we change its internals but
>> keep the sync API as it is (wrapping the current cpu_physical_memory_rw()
>> into cpu_physical_memory_rw_internal()).
>>
>>
>> LOCK()   // device lock, big lock, or both, depending on the caller
>> ..............
>> cpu_physical_memory_rw()
>> {
>>     UNLOCK()   // drop all the locks currently held
>>     queue_work_on_thread(cpu_physical_memory_rw_internal, completion);
>>     // cpu_physical_memory_rw_internal may take the locks (device, big lock) again
>>     wait_for_completion(completion)
>>     LOCK()     // re-acquire the caller's locks
>> }
>> ..................
>> UNLOCK()
>>
>
> This is dangerous.  The caller expects to hold the lock across the call,
> and will not re-validate its state.
>
>> Although cpu_physical_memory_rw_internal() would then be free to take
>> locks, there is a precondition: we still need to track the lock stack
>> taken by cpu_physical_memory_rw(), so that it can return to the caller
>> correctly. Is that OK?
>
> I'm convinced that we need a recursive lock if we don't convert
> everything at once.
>
> So how about:
>
> - make bql a recursive lock (use pthreads, don't invent our own)
> - change c_p_m_rw() calling convention to "caller may hold the BQL, but
> no device lock"
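>
> As a minimal sketch, the recursive BQL could be built on the pthreads
> recursive mutex type (qemu_global_mutex is the existing big-lock
> variable; the init function below is illustrative):
>
>   #include <pthread.h>
>
>   static pthread_mutex_t qemu_global_mutex;
>
>   static void bql_init(void)
>   {
>       pthread_mutexattr_t attr;
>
>       pthread_mutexattr_init(&attr);
>       /* the same thread may re-lock it; one unlock per lock */
>       pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
>       pthread_mutex_init(&qemu_global_mutex, &attr);
>       pthread_mutexattr_destroy(&attr);
>   }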
>
> this allows devices to DMA each other.  Unconverted device models will
> work as they used to.  Converted device models will need to drop the
> device lock, re-acquire it, and revalidate any state.  That will cause
> them to become more correct wrt recursive invocation.

I think devices DMA-ing to each other have us cornered, and that makes
nested locking unavoidable. Also, to avoid deadlock caused by device
lock ordering, we should drop the current device lock before acquiring
another one. The only small divergence concerns the "revalidate": do we
need to roll back? Or, for a converted device, we could just tag it with
a busy flag, i.e. check-and-set the busy flag at the entry of the
device's mmio-dispatch, as sketched below. Then, when the device's lock
is re-acquired, the device's state is intact.
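
A minimal sketch of the busy-flag idea (this assumes a per-device lock
and busy field, neither of which exists today; the names are
illustrative only):

  /* check-and-set the busy flag at the entry of mmio-dispatch */
  static bool device_mmio_try_enter(MyDevice *dev)
  {
      bool was_busy;

      qemu_mutex_lock(&dev->lock);
      was_busy = dev->busy;
      dev->busy = true;
      qemu_mutex_unlock(&dev->lock);

      return !was_busy;    /* false means nested entry: back off */
  }

  static void device_mmio_leave(MyDevice *dev)
  {
      qemu_mutex_lock(&dev->lock);
      dev->busy = false;
      qemu_mutex_unlock(&dev->lock);
  }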

Does anybody have another suggestion?

Thanks and regards,
pingfan
>
> --
> error compiling committee.c: too many arguments to function


