qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycl


From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
Date: Sat, 01 Sep 2012 01:31:58 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120717 Thunderbird/14.0

On 08/29/2012 10:49 AM, Jan Kiszka wrote:
> > 
> > Let's experiment with refcounting MemoryRegion.  We can move the entire
> > contents of MemoryRegion to MemoryRegionImpl, add a reference count (to
> > MemoryRegionImpl), and change MemoryRegion to contain a pointer to the
> > refcounted MemoryRegionImpl:
> > 
> > struct MemoryRegion {
> >     struct MemoryRegionImpl *impl;
> > };
> > 
> > struct MemoryRegionImpl {
> >     atomic int refs;
> >     ...
> > };
> > 
> > The memory core can then store MemoryRegion copies (with elevated
> > refcounts) instead of pointers.  Devices can destroy MemoryRegions at
> > any time, the implementation will not go away.  However, what of the
> > opaque stored in MemoryRegionImpl?  It becomes a dangling pointer.
> > 
> > One way out is to add a lock to MemoryRegionImpl.  Dispatch takes the
> > lock, examines the ->enabled member, and bails out if it is cleared. 
> > The (MemoryRegion, not MemoryRegionImpl) destructor also takes the lock,
> > clears ->enabled,  releases the lock, and drops the reference.
>
> That means holding the MemoryRegionImpl lock across the handler call?

Blech.  As you said on the other side of this thread, we must not take a
coarse grained lock within a fine grained one, and
MemoryRegionImpl::lock is as fine as they get.

> > 
> > The advantage to this scheme is that all changes are localized to the
> > memory core, no need for a full sweep.  It is a little complicated, but
> > we may be able to simplify it (or find another one).
>
> May work. We just need to detect if memory region tries to delete itself
> from its handler to prevent the deadlock.

Those types of hacks are fragile.  IMO it just demonstrates what I said
earlier (then tried to disprove with this): if we call an opaque's
method, we must refcount or otherwise lock that opaque.  No way around it.

>
> > 
> >>
> >>>
> >>>> Besides
> >>>> MMIO and PIO dispatching, it will haunt us for file or event handlers,
> >>>> any kind of callbacks etc.
> >>>>
> >>>> Context A                               Context B
> >>>> ---------                               ---------
> >>>>                                         object = lookup()
> >>>> deregister(object)
> >>>> modify(object) -> invalid state
> >>>> ...                                     use(object)
> >>>> modify(object) -> valid state
> >>>> register(object)
> >>>>
> >>>> And with "object" I'm not talking about QOM but any data structure.
> >>>>
> >>>
> >>>
> >>> Context B
> >>> ---------
> >>> rcu_read_lock()
> >>> object = lookup()
> >>> if (object) {
> >>>     ref(object)
> >>> }
> >>> rcu_read_unlock()
> >>>
> >>> use(object)
> >>>
> >>> unref(object)
> >>>
> >>> (substitute locking scheme to conform to taste and/or dietary
> >>> restrictions)   
> >>>
> >>
> >> + wait for refcount(object) == 0 in deregister(object). That's what I'm
> >> proposing.
> > 
> > Consider timer_del() called from a timer callback.  It's often not doable.
>
> This is inherently synchronous already (when working against the same
> alarm timer backend). We can detect this in timer_del and skip waiting,
> as in the above scenario.

It can always be broken.  The timer callback takes the device lock to
update the device.  The device mmio handler, holding the device lock,
takes the timer lock to timer_mod.  Deadlock.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]