qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream


From: Jonathan Cameron
Subject: Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream port
Date: Wed, 7 Dec 2022 13:21:19 +0000

On Mon, 05 Dec 2022 14:59:39 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

> Jonathan Cameron via <qemu-devel@nongnu.org> writes:
> 
> > On Mon, 5 Dec 2022 10:54:03 +0000
> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
> >  
> >> On Sun, 4 Dec 2022 08:23:55 +0100
> >> Thomas Huth <thuth@redhat.com> wrote:
> >>   
> >> > On 04/11/2022 07.47, Thomas Huth wrote:    
> >> > > On 16/06/2022 18.57, Michael S. Tsirkin wrote:      
> >> > >> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> > >>
> >> > >> Emulation of a simple CXL Switch downstream port.
> >> > >> The Device ID has been allocated for this use.
> >> > >>
> >> > >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> > >> Message-Id: <20220616145126.8002-3-Jonathan.Cameron@huawei.com>
> >> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> > >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> > >> ---
> >> > >>   hw/cxl/cxl-host.c              |  43 +++++-
> >> > >>   hw/pci-bridge/cxl_downstream.c | 249 
> >> > >> +++++++++++++++++++++++++++++++++
> >> > >>   hw/pci-bridge/meson.build      |   2 +-
> >> > >>   3 files changed, 291 insertions(+), 3 deletions(-)
> >> > >>   create mode 100644 hw/pci-bridge/cxl_downstream.c      
> >> > > 
> >> > >   Hi!
> >> > > 
> >> > > There is a memory problem somewhere in this new device. I can make 
> >> > > QEMU 
> >> > > crash by running something like this:
> >> > > 
> >> > > $ MALLOC_PERTURB_=59 ./qemu-system-x86_64 -M x-remote \
> >> > >      -display none -monitor stdio
> >> > > QEMU 7.1.50 monitor - type 'help' for more information
> >> > > (qemu) device_add cxl-downstream
> >> > > ./qemu/qom/object.c:1188:5: runtime error: member access within 
> >> > > misaligned 
> >> > > address 0x3b3b3b3b3b3b3b3b for type 'struct Object', which requires 8 
> >> > > byte 
> >> > > alignment
> >> > > 0x3b3b3b3b3b3b3b3b: note: pointer points here
> >> > > <memory cannot be printed>
> >> > > Bus error (core dumped)
> >> > > 
> >> > > Could you have a look if you've got some spare minutes?      
> >> > 
> >> > Ping! Jonathan, Michael, any news on this bug?
> >> > 
> >> > (this breaks one of my local tests, that's why it's annoying for me)    
> >> Sorry, my email filters ate your earlier message.
> >> 
> >> Looking into this now. I'll note that it also happens on
> >> device_add xio3130-downstream so not specific to this new device.
> >> 
> >> So far all I've managed to do is track it to something rcu related
> >> as failing in a call to drain_call_rcu() in qmp_device_add()
> >> 
> >> Will continue digging.  
> >
> > Assuming I'm seeing the same thing...
> >
> > Problem is g_free() on the PCIBridge windows: 
> > https://elixir.bootlin.com/qemu/latest/source/hw/pci/pci_bridge.c#L235
> >
> > Is called before we get an rcu_call() to flatview_destroy() as a
> > result of the final call of flatview_unref() in address_space_set_flatview()
> > so we get a use after free.
> >
> > As to what the fix is...  Suggestions welcome!  
> 
> It sounds like this is the wrong place to free the value then. I guess
> the PCI aliases into &w->alias_io() don't get dealt with until RCU
> clean-up time.
> 
> I *think* using g_free_rcu() should be enough to ensure the free occurs
> after the rest of the RCU cleanups but maybe you should only be cleaning
> up the windows at device unrealize time? Is this a dynamic piece of
> memory which gets updated during the lifetime of the device?

There is unfortunately code that swaps it for an updated structure
in pci_bridge_update_mappings()

> 
> If the memory is being cleared with RCU then the access to the base
> pointer should be done with the appropriate qatomic_rcu_[set|read]
> functions.
>

I'm annoyingly snowed under this week with other things, but hopefully
can get to in a few days.  Note we are looking at an old problem
here, just one that's happening for an additional device, not sure
if that really affects urgency of fix though.

Jonathan
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]