Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream


From: Alex Bennée
Subject: Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream port
Date: Mon, 05 Dec 2022 14:59:39 +0000
User-agent: mu4e 1.9.3; emacs 29.0.60

Jonathan Cameron via <qemu-devel@nongnu.org> writes:

> On Mon, 5 Dec 2022 10:54:03 +0000
> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
>
>> On Sun, 4 Dec 2022 08:23:55 +0100
>> Thomas Huth <thuth@redhat.com> wrote:
>> 
>> > On 04/11/2022 07.47, Thomas Huth wrote:  
>> > > On 16/06/2022 18.57, Michael S. Tsirkin wrote:    
>> > >> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > >>
>> > >> Emulation of a simple CXL Switch downstream port.
>> > >> The Device ID has been allocated for this use.
>> > >>
>> > >> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > >> Message-Id: <20220616145126.8002-3-Jonathan.Cameron@huawei.com>
>> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> > >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> > >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> > >> ---
>> > >>   hw/cxl/cxl-host.c              |  43 +++++-
>> > >>   hw/pci-bridge/cxl_downstream.c | 249 +++++++++++++++++++++++++++++++++
>> > >>   hw/pci-bridge/meson.build      |   2 +-
>> > >>   3 files changed, 291 insertions(+), 3 deletions(-)
>> > >>   create mode 100644 hw/pci-bridge/cxl_downstream.c    
>> > > 
>> > >   Hi!
>> > > 
>> > > There is a memory problem somewhere in this new device. I can make QEMU 
>> > > crash by running something like this:
>> > > 
>> > > $ MALLOC_PERTURB_=59 ./qemu-system-x86_64 -M x-remote \
>> > >      -display none -monitor stdio
>> > > QEMU 7.1.50 monitor - type 'help' for more information
>> > > (qemu) device_add cxl-downstream
>> > > ./qemu/qom/object.c:1188:5: runtime error: member access within
>> > > misaligned address 0x3b3b3b3b3b3b3b3b for type 'struct Object',
>> > > which requires 8 byte alignment
>> > > 0x3b3b3b3b3b3b3b3b: note: pointer points here
>> > > <memory cannot be printed>
>> > > Bus error (core dumped)
>> > > 
>> > > Could you have a look if you've got some spare minutes?    
>> > 
>> > Ping! Jonathan, Michael, any news on this bug?
>> > 
>> > (this breaks one of my local tests, that's why it's annoying for me)  
>> Sorry, my email filters ate your earlier message.
>> 
>> Looking into this now. I'll note that it also happens on
>> device_add xio3130-downstream so not specific to this new device.
>> 
>> So far all I've managed to do is track it to something rcu related
>> as failing in a call to drain_call_rcu() in qmp_device_add()
>> 
>> Will continue digging.
>
> Assuming I'm seeing the same thing...
>
> Problem is g_free() on the PCIBridge windows: 
> https://elixir.bootlin.com/qemu/latest/source/hw/pci/pci_bridge.c#L235
>
> Is called before we get a call_rcu() to flatview_destroy() as a
> result of the final call of flatview_unref() in address_space_set_flatview()
> so we get a use after free.
>
> As to what the fix is...  Suggestions welcome!

It sounds like this is the wrong place to free the windows then. I guess
the PCI aliases such as &w->alias_io don't get dealt with until RCU
clean-up time.

I *think* using g_free_rcu() should be enough to ensure the free occurs
after the rest of the RCU cleanups, but maybe you should only be cleaning
up the windows at device unrealize time? Is this a dynamic piece of
memory which gets updated during the lifetime of the device?
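
To make the ordering argument concrete, here is a toy model in plain C.
It is not QEMU code: defer_call() and drain_deferred() are hypothetical
stand-ins for queueing an RCU callback and for the grace period ending,
and struct windows / struct view only mimic PCIBridgeWindows and a
FlatView. It assumes callbacks run in the order they were queued and
that the view's callback was queued first.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy deferred-callback queue; a stand-in for RCU, NOT QEMU's code. */
typedef void (*deferred_fn)(void *);

struct deferred {
    deferred_fn fn;
    void *arg;
};

#define MAX_DEFERRED 8
static struct deferred queue[MAX_DEFERRED];
static int queued;

/* Hypothetical analogue of call_rcu(): run fn(arg) at the next drain. */
static void defer_call(deferred_fn fn, void *arg)
{
    assert(queued < MAX_DEFERRED);
    queue[queued].fn = fn;
    queue[queued].arg = arg;
    queued++;
}

/* Hypothetical analogue of a grace period ending: FIFO callback run. */
static void drain_deferred(void)
{
    for (int i = 0; i < queued; i++) {
        queue[i].fn(queue[i].arg);
    }
    queued = 0;
}

struct windows { int alias_io; };   /* mimics PCIBridgeWindows */
struct view {                       /* mimics a FlatView */
    struct windows *w;
    int last_seen;
};

/* Reads the windows, so they must still be allocated when this runs. */
static void view_destroy(void *opaque)
{
    struct view *v = opaque;
    v->last_seen = v->w->alias_io;
}

static void windows_free(void *opaque)
{
    free(opaque);
}

/* Returns the value the view callback observed during teardown. */
static int teardown_deferred(void)
{
    struct windows *w = malloc(sizeof(*w));
    struct view v = { .w = w, .last_seen = -1 };

    assert(w != NULL);
    w->alias_io = 42;
    defer_call(view_destroy, &v);  /* queued first, like the flatview unref */
    defer_call(windows_free, w);   /* deferred, instead of free(w) right here */
    drain_deferred();
    return v.last_seen;
}
```

Calling free(w) directly where windows_free is queued would leave
view_destroy() reading freed memory, which is the shape of the crash
reported above. Whether the real queue ordering matches this toy is
exactly the part I am not sure about.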

If the memory is being cleared with RCU then the access to the base
pointer should be done with the appropriate qatomic_rcu_[set|read]
functions.
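
As a rough illustration of that publish/read pattern, here is a sketch
using C11 atomics rather than QEMU's macros. The release store and
acquire load are conservative stand-ins I picked for what
qatomic_rcu_set() and qatomic_rcu_read() are for; current_windows,
publish_windows() and read_alias_io() are made-up names, not anything
in pci_bridge.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct windows { int alias_io; };   /* mimics PCIBridgeWindows */

/* Hypothetical shared base pointer that readers dereference. */
static _Atomic(struct windows *) current_windows;

/* Writer side: initialise *w fully, then publish the pointer. */
static void publish_windows(struct windows *w)
{
    assert(w != NULL);
    atomic_store_explicit(&current_windows, w, memory_order_release);
}

/* Reader side: load the pointer once, then use only that copy. */
static int read_alias_io(void)
{
    struct windows *w =
        atomic_load_explicit(&current_windows, memory_order_acquire);

    return w ? w->alias_io : -1;   /* -1 means nothing published yet */
}
```

The point is that the reader takes a single snapshot of the pointer and
never re-reads the shared variable while using it, so a concurrent
update cannot swap the structure out from under it mid-access.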

-- 
Alex Bennée