qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream


From: Thomas Huth
Subject: Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream port
Date: Wed, 7 Dec 2022 14:26:32 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 07/12/2022 14.21, Jonathan Cameron wrote:
On Mon, 05 Dec 2022 14:59:39 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

Jonathan Cameron via <qemu-devel@nongnu.org> writes:

On Mon, 5 Dec 2022 10:54:03 +0000
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
On Sun, 4 Dec 2022 08:23:55 +0100
Thomas Huth <thuth@redhat.com> wrote:
On 04/11/2022 07.47, Thomas Huth wrote:
On 16/06/2022 18.57, Michael S. Tsirkin wrote:
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Emulation of a simple CXL Switch downstream port.
The Device ID has been allocated for this use.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220616145126.8002-3-Jonathan.Cameron@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
   hw/cxl/cxl-host.c              |  43 +++++-
   hw/pci-bridge/cxl_downstream.c | 249 +++++++++++++++++++++++++++++++++
   hw/pci-bridge/meson.build      |   2 +-
   3 files changed, 291 insertions(+), 3 deletions(-)
   create mode 100644 hw/pci-bridge/cxl_downstream.c

   Hi!

There is a memory problem somewhere in this new device. I can make QEMU
crash by running something like this:

$ MALLOC_PERTURB_=59 ./qemu-system-x86_64 -M x-remote \
      -display none -monitor stdio
QEMU 7.1.50 monitor - type 'help' for more information
(qemu) device_add cxl-downstream
./qemu/qom/object.c:1188:5: runtime error: member access within misaligned
address 0x3b3b3b3b3b3b3b3b for type 'struct Object', which requires 8 byte
alignment
0x3b3b3b3b3b3b3b3b: note: pointer points here
<memory cannot be printed>
Bus error (core dumped)

Could you have a look if you've got some spare minutes?

Ping! Jonathan, Michael, any news on this bug?

(this breaks one of my local tests, that's why it's annoying for me)
Sorry, my email filters ate your earlier message.

Looking into this now. I'll note that it also happens on
device_add xio3130-downstream so not specific to this new device.

So far all I've managed to do is track it to something rcu related
as failing in a call to drain_call_rcu() in qmp_device_add()

Will continue digging.

Assuming I'm seeing the same thing...

Problem is g_free() on the PCIBridge windows:
https://elixir.bootlin.com/qemu/latest/source/hw/pci/pci_bridge.c#L235

Is called before we get an rcu_call() to flatview_destroy() as a
result of the final call of flatview_unref() in address_space_set_flatview()
so we get a use after free.

As to what the fix is...  Suggestions welcome!

It sounds like this is the wrong place to free the value then. I guess
the PCI aliases into &w->alias_io() don't get dealt with until RCU
clean-up time.

I *think* using g_free_rcu() should be enough to ensure the free occurs
after the rest of the RCU cleanups but maybe you should only be cleaning
up the windows at device unrealize time? Is this a dynamic piece of
memory which gets updated during the lifetime of the device?

There is unfortunately code that swaps it for an updated structure
in pci_bridge_update_mappings()


If the memory is being cleared with RCU then the access to the base
pointer should be done with the appropriate qatomic_rcu_[set|read]
functions.


I'm annoyingly snowed under this week with other things, but hopefully
can get to in a few days.  Note we are looking at an old problem
here, just one that's happening for an additional device, not sure
if that really affects urgency of fix though.

It's too late now for QEMU 7.2 anyway, so there is no hurry, I think.

 Thomas




reply via email to

[Prev in Thread] Current Thread [Next in Thread]