Re: [PATCH v14 00/14] Support blob memory and venus on qemu
From: Dmitry Osipenko
Subject: Re: [PATCH v14 00/14] Support blob memory and venus on qemu
Date: Sat, 22 Jun 2024 01:25:32 +0300
User-agent: Mozilla Thunderbird
On 6/21/24 11:59, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>
>> On 6/19/24 20:37, Alex Bennée wrote:
>>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>>> this:
>>>
>>> ./qemu-system-aarch64 \
>>> -M virt -cpu cortex-a76 \
>>> -device virtio-net-pci,netdev=unet \
>>> -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>> -m 8192 \
>>> -object memory-backend-memfd,id=mem,size=8G,share=on \
>>> -serial mon:stdio \
>>> -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>> -append "console=ttyAMA0" \
>>> -device qemu-xhci -device usb-kbd -device usb-tablet \
>>> -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>> -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map
>>>
>>>
>>> And I've noticed a couple of things. First trying to launch vkmark to
>>> run a KMS mode test fails with:
>>>
>> ...
>>> virgl_render_server[1875931]: vkr: failed to import resource: invalid
>>> res_id 5
>>> virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error
>>> virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command
>>> failed
>>>
>>> More interestingly when shutting stuff down we see weirdness like:
>>>
>>> address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>>>
>>>
>>> virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a
>>> valid instance
>>>
>>> virgl_render_server[1875931]: vkr: destroying device with valid objects
>>>
>>>
>>> vkr_context_remove_object: -7438602987017907480
>>>
>>>
>>> vkr_context_remove_object: 7
>>>
>>>
>>> vkr_context_remove_object: 5
>>>
>>> which indicates something has gone very wrong. I'm not super familiar
>>> with the memory allocation patterns but should stuff that is done as
>>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources?
>>
>> This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest
>> that isn't exportable on host. AFAICT, more code changes should be
>> needed to support this case.
>
> There are a lot of acronyms there. If this is pure guest memory why
> isn't it exportable to the host? Or should the underlying mesa library
> be making sure the allocation happens from the shared region?
>
> Is vkmark particularly special here?
Actually, you could get it to work to some degree if you compile
virglrenderer with -Dminigbm_allocation=true. On the host, use a GTK/Wayland
display.
Vkmark isn't special. It's virglrenderer that has room for improvement.
ChromeOS doesn't use KMS in VMs, so proper KMS support was never a priority
for Venus.
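
For reference, a rough sketch of such a build (untested here; only the
-Dminigbm_allocation option comes from the discussion above, the rest is
illustrative and may need adjusting for your virglrenderer tree):

  # rebuild virglrenderer with minigbm allocation enabled, plus whatever
  # Venus build option your virglrenderer version uses
  meson setup build -Dminigbm_allocation=true
  ninja -C build && ninja -C build install

  # then pick a GTK/Wayland-backed display on the QEMU side, e.g.
  #   -display gtk,gl=on
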
>> Note that the "destroying device with valid objects" msg is fine; it won't
>> hurt to silence it in Venus to avoid confusion. It will happen every time a
>> guest application is closed without explicitly releasing every VK
>> object.
>
> I was more concerned with:
>
>>> vkr_context_remove_object: -7438602987017907480
>>>
>>>
>
> which looks like a corruption of the object ids (or maybe an offby one)
At first glance this appears to be a valid value; otherwise Venus should have
crashed QEMU with a debug assert if the ID were invalid. But I never see such
odd IDs in my testing.
>>> I tried running under RR to further debug but weirdly I can't get
>>> working graphics with that. I did try running under threadsan which
>>> complained about a potential data race:
>>>
>>> vkr_context_add_object: 1 -> 0x7b2c00000288
>>> vkr_context_add_object: 2 -> 0x7b2c00000270
>>> vkr_context_add_object: 3 -> 0x7b3800007f28
>>> vkr_context_add_object: 4 -> 0x7b3800007fa0
>>> vkr_context_add_object: 5 -> 0x7b48000103f8
>>> vkr_context_add_object: 6 -> 0x7b48000104a0
>>> vkr_context_add_object: 7 -> 0x7b4800010440
>>> virtio_gpu_cmd_res_back_attach res 0x5
>>> virtio_gpu_cmd_res_back_attach res 0x6
>>> vkr_context_add_object: 8 -> 0x7b48000103e0
>>> virgl_render_server[1751430]: vkr: failed to import resource: invalid
>>> res_id 5
>>> virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>> virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command
>>> failed
>>> ==================
>>> WARNING: ThreadSanitizer: data race (pid=1751256)
>>> Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>> #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId:
>>> 0bab171e77cb6782341ee3407e44af7267974025)
>> ..
>>> ==================
>>> SUMMARY: ThreadSanitizer: data race
>>> (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede)
>>> (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>>>
>>> This could be a false positive or it could be a race between the guest
>>> kernel clearing memory while we are still doing
>>> virtio_gpu_ctrl_response.
>>>
>>> What do you think?
>>
>> The memcpy warning looks a bit suspicious, but is likely harmless. I don't
>> see such a warning with TSAN and an x86 VM.
>
> TSAN can only pick up these interactions with TCG guests because it can
> track guest memory accesses. With a KVM guest we have no visibility of
> the guest accesses.
I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
checked both KVM and TCG; TSAN only warns about virtio-net memcpys for me.
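
For anyone wanting to reproduce the TSAN runs, a rough sketch of a
ThreadSanitizer build of QEMU (the --enable-tsan configure switch is
documented in QEMU's developer docs; the compiler and target list below are
just an example):

  # configure QEMU with ThreadSanitizer instrumentation (clang recommended)
  ../configure --cc=clang --cxx=clang++ --enable-tsan \
      --target-list=aarch64-softmmu,x86_64-softmmu
  make -j"$(nproc)"
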
--
Best regards,
Dmitry