Re: [RFC 0/8] ioregionfd introduction
From: Stefan Hajnoczi
Subject: Re: [RFC 0/8] ioregionfd introduction
Date: Mon, 14 Feb 2022 14:52:29 +0000
On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.
>
> For this implementation, version 3 of the posted kernel patches was used:
> https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
>
> The future version will include support for vfio/libvfio-user.
> Please refer to the design discussion here proposed by Stefan:
> https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
>
> The vfio-user version needed some bug-fixing and it was decided to send
> this for multiprocess first.
>
> The ioregionfd is currently configured through the command line, and each
> ioregionfd is represented as an object. This allows for easy parsing and
> does not require modifications to the device/remote object command-line
> options.
>
> The following command line can be used to specify ioregionfd:
> <snip>
> '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> '-object',
> 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> '-object',
> 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
Explicit configuration of ioregionfd-object is okay for early
prototyping, but what is the plan for integrating this? I guess
x-remote-object would query the remote device to find out which
ioregionfds need to be registered and the user wouldn't need to specify
ioregionfds on the command-line?
> </snip>
>
> Proxy side of ioregionfd in this version uses only one file descriptor:
> <snip>
> '-device',
> 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),
> \
> </snip>
This raises the question of the ioregionfd file descriptor lifecycle. In
the end I think it shouldn't be specified on the command-line. Instead
the remote device should create it and pass it to QEMU over the
mpqemu/remote fd?
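One way the remote device could hand the descriptor to QEMU over the existing socket is SCM_RIGHTS ancillary data. A minimal sketch, assuming a Unix socketpair as a stand-in for the real mpqemu connection (all names and the message payload here are hypothetical, not the actual mpqemu protocol):

```python
import os
import socket

# Stand-in for the mpqemu/remote connection between QEMU and the
# remote device process (the real one is set up by x-remote-object).
proxy_sock, remote_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# The remote device process creates the descriptor itself; a pipe
# stands in for the real ioregionfd file descriptor here.
rfd, wfd = os.pipe()

# ...and passes it to QEMU over the socket via SCM_RIGHTS
# (socket.send_fds/recv_fds require Python 3.9+, Unix only).
socket.send_fds(remote_sock, [b"IOREGIONFD"], [wfd])

# QEMU's proxy side receives a duplicate of the descriptor without it
# ever appearing on the command line.
msg, fds, flags, addr = socket.recv_fds(proxy_sock, 1024, 1)
received_fd = fds[0]

# The received fd refers to the same pipe, so writes through it are
# visible to the remote side.
os.write(received_fd, b"ping")
print(os.read(rfd, 4).decode())
```

This keeps the descriptor's lifetime tied to the remote device that created it, rather than to whatever process happened to open it before exec.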
>
> This is done for the RFC version, and my thought was that the next
> version will be for vfio-user, so I have not dedicated much effort to
> these command-line options.
>
> The multiprocess messaging protocol was extended to support inquiries
> by the proxy about whether the device has any ioregionfds.
> This RFC implements inquiries by the proxy about whether a BAR is an
> ioregionfd and about its type (memory/io).
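The cover letter does not show the wire format of these inquiries, but the idea can be sketched as a simple request/reply per BAR. The command number, field layout, and encoding below are hypothetical illustrations, not the actual mpqemu message format:

```python
import struct

# Hypothetical command number for the BAR-info inquiry; the real
# MPQEMU_CMD_BAR_INFO value and message layout differ.
CMD_BAR_INFO = 6

def encode_bar_info_request(bar: int) -> bytes:
    # [command, payload size, BAR index], each a little-endian u32.
    return struct.pack("<3I", CMD_BAR_INFO, 4, bar)

def decode_bar_info_reply(data: bytes):
    # Reply carries an is-ioregionfd flag and the BAR type
    # (0 = memory, 1 = I/O port) in this sketch.
    is_ioregionfd, bar_type = struct.unpack("<2I", data)
    return bool(is_ioregionfd), "io" if bar_type else "memory"

req = encode_bar_info_request(1)
print(decode_bar_info_reply(struct.pack("<2I", 1, 0)))  # (True, 'memory')
```

The proxy would issue one such inquiry per BAR at setup time and register an ioregionfd-backed memory region only for BARs that report the flag.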
>
> Currently there are a few limitations in this version of ioregionfd.
> - one ioregionfd per bar, only full bar size is supported;
> - one file descriptor per device for all of its ioregionfds;
> - each remote device runs fd handler for all its BARs in one IOThread;
> - proxy supports only one fd.
>
> Some of these limitations will be dropped in a future version.
> This RFC is meant to gather feedback/suggestions from the community
> on the general approach.
>
> A quick performance test was run for the remote lsi device with and
> without ioregionfd for both mem BARs (1 and 2) using the fio tool:
>
> Random R/W:
>
> read IOPS read BW write IOPS write BW
> no ioregionfd 889 3559KiB/s 890 3561KiB/s
> ioregionfd 938 3756KiB/s 939 3757KiB/s
This is extremely slow, even for random I/O. How does this compare to
QEMU running the LSI device without multi-process mode?
> Sequential Read and Sequential Write:
>
> Sequential read Sequential write
> read IOPS read BW write IOPS write BW
>
> no ioregionfd 367k 1434MiB/s 76k 297MiB/s
> ioregionfd 374k 1459MiB/s 77.3k 302MiB/s
It's normal for read and write IOPS to differ, but the read IOPS are
very high. I wonder if caching and read-ahead are hiding the LSI
device's actual performance here.
What are the fio and QEMU command-lines?
In order to benchmark ioregionfd it's best to run a benchmark where the
bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
dispatch cost doesn't affect IOPS significantly.
I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
instead of a file or host block device. The fio block size should be 4k
to minimize the amount of time spent on I/O buffer contents and
iodepth=1 because batching multiple requests with iodepth > 1 hides the
MMIO/PIO dispatch bottleneck.
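Mirroring the argv-list style of the snips above, the suggested setup could look roughly like this. The device attachment, guest device path, and fio job names are hypothetical; only the null-co blockdev and the bs=4k/iodepth=1 fio settings come from the suggestion itself:

```python
# Hypothetical argv fragments for a dispatch-bound benchmark: a null
# block backend so MMIO/PIO dispatch, not disk I/O, is the bottleneck.
qemu_args = [
    '--blockdev', 'null-co,size=64G,id=null0',     # null backend, no real storage
    '-device', 'scsi-hd,drive=null0,bus=scsi0.0',  # attachment to the LSI HBA (hypothetical)
]

# fio settings: 4k blocks minimize time spent on I/O buffer contents,
# iodepth=1 avoids batching that would hide the dispatch cost.
fio_args = [
    'fio', '--name=mmio-bench',
    '--filename=/dev/sdb',   # guest device path is hypothetical
    '--rw=randread', '--bs=4k', '--iodepth=1', '--direct=1',
]

print(' '.join(qemu_args))
print(' '.join(fio_args))
```

With this setup, any IOPS difference between the ioregionfd and non-ioregionfd runs should be attributable mostly to the dispatch path itself.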
Stefan