Re: [Qemu-devel] Towards an ivshmem 2.0?


From: Wang, Wei W
Subject: Re: [Qemu-devel] Towards an ivshmem 2.0?
Date: Mon, 23 Jan 2017 03:49:06 +0000

On Saturday, January 21, 2017 12:38 AM, Jan Kiszka wrote:
> On 2017-01-20 12:54, Wang, Wei W wrote:
> > On Tuesday, January 17, 2017 5:46 PM, Jan Kiszka wrote:
> >> On 2017-01-17 10:13, Wang, Wei W wrote:
> >>> Hi Jan,
> >>>
> >>> On Monday, January 16, 2017 9:10 PM, Jan Kiszka wrote:
> >>>> On 2017-01-16 13:41, Marc-André Lureau wrote:
> >>>>> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka <address@hidden> wrote:
> >>>>>     some of you may know that we are using a shared memory device
> >>>>>     similar to ivshmem in the partitioning hypervisor Jailhouse [1].
> >>>>>
> >>>>>     We started as being compatible to the original ivshmem that QEMU
> >>>>>     implements, but we quickly deviated in some details, and in the
> >>>>>     recent months even more. Some of the deviations are related to
> >>>>>     making the implementation simpler. The new ivshmem takes <500 LoC -
> >>>>>     Jailhouse is aiming at safety critical systems and, therefore, a
> >>>>>     small code base. Other changes address deficits in the original
> >>>>>     design, like missing life-cycle management.
> >>>>>
> >>>>>     Now the question is if there is interest in defining a common new
> >>>>>     revision of this device and maybe also of some protocols used on
> >>>>>     top, such as virtual network links. Ideally, this would enable us
> >>>>>     to share Linux drivers. We will definitely go for upstreaming at
> >>>>>     least a network driver such as [2], a UIO driver and maybe also a
> >>>>>     serial port/console.
> >>>>>
> >>>>>
> >>>>> This sounds like duplicating efforts done with virtio and vhost-pci.
> >>>>> Have you looked at Wei Wang's proposal?
> >>>>
> >>>> I didn't follow it recently, but the original concept was about
> >>>> introducing an IOMMU model to the picture, and that's
> >>>> complexity-wise a no-go for us (we can do this whole thing in less
> >>>> than 500 lines, even virtio itself is more complex). IIUC, the
> >>>> alternative to an IOMMU is mapping the whole frontend VM memory
> >>>> into the backend VM -
> >> that's security/safety-wise an absolute no-go.
> >>>
> >>> Though the virtio based solution might be complex for you, a big
> >>> advantage is that we have lots of people working to improve virtio.
> >>> For example, the upcoming virtio 1.1 has vring improvements, so we can
> >>> easily upgrade all the virtio based solutions, such as vhost-pci, to
> >>> take advantage of them. From the long term perspective, I think this
> >>> kind of complexity is worthwhile.
> >>
> >> We will adopt virtio 1.1 ring formats. That's one reason why there is
> >> also still a bidirectional shared memory region: to host the new
> >> descriptors (while keeping the payload safely in the unidirectional 
> >> regions).
> >
> > The vring example I gave might be confusing, sorry about that. My point
> > is that every part of virtio matures and improves over time. Personally,
> > having a new device developed and maintained in an active and popular
> > model is helpful. Also, as new features are gradually added in the
> > future, a simple device could become complex.
> 
> We can't afford becoming more complex, that is the whole point.
> Complexity shall go into the guest, not the hypervisor, when it is really 
> needed.
> 
> >
> > A theoretical analysis of the performance:
> > The traditional shared memory mechanism, sharing an intermediate memory,
> > requires 2 copies to get a packet transmitted. It's not just one more
> > copy compared to the 1-copy solution; I think there are some more things
> > we need to take into account:
> 
> 1-copy (+potential transfers to userspace, but that's the same for
> everyone) is conceptually possible, definitely under stacks like DPDK.
> However, Linux skbs are currently not prepared for picking up shmem-backed
> packets, we already looked into this. Likely addressable, though.

Not sure how difficult it would be to get that change upstreamed to the
kernel, but I'm looking forward to seeing your solutions.
 
> > 1) there is extra ring operation overhead on both the sending and
> > receiving side to access the shared memory (i.e. IVSHMEM);
> > 2) an extra protocol is needed to use the shared memory;
> > 3) the number of shared memory regions allocated from the host is
> > C(n,2) = n(n-1)/2, where n is the number of VMs. For example, for 20 VMs
> > that all want to talk to each other, 190 regions would be allocated from
> > the host.
> 
> Well, only if all VMs need to talk to all others directly. On real setups,
> you would add direct links for heavy traffic and otherwise do software
> switching. Moreover, those links would only have to be backed by physical
> memory in static setups all the time.
> 
> Also, we didn't completely rule out a shmem bus with multiple peers
> connected. That's just looking for a strong use case - and then a robust
> design, of course.
> 
> >
> > That being said, if people really want the 2-copy solution, we can also
> > have vhost-pci support it that way as a new feature (not sure if you
> > would be interested in collaborating on the project):
> > With the new feature added, the master VM sends only a piece of memory
> > (equivalent to IVSHMEM, but allocated by the guest) to the slave over
> > vhost-user protocol, and the vhost-pci device on the slave side only
> > hosts that piece of shared memory.
> 
> I'm all in for something that allows to strip down vhost-pci to something
> that - while staying secure - is simple and /also/ allows static
> configurations. But I'm not yet seeing that this would still be virtio or
> vhost-pci.
> 
> What would be the minimal viable vhost-pci device set from your POV?

For the static configuration option, I think it mainly needs the device
emulation part of the current implementation, which is currently ~500 LoC.
We would also need to add a new feature to virtio_net, to let the virtio_net
driver use the IVSHMEM region, and the same for the vhost-pci device.
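
To make the idea concrete, here is a very rough sketch of what that feature
could look like on the driver side. Everything below is invented for
illustration (the feature bit, the struct and the helper do not exist in
virtio_net or vhost-pci today): the driver copies TX payloads into the shared
region and hands the peer an offset instead of a guest-physical address.

/* Hypothetical sketch only: VIRTIO_NET_F_SHMEM_REGION, struct shmem_tx_region
 * and shmem_tx_copy() are invented names; they are not part of virtio_net or
 * vhost-pci today. */
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define VIRTIO_NET_F_SHMEM_REGION  60   /* assumed free feature bit */

struct shmem_tx_region {
    void   *base;   /* guest mapping of the shared region (e.g. a device BAR) */
    size_t  size;   /* region size in bytes */
    size_t  head;   /* next free offset; simple bump allocator for the sketch */
};

/* Copy an outgoing packet into the shared region and return the offset that
 * the descriptor would carry instead of a guest-physical address; the peer
 * reads the payload directly from the same region. */
static ssize_t shmem_tx_copy(struct shmem_tx_region *r,
                             const void *pkt, size_t len)
{
    if (r->head + len > r->size)
        return -1;                       /* region full; caller retries later */
    memcpy((char *)r->base + r->head, pkt, len);
    r->head += len;
    return (ssize_t)(r->head - len);     /* offset visible to the peer */
}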

I think most of the vhost-user protocol can be bypassed for this usage: the
device feature bits don't need to be negotiated between the two devices, and
the memory and vring info doesn't need to be transferred. To support
interrupts, we may still need vhost-user to share the irqfd.
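
For the interrupt side, the only thing that really needs to travel over the
vhost-user socket is a file descriptor. A simplified sketch of passing an
eventfd with SCM_RIGHTS is below; the framing here is made up, while in the
real protocol the fd is carried by a message such as VHOST_USER_SET_VRING_CALL.

/* Simplified sketch: hand an eventfd to the peer over a UNIX socket with
 * SCM_RIGHTS. The framing is invented; the real vhost-user protocol wraps
 * the fd in a proper message header. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int send_irqfd(int sock)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    char dummy = 0;                          /* payload content is irrelevant */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char cbuf[CMSG_SPACE(sizeof(int))];
    memset(cbuf, 0, sizeof(cbuf));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &efd, sizeof(int));

    if (sendmsg(sock, &msg, 0) < 0) {
        close(efd);
        return -1;
    }
    return efd;   /* keep the local fd (e.g. wired up as a KVM irqfd);
                     the peer writes to its copy to signal an interrupt */
}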

> What would have to be provided by the hypervisor for that?
> 

We don't need any support from KVM; for the QEMU part, please see above.

Best,
Wei

