
Re: [Qemu-devel] Towards an ivshmem 2.0?


From: Jan Kiszka
Subject: Re: [Qemu-devel] Towards an ivshmem 2.0?
Date: Wed, 25 Jan 2017 10:18:43 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2017-01-23 15:19, Markus Armbruster wrote:
> Jan Kiszka <address@hidden> writes:
> 
>> Hi,
>>
>> some of you may know that we are using a shared memory device similar to
>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>
>> We started as being compatible to the original ivshmem that QEMU
>> implements, but we quickly deviated in some details, and in the recent
>> months even more. Some of the deviations are related to making the
>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> 
> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.

That difference comes from remote/migration support and general QEMU
integration - likely not very telling due to the different environments.

> 
>> aiming at safety critical systems and, therefore, a small code base.
>> Other changes address deficits in the original design, like missing
>> life-cycle management.
>>
>> Now the question is if there is interest in defining a common new
>> revision of this device and maybe also of some protocols used on top,
>> such as virtual network links. Ideally, this would enable us to share
>> Linux drivers. We will definitely go for upstreaming at least a network
>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>
>> I've attached a first draft of the specification of our new ivshmem
>> device. A working implementation can be found in the wip/ivshmem2 branch
>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>
>> Deviations from the original design:
>>
>> - Only two peers per link
> 
> Uh, define "link".

VMs are linked via a common shared memory. Interrupt delivery follows
that route as well.

> 
>>   This simplifies the implementation and also the interfaces (think of
>>   life-cycle management in a multi-peer environment). Moreover, we do
>>   not have an urgent use case for multiple peers, thus also not
>>   reference for a protocol that could be used in such setups. If someone
>>   else happens to share such a protocol, it would be possible to discuss
>>   potential extensions and their implications.
>>
>> - Side-band registers to discover and configure shared memory regions
>>
>>   This was one of the first changes: We removed the memory regions from
>>   the PCI BARs and gave them special configuration space registers. By
>>   now, these registers are embedded in a PCI capability. The reasons are
>>   that Jailhouse does not allow relocating the regions in guest address
>>   space (but other hypervisors may if they like to) and that we now have
>>   up to three of them.
> 
> I'm afraid I don't quite understand the change, nor the rationale.  I
> guess I could figure out the former by studying the specification.

a) It's a Jailhouse thing (we do not allow the guest to move the regions
   around in its address space)
b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
   would have to downgrade them to 32 bit)
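
To make this concrete, here is a rough sketch of how such region-describing
registers might sit in a vendor-specific PCI capability. All field names,
offsets and the exact layout below are invented for illustration - the
attached spec is authoritative:

/* Hypothetical vendor-specific capability describing up to three
 * shared memory regions whose guest-physical addresses are fixed
 * by the hypervisor (illustration only, not the actual spec layout). */
#include <stdint.h>

struct ivshmem2_shmem_cap {
    uint8_t  cap_id;        /* PCI_CAP_ID_VNDR (0x09) */
    uint8_t  cap_next;      /* pointer to the next capability */
    uint8_t  cap_len;       /* length of this capability */
    uint8_t  num_regions;   /* 1..3 entries populated below */
    struct {
        uint64_t addr;      /* guest-physical base address */
        uint64_t size;      /* region size in bytes */
    } region[3];            /* [0] r/w both, [1] local r/w, [2] local r/o */
} __attribute__((packed));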

> 
>> - Changed PCI base class code to 0xff (unspecified class)
> 
> Changed from 0x5 (memory controller).

Right.

> 
>>   This allows us to define our own sub classes and interfaces. That is
>>   now exploited for specifying the shared memory protocol the two
>>   connected peers should use. It also allows the Linux drivers to match
>>   on that.
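
As an illustration of the driver-matching aspect: a Linux guest driver could
bind purely by class code. The base class 0xff is from the proposal; the
sub-class value 0x01 for a virtual network link and the mask are made up for
this sketch:

/* Sketch of class-based matching; the 24-bit class is
 * base class << 16 | sub-class << 8 | programming interface. */
#include <linux/module.h>
#include <linux/pci.h>

static const struct pci_device_id ivshmem2_net_ids[] = {
    /* match base class 0xff, sub-class 0x01, any programming interface */
    { PCI_DEVICE_CLASS(0xff0100, 0xffff00) },
    { 0 }
};
MODULE_DEVICE_TABLE(pci, ivshmem2_net_ids);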
>>
>> - INTx interrupts support is back
>>
>>   This is needed on target platforms without MSI controllers, i.e.
>>   without the required guest support. Namely some PCI-less ARM SoCs
>>   required the reintroduction. While doing this, we also took care of
>>   keeping the MMIO registers free of privileged controls so that a
>>   guest OS can map them safely into a guest userspace application.
> 
> So you need interrupt capability.  Current upstream ivshmem requires a
> server such as the one in contrib/ivshmem-server/.  What about yours?

IIRC, the need for a server with QEMU/KVM is related to live migration.
Jailhouse is simpler, all guests are managed by the same hypervisor
instance, and there is no migration. That makes interrupt delivery much
simpler as well. However, the device spec should not exclude other
architectures.

> 
> The interrupt feature enables me to guess a definition of "link": A and
> B are peers of the same link if they can interrupt each other.
> 
> Does your ivshmem2 support interrupt-less operation similar to
> ivshmem-plain?

Each receiver of interrupts is free to enable them - or leave them off,
which is the default after reset. But currently the spec demands that
either MSI-X or INTx is reported as available to the guests. We could
extend it to permit reporting no interrupt support if there is a good
case for it.
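
For the guest side this boils down to standard PCI interrupt setup; a
minimal sketch, assuming a Linux driver and nothing ivshmem-specific beyond
the hypothetical "ivshmem2" name:

/* Prefer MSI-X, fall back to legacy INTx on platforms (e.g. PCI-less
 * ARM SoCs) that lack an MSI controller. */
#include <linux/pci.h>
#include <linux/interrupt.h>

static int ivshmem2_request_irq(struct pci_dev *pdev, irq_handler_t handler,
                                void *priv)
{
    int nvec = pci_alloc_irq_vectors(pdev, 1, 1,
                                     PCI_IRQ_MSIX | PCI_IRQ_LEGACY);
    if (nvec < 0)
        return nvec;

    /* IRQF_SHARED only matters in the INTx case but is harmless for MSI-X */
    return request_irq(pci_irq_vector(pdev, 0), handler, IRQF_SHARED,
                       "ivshmem2", priv);
}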

I will have to look into the details of the client-server structure of
QEMU's ivshmem again to answer the question of under which restrictions we
can make it both simpler and more robust. As Jailhouse has no live
migration support, requirements on ivshmem related to that may only be
addressed by chance so far.

> 
>> And then there are some extensions of the original ivshmem:
>>
>> - Multiple shared memory regions, including unidirectional ones
>>
>>   It is now possible to expose up to three different shared memory
>>   regions: The first one is read/writable for both sides. The second
>>   region is read/writable for the local peer and read-only for the
>>   remote peer (useful for output queues). And the third is read-only
>>   locally but read/writable remotely (i.e., for input queues).
>>   Unidirectional regions prevent the receiver of some data from
>>   interfering with the sender while it is still building the message, a
>>   property that, we are sure, is useful not only for safety-critical
>>   communication.
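
To illustrate that point with a sender-side fragment (the message framing,
the choice of the output region and the barrier primitive are all just for
this sketch):

/* The sender builds the message in its locally writable output region,
 * which the peer can only read - so the receiver cannot corrupt a
 * message that is still being assembled. */
#include <stdint.h>
#include <string.h>

struct msg_hdr {
    volatile uint32_t seq;       /* bumped only after the payload is complete */
    uint32_t len;
};

static void send_msg(void *out_region, const void *data, uint32_t len)
{
    struct msg_hdr *hdr = out_region;

    memcpy(hdr + 1, data, len);  /* payload first ...                        */
    hdr->len = len;
    __sync_synchronize();        /* ... make it visible before bumping seq   */
    hdr->seq++;                  /* receiver polls seq or gets an interrupt  */
}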
>>
>> - Life-cycle management via local and remote state
>>
>>   Each device can now signal its own state in the form of a value to the
>>   remote side, which triggers an event there.
> 
> How are "events" related to interrupts?

Confusing term chosen here: an interrupt is triggered on the remote side
(if it has interrupts enabled).

> 
>>                                               Moreover, state changes
>>   done by the hypervisor to one peer are signalled to the other side.
>>   And we introduced a write-to-shared-memory mechanism for the
>>   respective remote state so that guests do not have to issue an MMIO
>>   access in order to check the state.
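
A sketch of what that could look like from the guest's point of view; the
offset and the state encoding below are invented for illustration:

/* The device mirrors the peer's state word into shared memory, so the
 * guest can check it without an MMIO read (which would trap into the
 * hypervisor). Offset and state values are hypothetical. */
#include <stdint.h>

#define IVSHMEM2_REMOTE_STATE_OFF  0x0   /* hypothetical offset */
#define IVSHMEM2_STATE_READY       1     /* hypothetical state value */

static int peer_is_ready(const void *shmem)
{
    const volatile uint32_t *remote_state = (const volatile uint32_t *)
        ((const uint8_t *)shmem + IVSHMEM2_REMOTE_STATE_OFF);

    return *remote_state == IVSHMEM2_STATE_READY;
}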
>>
>> So, this is our proposal. It would be great to hear some opinions on
>> whether you see value in adding support for such an "ivshmem 2.0" device
>> to QEMU as well and expanding its ecosystem towards Linux upstream, maybe
>> also DPDK again. If you see problems in the new design with respect to
>> what QEMU provides so far with its ivshmem device, let's discuss how to
>> resolve them. Looking forward to any feedback!
> 
> My general opinion on ivshmem is well-known, but I repeat it for the
> record: merging it was a mistake, and using it is probably a mistake.  I
> detailed my concerns in "Why I advise against using ivshmem"[*].
> 
> My philosophical concerns remain.  Perhaps you can assuage them.
> 
> Only some of my practical concerns have since been addressed.  In part
> by myself, because having a flawed implementation of a bad idea is
> strictly worse than the same with flaws corrected as far as practical.
> But even today, docs/specs/ivshmem-spec.txt is a rather depressing read.

I agree.

> 
> However, there's one thing that's still worse than a more or less flawed
> implementation of a bad idea: two implementations of a bad idea.  Could
> ivshmem2 be done in a way that permits *replacing* ivshmem?

If people see the need for having a common ivshmem2, it should of
course be designed to replace the original version in QEMU. I wouldn't
want to make it backward compatible, but the new version should
provide all the useful and required features of the old one.

Of course, I'm careful about investing much time into expanding the
existing design, which is possibly sufficient for Jailhouse, if there is
no real interest in continuing ivshmem support in QEMU - because of
vhost-pci or for other reasons. But if that interest exists, it would be
beneficial for us to have QEMU support a compatible version and use the
same guest drivers. Then I would start looking into concrete patches
for it as well.

Jan

> 
>> [1] https://github.com/siemens/jailhouse
>> [2]
>> http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
>> [3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
>> [4]
>> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2
> 
> [*] http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg02968.html
> 



