From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v2 0/3] virtio: proposal to optimize accesses to VQs
Date: Wed, 16 Dec 2015 13:02:31 +0200

On Wed, Dec 16, 2015 at 11:39:46AM +0100, Vincenzo Maffione wrote:
> 2015-12-16 10:34 GMT+01:00 Paolo Bonzini <address@hidden>:
> >
> >
> > On 16/12/2015 10:28, Vincenzo Maffione wrote:
> >> In my TX experiments with a disconnected backend (and with CPU dynamic
> >> performance scaling disabled, etc.):
> >>   1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps to
> >> 1.910 Mpps.
> >>   2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
> >>
> >> So I see an improvement for patch 3, and I guess it's because we avoid
> >> an additional memory translation and related overhead. I believe that
> >> avoiding the memory translation is more beneficial than avoiding the
> >> variable-sized memcpy.
> >> I'm not surprised by that, because a brief look at what happens
> >> under the hood when you call an access_memory() function shows
> >> a lot of operations.
> >
> > Great, thanks for confirming!
> >
> > Paolo
> 
> No problem.
> 
> I have some additional (orthogonal) curiosities:
> 
>   1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (VQ
> data structures directly accessible in the host virtual memory, with
> guest-physical-to-host-virtual mapping done statically at setup time),
> why doesn't QEMU use this approach for virtio-net as well? I see it is
> used by virtio-blk only.

Because on Linux, nothing would be gained compared to using vhost-net in
the kernel or vhost-user with DPDK.  virtio-net is there for non-Linux
hosts; keeping it simple is important to avoid e.g. security problems.
Same as serial, etc.
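
For reference, a minimal sketch of the static-mapping idea from point 1
above: the descriptor ring is translated to a host-virtual pointer once at
queue setup, so the fast path dereferences it directly.  The struct layout
follows the virtio spec, but guest_phys_to_host() and the surrounding names
are placeholders, not the actual hw/virtio/dataplane/vring.c API.

#include <stdint.h>
#include <stdbool.h>

typedef struct VRingDesc {
    uint64_t addr;    /* guest-physical address of the data buffer */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} VRingDesc;

typedef struct VRingMapped {
    VRingDesc *desc;  /* host-virtual pointer, valid for the queue lifetime */
    unsigned   num;
} VRingMapped;

/* Placeholder for the one-off guest-physical -> host-virtual lookup. */
void *guest_phys_to_host(uint64_t gpa, uint64_t len);

/* Done once when the guest configures the queue (the avail and used rings
 * would be mapped the same way)... */
static bool vring_map_once(VRingMapped *vr, uint64_t desc_gpa, unsigned num)
{
    void *p = guest_phys_to_host(desc_gpa, num * sizeof(VRingDesc));
    if (!p) {
        return false;
    }
    vr->desc = p;
    vr->num = num;
    return true;
}

/* ...so the fast path reads descriptors without any further translation. */
static inline VRingDesc *vring_desc(VRingMapped *vr, unsigned i)
{
    return &vr->desc[i % vr->num];
}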

>   2) In any case (vring or not) QEMU dynamically maps data buffers
> from guest physical memory, for each descriptor to be processed: e1000
> uses pci_dma_read/pci_dma_write, virtio uses
> cpu_physical_memory_map()/cpu_physical_memory_unmap(), vring uses the
> more specialized vring_map()/vring_unmap(). All of these go through
> expensive lookups and related operations to do the address
> translation.
> Have you considered the possibility of caching the translation result to
> remove this bottleneck (maybe just for virtio devices)? Or is there any
> consistency or migration-related problem that would create issues?
> Just to give an example of what I'm talking about:
> https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423.
> 
> At very high packet rates, once notifications (kicks and interrupts)
> have been amortized in some way, memory translation becomes the major
> bottleneck. And this (1 and 2) is why the QEMU virtio implementation
> cannot achieve the same throughput as bhyve does (5-6 Mpps or more
> IIRC).
> 
> Cheers,
>   Vincenzo
> 
> 
> 
> -- 
> Vincenzo Maffione
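
The caching Vincenzo describes in point 2 (and in the linked e1000
experiment) could look roughly like the sketch below: remember the last
resolved guest-physical RAM region and serve accesses that fall inside it
with plain pointer arithmetic.  The names here (TranslationCache,
slow_path_translate) are illustrative, not the actual QEMU memory API, and
invalidation on memory-map changes, plus the migration/dirty-tracking
questions raised above, is deliberately left out.

#include <stdint.h>

typedef struct TranslationCache {
    uint64_t gpa_base;   /* start of the cached guest-physical region */
    uint64_t gpa_len;    /* length of the cached region */
    uint8_t *hva_base;   /* host-virtual address corresponding to gpa_base */
} TranslationCache;

/* Placeholder for the full lookup, which also refills the cache. */
uint8_t *slow_path_translate(TranslationCache *tc, uint64_t gpa, uint64_t len);

static inline uint8_t *cached_translate(TranslationCache *tc,
                                        uint64_t gpa, uint64_t len)
{
    if (gpa >= tc->gpa_base && gpa - tc->gpa_base + len <= tc->gpa_len) {
        /* Fast path: pointer arithmetic only, no per-descriptor lookup. */
        return tc->hva_base + (gpa - tc->gpa_base);
    }
    /* Slow path: do the real translation and cache the containing region. */
    return slow_path_translate(tc, gpa, len);
}

A single-entry cache like this pays off because almost all descriptor
buffers come from the same guest RAM region; the open question in the
thread is when such a cache must be invalidated.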


