Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH] Add a memory barrier to DMA functions
Date: Tue, 22 May 2012 17:17:39 +1000
On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
> The emulated devices can run simultaneously with the guest, so
> we need to be careful with the ordering of loads and stores done
> by them to guest system memory, which need to be observed in
> the right order by the guest operating system.
>
> This adds a barrier call to the basic DMA read/write ops, which
> is currently implemented as a smp_mb(), but could later be
> improved for more fine-grained control of barriers.
>
> Additionally, a _relaxed() variant of the accessors is provided
> to easily convert devices that are performance sensitive
> and would be negatively impacted by the change.
>
> Signed-off-by: Benjamin Herrenschmidt <address@hidden>
> ---
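To make the ordering rule in the patch description concrete, here is a minimal C sketch. It is not the patch itself: guest RAM is assumed to be one flat byte array, C11's `atomic_thread_fence(memory_order_seq_cst)` stands in for QEMU's `smp_mb()`, and the `DMADir` enum and exact signatures are hypothetical, only loosely modeled on `dma_memory_rw()` and its `_relaxed()` variant. The default accessor fences before copying, so each DMA access is ordered after the device's earlier ones; the relaxed accessor skips the fence. The sketch glosses over the C11 memory model's formal rules for non-atomic data and only shows where the barrier sits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: guest RAM modeled as one flat array (not QEMU's memory API). */
#define GUEST_RAM_SIZE 4096u
static uint8_t guest_ram[GUEST_RAM_SIZE];

/* Hypothetical direction enum, named from the device's point of view. */
typedef enum {
    DMA_DIR_TO_DEVICE,   /* device reads guest memory  */
    DMA_DIR_FROM_DEVICE  /* device writes guest memory */
} DMADir;

/* Stand-in for smp_mb(): a full, sequentially consistent fence. */
static void dma_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Relaxed accessor: bounds check and copy, no ordering guarantee.
 * Returns 0 on success, -1 if the range falls outside guest RAM. */
static int dma_memory_rw_relaxed(uint64_t addr, void *buf, size_t len,
                                 DMADir dir)
{
    assert(buf != NULL);
    if (addr > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - addr) {
        return -1;
    }
    if (dir == DMA_DIR_FROM_DEVICE) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
    return 0;
}

/* Default accessor: fence first, so this access cannot become visible
 * ahead of any DMA access the device issued before it. */
static int dma_memory_rw(uint64_t addr, void *buf, size_t len, DMADir dir)
{
    dma_barrier();
    return dma_memory_rw_relaxed(addr, buf, len, dir);
}
```

In this model a device would use `dma_memory_rw()` by default and move a hot path to the relaxed form only once its ordering is handled some other way.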
(Note to Rusty: The number I told you on ST is wrong, see below)
So I tried to do some performance measurements with that patch using
netperf on an x86 laptop (an X220 with a Core i7).
It's a bit tricky. For example, if I just create a tap interface,
give it a local IP on the laptop and a different IP on the guest
(i.e., the guest basically talks via tap to a netserver on the host),
the performance is pretty poor and the numbers are useless both
with and without the barrier.
So I ran the tests against a server on our gigabit network
instead.
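The baseline is the laptop without KVM talking to the server. The
TCP_STREAM test results are:
(The "*" at the beginning of a line is something I added to
distinguish multi-line results on some tests. In the UDP_STREAM
output, each starred line is the sending side and the line under
it is the receiving side.)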
MIGRATED TCP STREAM TEST
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
* 87380  16384   16384    10.02    933.02
* 87380  16384   16384    10.03    908.64
* 87380  16384   16384    10.03    926.78
* 87380  16384   16384    10.03    919.73
It's a bit noisy; ideally I should use a point-to-point setup to
an otherwise idle machine. Here I'm getting some general lab network
noise, but it gives us a pretty good baseline to begin with.
For some reason I have not managed to get any sensible result out of
UDP_STREAM in that configuration (i.e., just the host laptop); the
numbers look insane:
MIGRATED UDP STREAM TEST
 Socket  Message  Elapsed  Messages
 Size    Size     Time     Okay     Errors  Throughput
 bytes   bytes    secs     #        #       10^6bits/sec
*229376  65507    10.00    270468   0       14173.84
 126976           10.00    44               2.31
*229376  65507    10.00    266526   0       13967.32
 126976           10.00    41               2.15
So we don't have a good comparison baseline, but we can still compare
KVM against itself with and without the barrier.
Now KVM. This is x86_64 running an Ubuntu Precise guest (I had the
ISO lying around) with the default setup, which appears to be
an emulated e1000. I did some tests with slirp just to see how
bad it was, and it's bad enough to be irrelevant. The numbers below
were thus obtained using a tap interface bridged to the host Ethernet
(which also happens to be some kind of e1000).
For each test I've done 3 series of numbers:
- Without the barrier added
- With the barrier added to dma_memory_rw
- With the barrier added to dma_memory_rw -and- dma_memory_map
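The difference between the second and third series comes down to where a barrier can go. A copying accessor like `dma_memory_rw` can fence on every call, while `dma_memory_map` hands the device a raw pointer, so the only place to fence is once, at map time. The sketch below is a hypothetical model, not QEMU code: the flat guest RAM, the simplified signatures (an `is_write` int instead of QEMU's direction type), the global `cfg` switch and the `barriers_issued` counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE 4096u
static uint8_t guest_ram[GUEST_RAM_SIZE];  /* assumed flat guest RAM */

/* The three configurations listed above. */
typedef enum {
    CFG_NO_BARRIER,
    CFG_BARRIER_RW,          /* barrier in dma_memory_rw only */
    CFG_BARRIER_RW_AND_MAP   /* barrier in dma_memory_rw and dma_memory_map */
} BarrierConfig;

static BarrierConfig cfg = CFG_NO_BARRIER;
static unsigned long barriers_issued;  /* counts executed fences */

static void maybe_barrier(int wanted)
{
    if (wanted) {
        atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() stand-in */
        barriers_issued++;
    }
}

/* Copying access: can fence on every call. Returns -1 if out of range. */
static int dma_memory_rw(uint64_t addr, void *buf, size_t len, int is_write)
{
    assert(buf != NULL);
    if (addr > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - addr) {
        return -1;
    }
    maybe_barrier(cfg != CFG_NO_BARRIER);
    if (is_write) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
    return 0;
}

/* Mapped access: the device then touches guest RAM through the returned
 * pointer with no further calls, so the only possible barrier is here,
 * once per mapping. *len is clamped to what fits in guest RAM. */
static void *dma_memory_map(uint64_t addr, size_t *len)
{
    assert(len != NULL);
    if (addr >= GUEST_RAM_SIZE) {
        *len = 0;
        return NULL;
    }
    if (*len > GUEST_RAM_SIZE - addr) {
        *len = (size_t)(GUEST_RAM_SIZE - addr);
    }
    maybe_barrier(cfg == CFG_BARRIER_RW_AND_MAP);
    return &guest_ram[addr];
}
```

With `cfg = CFG_BARRIER_RW`, one `dma_memory_rw()` plus one `dma_memory_map()` issues one fence; with `CFG_BARRIER_RW_AND_MAP` the same pair issues two.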
First, TCP_STREAM. The numbers are a bit noisy; I suspect somebody
was hammering the server machine while I was doing one of the tests,
but here's what I got once things appeared to have stabilized:
MIGRATED TCP STREAM TEST
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec
no barrier
* 87380  16384   16384    10.01    880.31
* 87380  16384   16384    10.01    876.73
* 87380  16384   16384    10.01    880.73
* 87380  16384   16384    10.01    878.63
barrier
* 87380  16384   16384    10.01    869.39
* 87380  16384   16384    10.01    864.99
* 87380  16384   16384    10.01    886.13
* 87380  16384   16384    10.01    872.90
barrier + map
* 87380  16384   16384    10.01    867.45
* 87380  16384   16384    10.01    868.51
* 87380  16384   16384    10.01    888.94
* 87380  16384   16384    10.01    888.19
As far as I can tell, it's all in the noise. I was about to concede a
small (1%?) loss to the barrier until I ran the last 2 tests, and
then I stopped caring :-)
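The "in the noise" judgement can be checked with a little arithmetic on the four TCP_STREAM runs per configuration. The helpers below are a hypothetical sketch: they compute each series' mean, its min-to-max spread, and the relative change against the no-barrier mean. This is a crude comparison, not a statistical significance test.

```c
#include <assert.h>
#include <stddef.h>

/* TCP_STREAM throughput samples (10^6 bits/sec), copied from the runs above. */
static const double tcp_no_barrier[]  = { 880.31, 876.73, 880.73, 878.63 };
static const double tcp_barrier[]     = { 869.39, 864.99, 886.13, 872.90 };
static const double tcp_barrier_map[] = { 867.45, 868.51, 888.94, 888.19 };
#define NSAMPLES (sizeof tcp_no_barrier / sizeof tcp_no_barrier[0])

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Spread of a series: max minus min. */
static double range(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return hi - lo;
}

/* Relative change of b against reference a, in percent. */
static double pct_change(double a, double b)
{
    return (b - a) / a * 100.0;
}
```

On these numbers the means come out at about 879.10 (no barrier), 873.35 (barrier) and 878.27 (barrier + map) 10^6bits/sec, i.e. roughly -0.65% and -0.09% against no barrier. The 5.75 gap between the first two means is well below the 21.14 spread inside the "barrier" series alone.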
With UDP_STREAM, we get something like this:
MIGRATED UDP STREAM TEST
 Socket  Message  Elapsed  Messages
 Size    Size     Time     Okay     Errors  Throughput
 bytes   bytes    secs     #        #       10^6bits/sec
no barrier
*229376  65507    10.00    5208     0       272.92
 126976           10.00    5208             272.92
*229376  65507    10.00    5447     0       285.44
 126976           10.00    5447             285.44
*229376  65507    10.00    5119     0       268.22
 126976           10.00    5119             268.22
barrier
*229376  65507    10.00    5326     0       279.06
 126976           10.00    5326             279.06
*229376  65507    10.00    5072     0       265.75
 126976           10.00    5072             265.75
*229376  65507    10.00    5282     0       276.78
 126976           10.00    5282             276.78
barrier + map
*229376  65507    10.00    5512     0       288.83
 126976           10.00    5512             288.83
*229376  65507    10.00    5571     0       291.94
 126976           10.00    5571             291.94
*229376  65507    10.00    5195     0       272.23
 126976           10.00    5195             272.23
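Here every run reports 0 errors and the same message count on both sides, so only throughput varies. Averaging the receive-side lines gives a rough sense of how large the differences are. The sketch below is hypothetical and uses the same crude rule as before, not a statistical test: a configuration counts as "within noise" if its mean differs from the no-barrier mean by no more than the min-to-max spread of the no-barrier runs.

```c
#include <assert.h>
#include <stddef.h>

/* Receive-side UDP_STREAM throughput (10^6 bits/sec): the unstarred line of each pair. */
static const double udp_no_barrier[]  = { 272.92, 285.44, 268.22 };
static const double udp_barrier[]     = { 279.06, 265.75, 276.78 };
static const double udp_barrier_map[] = { 288.83, 291.94, 272.23 };
#define UDP_RUNS 3

static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

static double range(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return hi - lo;
}

/* Crude noise check: is |mean(other) - mean(ref)| <= spread of ref? */
static int within_noise(const double *ref, const double *other, size_t n)
{
    double gap = mean(other, n) - mean(ref, n);
    if (gap < 0.0) {
        gap = -gap;
    }
    return gap <= range(ref, n);
}
```

The means are about 275.53 (no barrier), 273.86 (barrier) and 284.33 (barrier + map), so the configuration with the most barriers is actually the fastest, and both gaps stay below the 17.22 spread of the no-barrier runs.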
So I think here too we're in the noise. In fact, that makes me want to
stick the barrier in map() as well (though see my other email about
using a flag to implement "relaxed" to avoid an explosion of accessors).
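The flag idea can be sketched as follows: instead of a `_relaxed()` twin for every read, write and map function, each accessor takes an attribute word and one bit selects relaxed ordering. Everything here is hypothetical: the `DMA_F_*` bits, `dma_access()`, `dma_map()`, the flat guest RAM and the barrier counter are invented for illustration and are not the API from the other email.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE 4096u
static uint8_t guest_ram[GUEST_RAM_SIZE];  /* assumed flat guest RAM */

/* Hypothetical attribute bits passed to every accessor. */
#define DMA_F_WRITE   (1u << 0)  /* device writes guest memory (else reads) */
#define DMA_F_RELAXED (1u << 1)  /* caller handles ordering itself */

static unsigned long barriers_issued;  /* for illustration only */

/* Issue the default barrier unless the caller asked for relaxed ordering. */
static void dma_order(unsigned flags)
{
    if (!(flags & DMA_F_RELAXED)) {
        atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() stand-in */
        barriers_issued++;
    }
}

/* One copying accessor covers read/write x ordered/relaxed. */
static int dma_access(uint64_t addr, void *buf, size_t len, unsigned flags)
{
    assert(buf != NULL);
    if (addr > GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - addr) {
        return -1;
    }
    dma_order(flags);
    if (flags & DMA_F_WRITE) {
        memcpy(&guest_ram[addr], buf, len);
    } else {
        memcpy(buf, &guest_ram[addr], len);
    }
    return 0;
}

/* The map path takes the same flags instead of growing a map_relaxed() twin.
 * On return *len is clamped to what fits in guest RAM. */
static void *dma_map(uint64_t addr, size_t *len, unsigned flags)
{
    assert(len != NULL);
    if (addr >= GUEST_RAM_SIZE) {
        *len = 0;
        return NULL;
    }
    if (*len > GUEST_RAM_SIZE - addr) {
        *len = (size_t)(GUEST_RAM_SIZE - addr);
    }
    dma_order(flags);
    return &guest_ram[addr];
}
```

The appeal of this shape is that adding a new access kind later costs one function, not two, and ordering becomes something a reviewer can see at the call site.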
Now, I suspect somebody needs to re-run those tests on hardware that is
known to be more sensitive to memory barriers. It could be that my
Sandy Bridge i7 in 64-bit mode is just the best-case scenario, and that
some old Core 1 or Core 2 using a 32-bit lock instruction will suck a
lot more.
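A rough first step on other hardware is to measure what one full fence costs on that CPU. The following crude microbenchmark is a sketch, not part of the patch: it times a loop of volatile stores with and without `atomic_thread_fence(memory_order_seq_cst)` (which compilers typically lower to `mfence` or a locked instruction on x86, depending on compiler and target) and reports the difference per iteration. Timer resolution, frequency scaling and background load all add noise, so the result is only an order-of-magnitude figure.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Nanoseconds elapsed from a to b. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9 +
           (double)(b.tv_nsec - a.tv_nsec);
}

/* Approximate cost of one full fence in ns: time of a loop with the fence
 * minus time of the same loop without it, per iteration, clamped at 0. */
static double ns_per_fence(long iters)
{
    static volatile long sink;  /* volatile store keeps both loops alive */
    struct timespec t0, t1, t2;

    assert(iters > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        sink = i;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (long i = 0; i < iters; i++) {
        sink = i;
        atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() stand-in */
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    double per = (elapsed_ns(t1, t2) - elapsed_ns(t0, t1)) / (double)iters;
    return per > 0.0 ? per : 0.0;
}
```

Calling `ns_per_fence(10000000)` on two machines gives a quick comparison. Multiplying by the number of barriered DMA accesses per packet gives a rough estimate of the per-packet cost of the patch on each.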
In any case, it looks like the performance loss is minimal, if measurable
at all, and if there's a real concern with a given driver, we can
always fix -that- driver to use more relaxed accessors.
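Converting a driver this way usually means keeping one barrier where ordering actually matters instead of one per access. The sketch below models a hypothetical receive path: the packet payload is copied into guest memory with relaxed writes, followed by a single full barrier, then the status word the guest polls. The descriptor layout, `RX_STATUS_DONE` and the helper names are invented for illustration, and the guest driver would still need its own read barrier between seeing the status and reading the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE 4096u
static uint8_t guest_ram[GUEST_RAM_SIZE];  /* assumed flat guest RAM */
static unsigned long barriers_issued;      /* for illustration only */

#define RX_STATUS_DONE 0x1u  /* hypothetical "descriptor done" bit */

static void dma_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);  /* smp_mb() stand-in */
    barriers_issued++;
}

/* Relaxed write: bounds-checked copy into guest RAM, no barrier. */
static void dma_write_relaxed(uint64_t addr, const void *buf, size_t len)
{
    assert(addr <= GUEST_RAM_SIZE && len <= GUEST_RAM_SIZE - addr);
    memcpy(&guest_ram[addr], buf, len);
}

/* Deliver one packet in chunk-sized relaxed writes, then publish it.
 * Only the status update needs ordering: the guest must never see DONE
 * before the payload it refers to. */
static void rx_complete(uint64_t buf_addr, uint64_t status_addr,
                        const uint8_t *pkt, size_t len, size_t chunk)
{
    assert(chunk > 0);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        dma_write_relaxed(buf_addr + off, pkt + off, n);
    }
    dma_barrier();
    uint32_t status = RX_STATUS_DONE;
    dma_write_relaxed(status_addr, &status, sizeof status);
}
```

A 100-byte packet written in 16-byte chunks takes seven payload writes but issues only one barrier, where the fully ordered accessors would have fenced before each of the eight writes.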
Cheers,
Ben.