qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
Date: Thu, 10 Jan 2013 09:44:45 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jan 09, 2013 at 11:33:25PM +0800, Jason Wang wrote:
> On 01/09/2013 11:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 09, 2013 at 03:29:24PM +0100, Stefan Hajnoczi wrote:
> >> On Fri, Dec 28, 2012 at 06:31:52PM +0800, Jason Wang wrote:
> >>> Perf Numbers:
> >>>
> >>> Two Intel Xeon 5620 with direct connected intel 82599EB
> >>> Host/Guest kernel: David net tree
> >>> vhost enabled
> >>>
> >>> - lots of improvents of both latency and cpu utilization in 
> >>> request-reponse test
> >>> - get regression of guest sending small packets which because TCP tends 
> >>> to batch
> >>>   less when the latency were improved
> >>>
> >>> 1q/2q/4q
> >>> TCP_RR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1     9393.26   595.64  9408.18   597.34  9375.19   584.12
> >>> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> >>> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> >>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> >>> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
> >>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> >>> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
> >>> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> >>> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> >>> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> >>> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> >>> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> >>> TCP_CRR
> >>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> >>> 1 1     2848.37  163.41 2230.39  130.89 2013.09  120.47
> >>> 1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
> >>> 1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
> >>> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> >>> 64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
> >>> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> >>> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> >>> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> >>> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> >>> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> >>> 256 50  28354.7  579.85 40578.31 607    60261.71 657.87
> >>> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> >>> TCP_STREAM guest receiving
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1     16.27   1.33   16.1    1.12   16.13   0.99
> >>> 1 2     33.04   2.08   32.96   2.19   32.75   1.98
> >>> 1 4     66.62   6.83   68.3    5.56   66.14   2.65
> >>> 64 1    896.55  56.67  914.02  58.14  898.9   61.56
> >>> 64 2    1830.46 91.02  1812.02 64.59  1835.57 66.26
> >>> 64 4    3626.61 142.55 3636.25 100.64 3607.46 75.03
> >>> 256 1   2619.49 131.23 2543.19 129.03 2618.69 132.39
> >>> 256 2   5136.58 203.02 5163.31 141.11 5236.51 149.4
> >>> 256 4   7063.99 242.83 9365.4  208.49 9421.03 159.94
> >>> 512 1   3592.43 165.24 3603.12 167.19 3552.5  169.57
> >>> 512 2   7042.62 246.59 7068.46 180.87 7258.52 186.3
> >>> 512 4   6996.08 241.49 9298.34 206.12 9418.52 159.33
> >>> 1024 1  4339.54 192.95 4370.2  191.92 4211.72 192.49
> >>> 1024 2  7439.45 254.77 9403.99 215.24 9120.82 222.67
> >>> 1024 4  7953.86 272.11 9403.87 208.23 9366.98 159.49
> >>> 4096 1  7696.28 272.04 7611.41 270.38 7778.71 267.76
> >>> 4096 2  7530.35 261.1  8905.43 246.27 8990.18 267.57
> >>> 4096 4  7121.6  247.02 9411.75 206.71 9654.96 184.67
> >>> 16384 1 7795.73 268.54 7780.94 267.2  7634.26 260.73
> >>> 16384 2 7436.57 255.81 9381.86 220.85 9392    220.36
> >>> 16384 4 7199.07 247.81 9420.96 205.87 9373.69 159.57
> >>> TCP_MAERTS guest sending
> >>>  size #sessions throughput  norm throughput  norm throughput  norm
> >>> 1 1     15.94   0.62   15.55   0.61   15.13   0.59
> >>> 1 2     36.11   0.83   32.46   0.69   32.28   0.69
> >>> 1 4     71.59   1      68.91   0.94   61.52   0.77
> >>> 64 1    630.71  22.52  622.11  22.35  605.09  21.84
> >>> 64 2    1442.36 30.57  1292.15 25.82  1282.67 25.55
> >>> 64 4    3186.79 42.59  2844.96 36.03  2529.69 30.06
> >>> 256 1   1760.96 58.07  1738.44 57.43  1695.99 56.19
> >>> 256 2   4834.23 95.19  3524.85 64.21  3511.94 64.45
> >>> 256 4   9324.63 145.74 8956.49 116.39 6720.17 73.86
> >>> 512 1   2678.03 84.1   2630.68 82.93  2636.54 82.57
> >>> 512 2   9368.17 195.61 9408.82 204.53 5316.3  92.99
> >>> 512 4   9186.34 209.68 9358.72 183.82 9489.29 160.42
> >>> 1024 1  3620.71 109.88 3625.54 109.83 3606.61 112.35
> >>> 1024 2  9429    258.32 7082.79 120.55 7403.53 134.78
> >>> 1024 4  9430.66 290.44 9499.29 232.31 9414.6  190.92
> >>> 4096 1  9339.28 296.48 9374.23 372.88 9348.76 298.49
> >>> 4096 2  9410.53 378.69 9412.61 286.18 9409.75 278.31
> >>> 4096 4  9487.35 374.1  9556.91 288.81 9441.94 221.64
> >>> 16384 1 9380.43 403.8  9379.78 399.13 9382.42 393.55
> >>> 16384 2 9367.69 406.93 9415.04 312.68 9409.29 300.9
> >>> 16384 4 9391.96 405.17 9695.12 310.54 9423.76 223.47
> >> Trying to understand the performance results:
> >>
> >> What is the host device configuration?  tap + bridge?
> 
> Yes.
> >>
> >> Did you use host CPU affinity for the vhost threads?
> 
> I use numactl to pin cpu threads and vhost threads in the same numa node.
> >> Can multiqueue tap take advantage of multiqueue host NICs or is
> >> virtio-net multiqueue unaware of the physical NIC multiqueue
> >> capabilities?
> 
> Tap is unware of the physical multiqueue NIC, but we can benefit from it
> since we use multiple vhost threads.

I wonder if it makes a difference to bind tap queues to physical NIC
queues.  Maybe this is only possible in macvlan or can you preset the
queue index of outgoing skbs so the network stack doesn't recalculate
the flow?

> >>
> >> The results seem pretty mixed - as a user it's not obvious what to
> >> choose as a good all-round setting.
> > Yes, this I think is the reason it's disabled by default ATM,
> > guest admin has to enable it using ethtool.
> >
> > From what I saw, it looks like with a streaming guest to external
> > benchmark, we sometimes get smaller packets and
> > so worse performance. We are still investigating - what's
> > going on seems to be a strange interaction with guest TCP stack.
> 
> Yes, guest TCP tends to batch less when the multiqueue is enabled
> (latency is improved). So much more smaller packets were sent in this
> case leads to bad performance.

Okay, this makes sense.

> > Other workloads seem to benefit.
> >
> >>  Any observations on how multiqueue
> >> should be configured?
> > I think the right thing to do is to enable it on the host and
> > let guest admin enable it if appropriate.
> >
> >> What is the "norm" statistic?
> 
> Sorry for not being clear, it's short for normalized result (divide the
> result by cpu utilization).

Okay, that explains the results a little.  When norm doesn't change much
across 1q/2q/4q we're getting linear scaling.  It scales further because
the queues allow for more CPUs to be used.  That's good.

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]