

From: Zhang Haoyu
Subject: Re: [Qemu-devel] [network performance question] only ~2Gbps throughput between two linux guests which are running on the same host via netperf -t TCP_STREAM -m 1400, but xen can ac
Date: Tue, 10 Jun 2014 11:50:38 +0800

I ran ethtool -k on the backend tap netdevice and found that its TSO is off:
Features for tap0:
rx-checksumming: off [fixed]
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

But I failed to enable its TSO; the error message "Could not change any device 
features" was reported. Why?
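For reference, a minimal sketch of the commands one might try (the ordering is an 
assumption: TSO depends on tx-checksumming and scatter-gather, and as far as I 
understand the offloads on a tap device are ultimately gated by what the guest's 
virtio-net negotiates, so ethtool may still refuse):

ethtool -K tap0 tx on sg on       # enable the features TSO depends on first
ethtool -K tap0 tso on            # then try TSO itself
ethtool -k tap0 | grep tcp-segmentation-offload   # check the result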

>>I see that RX checksumming is still off for you on virtio; this is
>>likely what's contributing to the problem.
>>
>>Here's how it looks for me:
>>ethtool -k eth1
>>      Offload parameters for eth1:
>>      rx-checksumming: on
>>      tx-checksumming: on
>>      scatter-gather: on
>>      tcp-segmentation-offload: on
>>      udp-fragmentation-offload: on
>>      generic-segmentation-offload: on
>>      generic-receive-offload: on
>>      large-receive-offload: off
>>
>When I select centos-6.3 as the guest OS, rx-checksumming is on, too.
>After updating qemu from 1.4.0 to 2.0.0, the inter-VM throughput can reach 
>~5Gbps via netperf -t TCP_STREAM -m 1400.
>Here is ethtool -k eth1 on the centos-6.3 guest:
>ethtool -k eth1
>Offload parameters for eth1:
>rx-checksumming: on
>tx-checksumming: on
>scatter-gather: on
>tcp-segmentation-offload: on
>udp-fragmentation-offload: on
>generic-segmentation-offload: on
>generic-receive-offload: off
>large-receive-offload: off
>
>The only difference is GRO: on for you, off for me.
>I ran 'ethtool -K eth1 gro on' in my guest, and the error below was reported:
>"Cannot set device GRO settings: Invalid argument"
>
>>you don't supply kernel versions for host or guest kernels,
>>so it's hard to judge what's going on exactly.
>>
>host: linux-3.10.27 (downloaded directly from kernel.org)
>qemu: qemu-2.0.0 (downloaded directly from wiki.qemu.org/Download)
>guest: centos-6.3 (2.6.32-279.el6.x86_64), 2 vcpus
>
>>Bridge configuration also plays a huge role.
>>Things like ebtables might affect performance as well,
>>sometimes even if they are only loaded, not even enabled.
>>
>I will check it.
>
>>Also, some old scheduler versions didn't put VMs on different
>>CPUs aggressively enough, this resulted in conflicts
>>when VMs compete for the same CPU.
>I will check it.
>
No aggressive contention for the same CPU was observed, but when I pinned each 
vcpu to a different pcpu, a gain of ~1Gbps was obtained.
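For reference, a minimal sketch of how the pinning can be done by hand with 
taskset (the PID/TID values are hypothetical placeholders; libvirt users could 
use virsh vcpupin instead):

pgrep -f 'kvm -id 8572667846472'    # find the qemu process
ls /proc/<qemu-pid>/task            # the vcpu threads are among these tasks
taskset -pc 2 <vcpu0-tid>           # pin vcpu0 to pcpu 2
taskset -pc 3 <vcpu1-tid>           # pin vcpu1 to pcpu 3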

>>On numa systems, some older host kernels would split VM memory
>>across NUMA nodes, this might lead to bad performance.
>>
>local first.
>
>>On Sat, Jun 07, 2014 at 11:07:10AM +0800, Zhang Haoyu wrote:
>>> After updating qemu from 1.4 to 2.0, the inter-VM throughput can 
>>> reach ~5Gbps via netperf -t TCP_STREAM -m 1400, but the performance 
>>> gap (~2Gbps) between kvm and xen still exists.
>>> 
>>> Thanks,
>>> Zhang Haoyu
>>> 
>>> ------------------                           
>>> Zhang Haoyu
>>> 2014-06-07
>>> 
>>> -----Original Message-----
>>> From: Zhang Haoyu
>>> Sent: 2014-06-07 09:27:16
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: Re: [network performance question] only ~2Gbps throughput between two 
>>> linux guests which are running on the same host via netperf -t TCP_STREAM -m 
>>> 1400, but xen can ac
>>> 
>>> > Doesn't that answer your original question about the performance gap!
>>> Sorry, do you mean it is the offloads that cause the performance gap?
>>> But even with checksum offload, TSO, GRO, etc. turned OFF, the performance gap 
>>> still exists.
>>> If I understand correctly, kvm should have better performance than xen from 
>>> the angle of implementation, because of the shorter path and fewer 
>>> context switches, especially for inter-VM communication.
>>> 
>>> And why is the performance gap so big (~2G vs ~7G) when checksum offload, 
>>> TSO, GRO, etc. are on for both hypervisors?
>>> Why can the packets' size be so big (65160) and so stable on xen, while on 
>>> kvm most packets' size is 1448 and only a few are ~65000, when running 
>>> netperf -t TCP_STREAM -m 1400?
>>> Do some TCP configurations have a bearing on this? Or some virtio-net 
>>> configurations?
>>> 
>>> Thanks,
>>> Zhang Haoyu
>>> 
>>> -----Original Message-----
>>> From: address@hidden [mailto:address@hidden On Behalf Of Zhang Haoyu
>>> Sent: Friday, June 06, 2014 3:44 PM
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: Re: RE: [network performance question] only ~2Gbps 
>>> throughput between two linux guests which are running on the same host via 
>>> netperf -t TCP_STREAM -m 1400, but xen can ac
>>> 
>>> > >> Try Rx/Tx checksum offload on all the concerned guests of both 
>>> > >> Hypervisors.
>>> > >> 
>>> > >Already ON on both hypervisors, so some other offloads (e.g. tso, gso) 
>>> > >can be supported.
>>> > 
>>> > Try Rx/Tx checksum offload "OFF" on all the concerned guests of 
>>> > both Hypervisors.
>>> > 
>>> With Rx/Tx checksum offload off on the XEN guest, 1.6Gbps was achieved; the 
>>> tcpdump result on the backend vif netdevice showed that the packets' size is 
>>> 1448, stable.
>>> With Rx/Tx checksum offload off on the KVM guest, only ~1Gbps was achieved; 
>>> the tcpdump result on the backend tap netdevice showed that the packets' 
>>> size is 1448, stable.
>>> 
>>> > And while launching the VM in KVM, on the command line of the virtio 
>>> > interface, you can specify TSO, LRO, and RxMergebuf. Try this instead of 
>>> > the ethtool interface.
>>> The current qemu command is shown below, and I will change the virtio-net 
>>> configuration later as you advise (a rough sketch of the offload properties 
>>> follows the command below):
>>> /usr/bin/kvm -id 8572667846472 -chardev 
>>> socket,id=qmp,path=/var/run/qemu-server/8572667846472.qmp,server,nowait 
>>> -mon chardev=qmp,mode=control -vnc :0,websocket,to=200,x509,password 
>>> -pidfile /var/run/qemu-server/8572667846472.pid -daemonize -name 
>>> centos6-196.5.5.72 -smp sockets=1,cores=2 -cpu core2duo -nodefaults -vga 
>>> cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive 
>>> file=/sf/data/local/iso/vmtools/virtio_auto_install.iso,if=none,id=drive-ide0,media=cdrom,aio=threads,forecast=disable
>>>  -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 
>>> -drive 
>>> file=/sf/data/local/images/host-f8bc123b3e74/32f49b646d1e/centos6-196.5.5.72.vm/vm-disk-1.qcow2,if=none,id=drive-ide2,cache=directsync,aio=threads,forecast=disable
>>>  -device ide-hd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100 
>>> -netdev 
>>> type=tap,id=net0,ifname=857266784647200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on 
>>> -device 
>>> virtio-net-pci,mac=FE:FC:FE:95:EC:A7,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 
>>> -rtc driftfix=slew,clock=rt -global 
>>> kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global 
>>> PIIX4_PM.disable_s4=1
>>>  
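For reference, a sketch of how the offloads might be specified directly on the 
-device virtio-net-pci line above, as suggested. The property names are my 
assumption based on QEMU's virtio-net device properties; they should be verified 
against the QEMU 2.0 build in use, e.g. with 'qemu-system-x86_64 -device 
virtio-net-pci,help':

-device virtio-net-pci,netdev=net0,mac=FE:FC:FE:95:EC:A7,bus=pci.0,addr=0x12,mrg_rxbuf=on,csum=on,guest_csum=on,guest_tso4=on,guest_tso6=on,guest_ecn=on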
>>> -----Original Message-----
>>> From: Zhang Haoyu [mailto:address@hidden
>>> Sent: Friday, June 06, 2014 1:26 PM
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: RE: [network performance question] only ~2Gbps throughput 
>>> between two linux guests which are running on the same host via netperf 
>>> -t TCP_STREAM -m 1400, but xen can ac
>>> 
>>> Thanks for reply.
>>> > >>> And, vhost enabled, tx zero-copy enabled, virtio TSO enabled on kvm.
>>> > 
>>> > Try lro "ON" on the client side. This would require mergeable Rx buffers 
>>> > to be ON.
>>> > 
>>> Current settings for gro and lro:
>>> generic-receive-offload: on
>>> large-receive-offload: off [fixed]
>>> 
>>> > And  Xen netfront to KVM virtio are not apples to apples because of their 
>>> > implementation details. 
>>> > 
>>> You are right; I just want to make a network performance comparison between 
>>> the two virtualization platforms from the user's point of view.
>>> 
>>> > Try Rx/Tx checksum offload on all the concerned guests of both 
>>> > Hypervisors.
>>> > 
>>> Already ON on both hypervisors, so some other offloads (e.g. tso, gso) 
>>> can be supported.
>>> 
>>> kvm virtio-net nic:
>>> ethtool -k eth0
>>> Features for eth0:
>>> rx-checksumming: off [fixed]
>>> tx-checksumming: on
>>>     tx-checksum-ipv4: off [fixed]
>>>     tx-checksum-ip-generic: on
>>>     tx-checksum-ipv6: off [fixed]
>>>     tx-checksum-fcoe-crc: off [fixed]
>>>     tx-checksum-sctp: off [fixed]
>>> scatter-gather: on
>>>     tx-scatter-gather: on
>>>     tx-scatter-gather-fraglist: on
>>> tcp-segmentation-offload: on
>>>     tx-tcp-segmentation: on
>>>     tx-tcp-ecn-segmentation: on
>>>     tx-tcp6-segmentation: on
>>> udp-fragmentation-offload: on
>>> generic-segmentation-offload: on
>>> generic-receive-offload: on
>>> large-receive-offload: off [fixed]
>>> rx-vlan-offload: off [fixed]
>>> tx-vlan-offload: off [fixed]
>>> ntuple-filters: off [fixed]
>>> receive-hashing: off [fixed]
>>> highdma: on [fixed]
>>> rx-vlan-filter: on [fixed]
>>> vlan-challenged: off [fixed]
>>> tx-lockless: off [fixed]
>>> netns-local: off [fixed]
>>> tx-gso-robust: off [fixed]
>>> tx-fcoe-segmentation: off [fixed]
>>> tx-gre-segmentation: off [fixed]
>>> tx-udp_tnl-segmentation: off [fixed]
>>> fcoe-mtu: off [fixed]
>>> tx-nocache-copy: on
>>> loopback: off [fixed]
>>> rx-fcs: off [fixed]
>>> rx-all: off [fixed]
>>> tx-vlan-stag-hw-insert: off [fixed]
>>> rx-vlan-stag-hw-parse: off [fixed]
>>> rx-vlan-stag-filter: off [fixed]
>>> 
>>> xen netfront nic:
>>> ethtool -k eth0
>>> Offload features for eth0:
>>> rx-checksumming: on
>>> tx-checksumming: on
>>> scatter-gather: on
>>> tcp-segmentation-offload: on
>>> udp-fragmentation-offload: off
>>> generic-segmentation-offload: on
>>> generic-receive-offload: off
>>> large-receive-offload: off
>>> 
>>> <piece of tcpdump result on the xen backend vif netdevice>
>>> 15:46:41.279954 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193138968:1193204128, ack 1, win 115, options [nop,nop,TS val 102307210 
>>> ecr 102291188], length 65160
>>> 15:46:41.279971 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193204128:1193269288, ack 1, win 115, options [nop,nop,TS val 102307210 
>>> ecr 102291188], length 65160
>>> 15:46:41.279987 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193269288:1193334448, ack 1, win 115, options [nop,nop,TS val 102307210 
>>> ecr 102291188], length 65160
>>> 15:46:41.280003 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193334448:1193399608, ack 1, win 115, options [nop,nop,TS val 102307210 
>>> ecr 102291188], length 65160
>>> 15:46:41.280020 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193399608:1193464768, ack 1, win 115, options [nop,nop,TS val 102307210 
>>> ecr 102291188], length 65160
>>> 15:46:41.280213 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193464768:1193529928, ack 1, win 115, options [nop,nop,TS val 102307211 
>>> ecr 102291189], length 65160
>>> 15:46:41.280233 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193529928:1193595088, ack 1, win 115, options [nop,nop,TS val 102307211 
>>> ecr 102291189], length 65160
>>> 15:46:41.280250 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193595088:1193660248, ack 1, win 115, options [nop,nop,TS val 102307211 
>>> ecr 102291189], length 65160
>>> 15:46:41.280239 IP 196.6.6.71.53622 > 196.6.6.72.53507: Flags [.], ack 
>>> 1193138968, win 22399, options [nop,nop,TS val 102291190 ecr 102307210], 
>>> length 0
>>> 15:46:41.280267 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193660248:1193725408, ack 1, win 115, options [nop,nop,TS val 102307211 
>>> ecr 102291189], length 65160
>>> 15:46:41.280284 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq 
>>> 1193725408:1193790568, ack 1, win 115, options [nop,nop,TS val 102307211 
>>> ecr 102291189], length 65160
>>> 
>>> The packets' size is very stable: 65160 bytes.
>>> 
>>> Thanks,
>>> Zhang Haoyu
>>> 
>>> -----Original Message-----
>>> From: address@hidden [mailto:address@hidden On Behalf Of Zhang Haoyu
>>> Sent: Friday, June 06, 2014 9:01 AM
>>> To: kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: [network performance question] only ~2Gbps throughput between two 
>>> linux guests which are running on the same host via netperf -t TCP_STREAM 
>>> -m 1400, but xen can achieve ~7Gbps
>>> 
>>> Hi, all
>>> 
>>> I ran two linux guests on the same kvm host, then started the netserver on 
>>> one vm and netperf on the other one. The netperf command and test result are 
>>> shown below:
>>> netperf -H 196.5.5.71 -t TCP_STREAM -l 60 -- -m 1400 -M 1400
>>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
>>> 196.5.5.71 () port 0 AF_INET : nodelay
>>> Recv   Send    Send                          
>>> Socket Socket  Message  Elapsed              
>>> Size   Size    Size     Time     Throughput  
>>> bytes  bytes   bytes    secs.    10^6bits/sec  
>>> 
>>>  87380  16384   1400    60.01    2355.45   
>>> 
>>> But when I ran two linux guests on the same xen hypervisor, ~7Gbps throughput 
>>> was achieved. The netperf command and test result are shown below:
>>> netperf -H 196.5.5.71 -t TCP_STREAM -l 60 -- -m 1400 -M 1400
>>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
>>> 196.5.5.71 () port 0 AF_INET
>>> Recv   Send    Send                          
>>> Socket Socket  Message  Elapsed              
>>> Size   Size    Size     Time     Throughput  
>>> bytes  bytes   bytes    secs.    10^6bits/sec  
>>> 
>>>  87380  16384   1400    60.01    2349.82
>>> 
>>> The test was performed many times; the results are similar to the above.
>>> 
>>> When I ran tcpdump on the backend tap netdevice, I found that most packets' 
>>> size is 1448 bytes on kvm and only a few packets are ~60000 bytes; but when 
>>> I ran tcpdump on the backend vif netdevice, I found that most packets' size 
>>> is over 60000 bytes on xen.
>>> The test result of netperf -t TCP_STREAM -m 64 is similar: more large 
>>> packets on xen than on kvm.
>>> 
>>> And vhost is enabled, tx zero-copy is enabled, and virtio TSO is enabled on kvm.
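For reference, a quick way those settings might be double-checked on the host. 
The sysfs path is an assumption based on the vhost_net module parameter as I 
know it; adjust for the running kernel:

lsmod | grep vhost_net                                      # is vhost in use?
cat /sys/module/vhost_net/parameters/experimental_zcopytx   # 1 => tx zero-copy enabled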
>>> 
>>> Any ideas?
>>> 
>>> Thanks,
>>> Zhang Haoyu



