
Re: [Qemu-devel] TCP Segmentation Offloading


From: Ingo Krabbe
Subject: Re: [Qemu-devel] TCP Segmentation Offloading
Date: Fri, 6 May 2016 06:34:33 +0200

> On Sun, May 01, 2016 at 02:31:57PM +0200, Ingo Krabbe wrote:
>> Good Mayday Qemu Developers,
>> 
>> today I tried to find a reference to a networking problem that seems to be
>> of a quite general nature: TCP Segmentation Offloading (TSO) in virtual
>> environments.
>> 
>> When I set up a TAP network adapter for a virtual machine and put it into
>> a host bridge, the known best practice is to manually set "tso off gso
>> off" with ethtool: for the guest driver if I use a hardware emulation
>> such as e1000, and/or for the host driver and/or the bridge adapter if I
>> use the virtio driver. Otherwise you experience (sometimes?) performance
>> problems or even lost packets.
> 
> I can't parse this sentence.  In what cases do you think it's a "known
> best practice" to disable tso and gso?  Maybe a table would be a clearer
> way to communicate this.
> 
> Can you provide a link to the source claiming tso and gso should be
> disabled?

Sorry for that long sentence. The gist is that the most stable
configuration seems to be to turn TSO and GSO off for host bridges and for
the adapters of virtual machines.
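
In practice that means something like the following sketch (br0, tap0 and
ens3 are placeholder names for the bridge, tap device and guest NIC;
adjust them to your setup):

        # on the host: the bridge and the tap device of the guest
        ethtool -K br0 tso off gso off
        ethtool -K tap0 tso off gso off
        # inside the guest, e.g. for an emulated e1000 NIC
        ethtool -K ens3 tso off gso off
        # verify the resulting offload settings (lowercase -k queries)
        ethtool -k ens3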

One of the most comprehensive collections of arguments is this article:

        https://kris.io/2015/10/01/kvm-network-performance-tso-and-gso-turn-it-off/

while I also found documentation for CentOS 6:

        https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch10s04.html

The Ganeti wiki on Google Code also discusses it:

        https://code.google.com/p/ganeti/wiki/PerformanceTuning

Of course, the same advice is found for Xen machines:

        http://cloudnull.io/2012/07/xenserver-network-tuning/

As you can see, there are several links on the internet, and my first
question is: why can't I find this discussion in the QEMU wiki?

I think this bug is related:

        https://bugs.launchpad.net/bugs/1202289

>> I haven't found a complete analysis of the background of these problems,
>> but there seem to be some effects on MTU-based fragmentation and UDP
>> checksums.
>> 
>> There is a TSO-related bug on Launchpad, but the context of this bug is
>> too narrow for the generality of the problem.
>> 
>> Also it seems that there is a problem in LXC contexts too (I found such a
>> reference, without a detailed description, in a post about a Xen setup).
>> 
>> My question now is: is there a bug in the driver code, and shouldn't this
>> be documented somewhere on wiki.qemu.org? Were there developments on this
>> topic in the past, or is there any planned/ongoing work to do on the QEMU
>> drivers?
>> 
>> Most problem reports found relate to deprecated CentOS 6 qemu-kvm
>> packages.
>> 
>> In our company we have similar or even worse problems with CentOS 7 hosts
>> and guest machines.
> 
> You haven't explained what problem you are experiencing.  If you want
> help with your setup please include your QEMU command-line (ps aux |
> grep qemu), the traffic pattern (ideally how to reproduce it with a
> benchmarking tool), and what observation you are making (e.g. netstat
> counters showing dropped packets).

I was quite astonished by the many hints about virtio drivers, as we had
this problem with the e1000 driver in a CentOS 7 guest on a CentOS 6 host.
Here is the guest kernel log from one such incident:

        e1000 0000:00:03.0 ens3: Detected Tx Unit Hang
          Tx Queue             <0>
          TDH                  <42>
          TDT                  <42>
          next_to_use          <2e>
          next_to_clean        <42>
        buffer_info[next_to_clean]
          time_stamp           <104aff1b8>
          next_to_watch        <44>
          jiffies              <104b00ee9>
          next_to_watch.status <0>
        Apr 25 21:08:48 db03 kernel: ------------[ cut here ]------------
        Apr 25 21:08:48 db03 kernel: WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x270/0x280()
        Apr 25 21:08:48 db03 kernel: NETDEV WATCHDOG: ens3 (e1000): transmit queue 0 timed out
        Apr 25 21:08:48 db03 kernel: Modules linked in: binfmt_misc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btrfs zlib_deflate raid6_pq xor ext4 mbcache jbd2 crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper i2c_piix4 ppdev cryptd pcspkr virtio_balloon parport_pc parport sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm drm crct10dif_pclmul crct10dif_common ata_piix crc32c_intel virtio_pci e1000 i2c_core virtio_ring libata serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
        Apr 25 21:08:48 db03 kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.13.1.el7.x86_64 #1
        Apr 25 21:08:48 db03 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
        Apr 25 21:08:48 db03 kernel: ffff88126f483d88 685d892e8a452abb ffff88126f483d40 ffffffff8163571c
        Apr 25 21:08:48 db03 kernel: ffff88126f483d78 ffffffff8107b200 0000000000000000 ffff881203b9a000
        Apr 25 21:08:48 db03 kernel: ffff881201c3e080 0000000000000001 0000000000000002 ffff88126f483de0
        Apr 25 21:08:48 db03 kernel: Call Trace:
        Apr 25 21:08:48 db03 kernel: <IRQ>  [<ffffffff8163571c>] dump_stack+0x19/0x1b
        Apr 25 21:08:48 db03 kernel: [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
        Apr 25 21:08:48 db03 kernel: [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
        Apr 25 21:08:48 db03 kernel: [<ffffffff8154cd40>] dev_watchdog+0x270/0x280
        Apr 25 21:08:48 db03 kernel: [<ffffffff8154cad0>] ? dev_graft_qdisc+0x80/0x80
        Apr 25 21:08:48 db03 kernel: [<ffffffff8108b0a6>] call_timer_fn+0x36/0x110
        Apr 25 21:08:48 db03 kernel: [<ffffffff8154cad0>] ? dev_graft_qdisc+0x80/0x80
        Apr 25 21:08:48 db03 kernel: [<ffffffff8108dd97>] run_timer_softirq+0x237/0x340
        Apr 25 21:08:48 db03 kernel: [<ffffffff81084b0f>] __do_softirq+0xef/0x280
        Apr 25 21:08:48 db03 kernel: [<ffffffff816477dc>] call_softirq+0x1c/0x30
        Apr 25 21:08:48 db03 kernel: [<ffffffff81016fc5>] do_softirq+0x65/0xa0
        Apr 25 21:08:48 db03 kernel: [<ffffffff81084ea5>] irq_exit+0x115/0x120
        Apr 25 21:08:48 db03 kernel: [<ffffffff81648455>] smp_apic_timer_interrupt+0x45/0x60
        Apr 25 21:08:48 db03 kernel: [<ffffffff81646b1d>] apic_timer_interrupt+0x6d/0x80
        Apr 25 21:08:48 db03 kernel: <EOI>  [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
        Apr 25 21:08:48 db03 kernel: [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
        Apr 25 21:08:48 db03 kernel: [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
        Apr 25 21:08:48 db03 kernel: [<ffffffff810d6325>] cpu_startup_entry+0x245/0x290
        Apr 25 21:08:48 db03 kernel: [<ffffffff810475fa>] start_secondary+0x1ba/0x230
        Apr 25 21:08:48 db03 kernel: ---[ end trace 71ac4360272e207e ]---
        Apr 25 21:08:48 db03 kernel: e1000 0000:00:03.0 ens3: Reset adapter


I'm still not sure why this happens on this host, "db03", while db02 and
db01 are not affected. All guests run on different hosts, and the network
is controlled by Open vSwitch.
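
To follow up with the data Stefan asked for, I will collect something
along these lines (a sketch; ens3 is the guest NIC from the log above):

        # QEMU command line of the guest, taken on the host
        ps aux | grep qemu
        # current offload settings of the guest NIC
        ethtool -k ens3
        # interface statistics, including TX errors and drops
        ip -s link show ens3
        # protocol counters (retransmits, drops)
        netstat -s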



