Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0
Date: Tue, 3 Apr 2012 14:34:32 +0100

On Tue, Apr 3, 2012 at 1:42 PM, Chris Webb <address@hidden> wrote:
> Stefan Hajnoczi <address@hidden> writes:
>
>> In a case like this it might be most effective to catch a VM in the
>> bad state and then go in with gdb to see what is broken.  The basic
>> approach would be putting breakpoints on the e1000 device model's
>> transmit/receive paths to see if the guest is giving us packets and
>> whether the tap device is transmitting/receiving.  If guest and host
>> appear to be working then QEMU's e1000 model must be in a bad state
>> and it's a question of looking at the tx/rx rings and other hardware
>> emulation state to figure out what went wrong.
>
> Hi Stefan. I tried setting a breakpoint on start_xmit, but qemu blew up
> when I hit it:
>
> (gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:start_xmit
> Function "start_xmit" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) n
> (gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:528
> Breakpoint 1 at 0x46dcd6: file /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c, line 528.
> (gdb) cont
> Continuing.
>
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> The program no longer exists.
>
> I assume this is some subtlety with breakpointing threaded code?

No, that's weird.  I would have simply tried "b start_xmit" and as
long as the binary has symbols gdb would know what to do.

> However, along these lines, I note that the guest appears to have received
> packets, though this count is stuck at 1993 bytes. The TX count marches
> upwards as I ping outbound from the guest.
>
> If I attach a tcpdump to tap1 on the host, I see the ARP requests going out
> and apparently no reply:
>
> 0024# tcpdump -i tap1
> tcpdump: WARNING: tap1: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on tap1, link-type EN10MB (Ethernet), capture size 65535 bytes
> 12:08:35.654992 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:08:36.654976 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:08:37.654975 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:08:38.670933 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:08:39.670922 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:08:40.670908 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
>
> Looking on br0, I do seem to see the replies:
>
> 12:12:53.509471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:12:53.509914 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
> 12:12:54.509455 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:12:54.509875 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
> 12:12:55.509447 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:12:55.509878 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
> 12:12:56.525424 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:12:56.525854 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
> 12:12:57.525408 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
> 12:12:57.525837 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
>
> but they never get to tap1 despite STP being disabled and no bridge port
> filtering:
>
>  # ebtables -L
>  Bridge table: filter
>
>  Bridge chain: INPUT, entries: 0, policy: ACCEPT
>
>  Bridge chain: FORWARD, entries: 0, policy: ACCEPT
>
>  Bridge chain: OUTPUT, entries: 0, policy: ACCEPT
>
>  # brctl show br0
>  bridge name     bridge id               STP enabled     interfaces
>  br0             8000.002590224ffa       no              eth0
>
>
> This looks uncannily like a kernel problem, doesn't it? However, remove the
> -usbdevice tablet, and it goes away, which is truly weird! I've just done a
> hundred successful reboots without it once again to confirm to myself that I'm
> definitely not imagining that behaviour.
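
Since the question is whether frames on br0 can reach tap1 at all, it may be
worth checking bridge membership mechanically. The sketch below is only an
illustration: bridge_ports() and SAMPLE are hypothetical names, and the parser
assumes the usual "brctl show" layout, where additional ports appear on
indented continuation lines. The sample is the listing quoted above, which
shows only eth0; that paste may well have been trimmed, so treat the result as
a prompt to re-run the command rather than as a diagnosis.

```python
def bridge_ports(brctl_output):
    """Map each bridge name to the ports listed in `brctl show` output."""
    ports = {}
    current = None
    for line in brctl_output.splitlines():
        fields = line.split()
        if not fields or fields[:2] == ["bridge", "name"]:
            continue  # skip blank lines and the column header
        if not line[0].isspace():
            # A new bridge row: name, id, STP flag, then an optional first port.
            current = fields[0]
            ports[current] = fields[3:]
        elif current is not None:
            # Indented continuation rows carry further ports of the same bridge.
            ports[current].extend(fields)
    return ports


# Listing as quoted in this thread, with the leading quote markers removed.
SAMPLE = """\
bridge name     bridge id               STP enabled     interfaces
br0             8000.002590224ffa       no              eth0
"""

if __name__ == "__main__":
    members = bridge_ports(SAMPLE).get("br0", [])
    print("br0 ports:", members)
    print("tap1 attached:", "tap1" in members)
```

Feeding it fresh output captured while a guest is in the bad state would show
whether tap1 is still a port of br0 at that moment.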

Are you sure no other guest has the same MAC address or IP address?
This weird behavior sounds similar to what happens when you have
multiple devices on a network using the same address - the results are
very confusing :).
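
If you keep an inventory of guest NICs somewhere, a duplicate check is easy to
script. This is a minimal sketch and not a tool that exists in QEMU: the
function name find_duplicates() and the (guest, mac, ip) tuple format are
assumptions, and the guest names and MAC addresses in the sample are made up.
Only 84.45.8.242 comes from the tcpdump output above.

```python
from collections import defaultdict


def find_duplicates(nics):
    """Return MAC and IP addresses that are claimed by more than one guest.

    `nics` is an iterable of (guest, mac, ip) tuples (a hypothetical format).
    """
    by_mac = defaultdict(set)
    by_ip = defaultdict(set)
    for guest, mac, ip in nics:
        by_mac[mac.lower()].add(guest)  # MAC comparison is case-insensitive
        by_ip[ip].add(guest)
    return {
        "mac": {m: sorted(g) for m, g in by_mac.items() if len(g) > 1},
        "ip": {a: sorted(g) for a, g in by_ip.items() if len(g) > 1},
    }


# Made-up inventory for illustration only.
INVENTORY = [
    ("guest-a", "52:54:00:12:34:56", "84.45.8.242"),
    ("guest-b", "52:54:00:12:34:57", "84.45.8.243"),
    ("guest-c", "52:54:00:12:34:56", "84.45.8.244"),
]

if __name__ == "__main__":
    print(find_duplicates(INVENTORY))
```

With this sample, guest-a and guest-c are reported as sharing a MAC address,
which is the sort of clash that could produce symptoms like these.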

Stefan


