Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
Date: Fri, 27 Dec 2013 12:44:40 +1100
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 12/27/2013 02:12 AM, Michael S. Tsirkin wrote:
> On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
>> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>>>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on a POWER7
>>>>>>>>>>>>>>>>> machine - it does not survive a reboot of the guest.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not 
>>>>>>>>>>>>>>>>> work at all.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows
>>>>>>>>>>>>>>>>> no traffic coming from the guest. Comparing the behaviour
>>>>>>>>>>>>>>>>> before and after the reboot: before it, the guest sends an ARP
>>>>>>>>>>>>>>>>> request for 172.20.1.23 and receives the response; after it,
>>>>>>>>>>>>>>>>> the guest sends the same request but the answer never arrives.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
>>>>>>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tried that, and added a lot more debug printk myself; it is not
>>>>>>>>>>>>>>> clear at all what is happening there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more hint - if I boot the guest, do not bring eth0 up, and
>>>>>>>>>>>>>>> wait more than 200 seconds (the threshold is somewhere between
>>>>>>>>>>>>>>> 200 and 210 seconds), then eth0 will not work at all. I.e. this
>>>>>>>>>>>>>>> script produces a non-working eth0:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to
>>>>>>>>>>>>>>> reproduce.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Without vhost it always works. The only difference I can see
>>>>>>>>>>>>>>> here is vhost's worker thread, which may get suspended if it is
>>>>>>>>>>>>>>> not used for a while after start-up and then never wakes up -
>>>>>>>>>>>>>>> but this is almost a blind guess.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with the 
>>>>>>>>>>>>>> guest
>>>>>>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>>>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>>>         } else {
>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
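The hunk above simply makes the wake_up_process() call unconditional. A rough
sketch of vhost_work_queue() with that change applied, reconstructed from the
hunk context (the locking and list handling are assumed from the 3.x-era vhost
code, not copied from the thread):

void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->work_lock, flags);
        if (list_empty(&work->node)) {
                /* not queued yet: put it on the device's work list */
                list_add_tail(&work->node, &dev->work_list);
                work->queue_seq++;
                spin_unlock_irqrestore(&dev->work_lock, flags);
        } else {
                spin_unlock_irqrestore(&dev->work_lock, flags);
        }
        /* moved out of the branch: wake the worker even if the work
         * was already queued */
        wake_up_process(dev->worker);
}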
>>>>>>>>>>>>>
>>>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier 
>>>>>>>>>>>>> somewhere?
>>>>>>>>>>>>
>>>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds;
>>>>>>>>>>>> nothing happens that could cause a race.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>>>              goto err;
>>>>>>>>>>>>          }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>>>> +            .index = i
>>>>>>>>>>>> +        };
>>>>>>>>>>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>>>
>>>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>>>> To kick you write 1 there:
>>>>>>>>>>>     uint6_t  v = 1;
>>>>>>>>>>>     write(fd, &v, sizeof v);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do 
>>>>>>>>>> correct?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> What
>>>>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>>>
>>>>>>>>> Sorry, should have been uint64_t.
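In userspace terms, the "kick" discussed above is nothing more than an 8-byte
eventfd write. A minimal sketch, assuming fd is the eventfd returned by
event_notifier_get_fd(virtio_queue_get_host_notifier(...)) and registered via
VHOST_SET_VRING_KICK (the helper name is illustrative):

#include <stdint.h>
#include <unistd.h>

/* Signal the vring's host-notifier eventfd so the vhost worker is woken. */
static void kick_vring(int fd)
{
        uint64_t v = 1;                 /* eventfd counters are 8 bytes wide */

        if (write(fd, &v, sizeof(v)) != sizeof(v)) {
                /* a non-blocking eventfd may return EAGAIN if its counter
                 * would overflow; for a debug kick this can be ignored */
        }
}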
>>>>>>>>
>>>>>>>>
>>>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is
>>>>>>>> there any cheap&dirty way to keep the vhost-net kernel thread always
>>>>>>>> awake? Sending it signals from user space does not work...
>>>>>>>
>>>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>>>> periodically.
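A sketch of that periodic-kick debug hack, assuming QEMU's timer_new_ms() /
timer_mod() API together with event_notifier_set(); dbg_kick_timer and
dbg_kick_cb are made-up names, and vq would be the receive virtqueue:

/* Debug only: re-kick the host notifier once a second so the vhost worker
 * gets woken even if no further guest kick arrives. */
static QEMUTimer *dbg_kick_timer;

static void dbg_kick_cb(void *opaque)
{
    VirtQueue *vq = opaque;

    event_notifier_set(virtio_queue_get_host_notifier(vq));
    timer_mod(dbg_kick_timer,
              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
}

/* somewhere in vhost_net_start(), after the rings are set up:
 *     dbg_kick_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, dbg_kick_cb, vq);
 *     timer_mod(dbg_kick_timer,
 *               qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
 */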
>>>>>>>
>>>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>>>> but tcpdump in host on tun device does not show any packets?
>>>>>>
>>>>>>
>>>>>> Ok, I figured out the thing about disabling interfaces in Fedora 19. I
>>>>>> was wrong - something is happening on the host's TAP: the guest sends an
>>>>>> ARP request, and the response is visible on the TAP interface but not in
>>>>>> the guest.
>>>>>
>>>>> Okay. So the problem is on the host-to-guest path then.
>>>>> Things to try:
>>>>>
>>>>> 1. trace handle_rx [vhost_net]
>>>>> 2. trace tun_put_user [tun]
>>>>> 3. I suspect some host bug in one of the features.
>>>>> Let's try to disable some flags with device property:
>>>>> you can get the list by doing:
>>>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>>>> Things I would try turning off are the guest offloads (the ones that
>>>>> start with guest_), event_idx, any_layout, and mq.
>>>>> Turn them all off; if that helps, try to find the one that helped.
>>>>
>>>>
>>>> Heh. It would still be awesome to read the basics about this vhost thing,
>>>> as I am debugging blindly :)
>>>>
>>>> Regarding your suggestions.
>>>>
>>>> 1. I put "printk" in handle_rx and tun_put_user.
>>>
>>> Fine, though it's easier with ftrace (http://lwn.net/Articles/370423/);
>>> look for function filtering.
>>>
>>>> handle_rx stopped being called 2:40 after the guest started, tun_put_user
>>>> stopped 0:20 after the guest started. Accuracy is 5 seconds. If I bring
>>>> the guest's eth0 up while handle_rx is still printing, it works, i.e.
>>>> tun_put_user is called a lot. Once handle_rx has stopped, nothing can
>>>> bring eth0 back to life.
>>>
>>> OK so what should happen is that handle_rx is called
>>> when you bring eth0 up.
>>> Do you see this?
>>> The way it is supposed to work is this:
>>>
>>> vhost_net_enable_vq calls vhost_poll_start then
>>
>>
>> This, and what follows it, is called only while QEMU is booting the guest
>> (in response to a PCI enable? somewhere in the middle of the PCI discovery
>> process), and VHOST_NET_SET_BACKEND is never called again after that.
>>
> 
> What should happen is that up/down in the guest calls
> virtio_net_vhost_status in qemu, and then vhost_net_start/vhost_net_stop is
> called accordingly. These call VHOST_NET_SET_BACKEND ioctls.
>
> You don't see this?


Nope. What I see is that vhost_net_start is only called on a VIRTIO_PCI_STATUS
write and never after that, as the PCI status does not change (doesn't it?).
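For reference, the backend switch that vhost_net_start/vhost_net_stop perform
boils down to a VHOST_NET_SET_BACKEND ioctl on the /dev/vhost-net control fd.
A minimal sketch (the helper and its parameters are illustrative, not QEMU
code):

#include <linux/vhost.h>
#include <sys/ioctl.h>

/* Attach (fd = tap fd) or detach (fd = -1) the backend of one vring. */
static int set_backend(int vhost_fd, unsigned int index, int fd)
{
        struct vhost_vring_file file = {
                .index = index,
                .fd    = fd,    /* -1 detaches the backend (the stop path) */
        };

        return ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &file);
}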

The log of QEMU + gdb with some breakpoints:
http://pastebin.com/CSN6iSn6

In this example I did not wait the ~240 seconds, so it works, but it still does
not print what you say it should print :)

Here is what I run:
http://ozlabs.ru/gitweb/?p=qemu/.git;a=shortlog;h=refs/heads/vhostdbg

Thanks!

[ time to go to the ocean :) ]


>>
>>> this calls mask = file->f_op->poll(file, &poll->table)
>>> on the tun file.
>>> this calls tun_chr_poll.
>>> at this point there are packets queued on tun already
>>> so that returns POLLIN | POLLRDNORM;
>>> this calls vhost_poll_wakeup and that checks mask against
>>> the key.
>>> key is POLLIN so vhost_poll_queue is called.
>>> this in turn calls vhost_work_queue
>>> the work list is either empty, in which case we wake up the worker,
>>> or it is not empty, in which case the worker is already running jobs anyway.
>>> this will then invoke handle_rx_net.
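The "checks mask against the key" step above is the poll wake-up callback;
roughly paraphrased from the 3.x drivers/vhost/vhost.c (a sketch, not a
verbatim copy):

static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
                             void *key)
{
        struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);

        /* only react to the events this poller cares about (POLLIN here) */
        if (!((unsigned long)key & poll->mask))
                return 0;

        vhost_poll_queue(poll);         /* ends up in vhost_work_queue() */
        return 0;
}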
>>>
>>>
>>>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>>>> on/off parameter. This did not change anything.
>>>>
>>>> ./qemu-system-ppc64 \
>>>>    -enable-kvm \
>>>>    -m 2048 \
>>>>    -L qemu-ppc64-bios/ \
>>>>    -machine pseries \
>>>>    -trace events=qemu_trace_events \
>>>>    -kernel vml312 \
>>>>    -append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>>>>    -nographic \
>>>>    -vga none \
>>>>    -nodefaults \
>>>>    -chardev stdio,id=id0,signal=off,mux=on \
>>>>    -device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>>>    -mon id=id2,chardev=id0,mode=readline \
>>>>    -netdev
>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>    -device
>>>> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>>>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>>>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>>>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>>>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>>>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>>>> command_serr_enable=off \
>>>>    -netdev user,id=id5,hostfwd=tcp::5000-:22 \
>>>>    -device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>>>
>>>
>>> Yes this looks like some kind of race.
>>
>>
>> -- 
>> Alexey


-- 
Alexey


