qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] [Bug 1670377] Re: VNC: short read for zlre data/RDR EndOfSt


From: Daniel Berrange
Subject: [Qemu-devel] [Bug 1670377] Re: VNC: short read for zlre data/RDR EndOfStream
Date: Thu, 27 Apr 2017 14:12:23 -0000

It isn't 100% clear from the info provided, but this is almost certainly
fixed in 2.9.0 by

commit 537848ee62195fc06c328b1cd64f4218f404a7f1
Author: Michael Tokarev <address@hidden>
Date:   Fri Feb 3 12:52:29 2017 +0300

    vnc: do not disconnect on EAGAIN
    
    When qemu vnc server is trying to send large update to clients,
    there might be a situation when system responds with something
    like EAGAIN, indicating that there's no system memory to send
    that much data (depending on the network speed, client and server
    and what is happening).  In this case, something like this happens
    on qemu side (from strace):
    
    sendmsg(16, {msg_name(0)=NULL,
            msg_iov(1)=[{"\244\"..., 729186}],
            msg_controllen=0, msg_flags=0}, 0) = 103950
    sendmsg(16, {msg_name(0)=NULL,
            msg_iov(1)=[{"lz\346"..., 1559618}],
            msg_controllen=0, msg_flags=0}, 0) = -1 EAGAIN
    sendmsg(-1, {msg_name(0)=NULL,
            msg_iov(1)=[{"lz\346"..., 1559618}],
            msg_controllen=0, msg_flags=0}, 0) = -1 EBADF
    
    qemu closes the socket before the retry, and obviously it gets EBADF
    when trying to send to -1.
    
    This is because there WAS a special handling for EAGAIN, but now it doesn't
    work anymore, after commit 04d2529da27db512dcbd5e99d0e26d333f16efcc, because
    now in all error-like cases we initiate vnc disconnect.
    
    This change were introduced in qemu 2.6, and caused numerous grief for many
    people, resulting in their vnc clients reporting sporadic random disconnects
    from vnc server.
    
    Fix that by doing the disconnect only when necessary, i.e. omitting this
    very case of EAGAIN.
    
    Hopefully the existing condition (comparing with QIO_CHANNEL_ERR_BLOCK)
    is sufficient, as the original code (before the above commit) were
    checking for other errno values too.
    
    Apparently there's another (semi?)bug exist somewhere here, since the
    code tries to write to fd# -1, it probably should check if the connection
    is open before. But this isn't important.
    
    Signed-off-by: Michael Tokarev <address@hidden>
    Reviewed-by: Daniel P. Berrange <address@hidden>
    Message-id: address@hidden
    Fixes: 04d2529da27db512dcbd5e99d0e26d333f16efcc
    Cc: Daniel P. Berrange <address@hidden>
    Cc: Gerd Hoffmann <address@hidden>
    Cc: address@hidden
    Signed-off-by: Gerd Hoffmann <address@hidden>


** Changed in: qemu
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1670377

Title:
   VNC: short read for zlre data/RDR EndOfStream

Status in QEMU:
  Fix Released

Bug description:
  In openQA we have a custom VNC client (https://github.com/os-autoinst
  /os-autoinst/tree/master/consoles), which connects to QEMU guest and
  from there performs actions (sends keys, handles pointer, ...). We
  have several backends (https://github.com/os-autoinst/os-
  autoinst/tree/master/backend). With qemu backend we start QEMU guest
  *locally* on openQA worker which connects to it via VNC and sends
  commands. That works fine.

  However, with svirt backend we start QEMU on a KVM or Xen host and
  then connect to it remotely from openQA worker - the guest and worker
  are different systems. In this scenario fairly often happens that
  while system operates in Grub2, QEMU stops sending data via VNC:

  ...
  15:24:15.5341 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:50 
called testapi::send_key
  15:24:15.5342 27074 <<< testapi::send_key(key='c')
  15:24:15.7361 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:51 
called testapi::type_string
  15:24:15.7362 27074 <<< testapi::type_string(string='gfxmode=1024x768; 
terminal_output console; terminal_output gfxterm
  ', max_interval=250, wait_screen_changes=0)
  15:24:22.2243 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:53 
called testapi::send_key
  15:24:22.2244 27074 <<< testapi::send_key(key='esc')
  15:24:22.4255 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:79 
called testapi::send_key
  15:24:22.4256 27074 <<< testapi::send_key(key='e')
  15:24:22.6264 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 
called testapi::send_key
  15:24:22.6265 27074 <<< testapi::send_key(key='down')
  15:24:22.8273 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 
called testapi::send_key
  15:24:22.8274 27074 <<< testapi::send_key(key='down')
  15:24:23.0282 Debug: 
/var/lib/openqa/share/tests/sle-12-SP1/tests/installation/bootloader_uefi.pm:81 
called testapi::send_key
  15:24:23.0283 27074 <<< testapi::send_key(key='down')
  DIE short read for zlre data 107132 - 995002 at 
/usr/lib/os-autoinst/consoles/VNC.pm line 978.

   at /usr/lib/os-autoinst/backend/baseclass.pm line 73.
  ...

  My observation is that it happens only while in Grub, when resolution
  happened a short while ago. See attached video and log.

  Prior to QEMU 2.8.0 I was able to reproduce a similar issue with
  vncviewer. I started QEMU with SLES JeOS image pressed several times a
  'down' key in Grub and vncviewer (Tiger VNC 1.6.0 from openSUSE Leap
  42.2) crashed with rdr::EndOfStream exception. This does not happen
  with QEMU 2.8.0, but I am still able to reproduce similar issue via
  openQA.

  /usr/bin/qemu-system-x86_64 -name guest=openQA-SUT-20,debug-threads=on
  -S -machine pc-i440fx-2.6,accel=kvm,usb=off -m 1024 -realtime
  mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 87535fc1-e693-41b9
  -813e-834d6fc4cb5a -no-user-config -nodefaults   -rtc base=utc -no-
  reboot -boot strict=on -device piix3-usb-
  uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images
  /openQA-SUT-20.img,format=qcow2,if=none,id=drive-virtio-
  disk0,cache=unsafe -device virtio-blk-
  pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-
  disk0,bootindex=1 -netdev user,id=hostnet0 -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=52:54:00:12:34:56,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device virtio-tablet-
  pci,id=input0,bus=pci.0,addr=0x6 -device virtio-keyboard-
  pci,id=input1,bus=pci.0,addr=0x7 -vnc 0.0.0.0:20,share=force-shared
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-
  balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on -monitor
  stdio

  Host: openSUSE Leap 42.2 x86_64 KVM or Xen on x86_64 Intel with QEMU 2.6.0.
  Guest: Leap 42.2.

  I can't reproduce the problem with QEMU 2.5.0, but I can with any QEMU
  version from 2.6 RC1 on.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1670377/+subscriptions



reply via email to

[Prev in Thread] Current Thread [Next in Thread]