Re: [PATCH V1 00/32] Live Update
From: Dr. David Alan Gilbert
Subject: Re: [PATCH V1 00/32] Live Update
Date: Tue, 11 Aug 2020 20:08:30 +0100
User-agent: Mutt/1.14.6 (2020-07-11)
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Jul 31, 2020 at 11:27:45AM -0400, Steven Sistare wrote:
> > On 7/31/2020 4:53 AM, Daniel P. Berrangé wrote:
> > > On Thu, Jul 30, 2020 at 02:48:44PM -0400, Steven Sistare wrote:
> > >> On 7/30/2020 12:52 PM, Daniel P. Berrangé wrote:
> > >>> On Thu, Jul 30, 2020 at 08:14:04AM -0700, Steve Sistare wrote:
> > >>>> Improve and extend the qemu functions that save and restore VM state
> > >>>> so a guest may be suspended and resumed with minimal pause time.
> > >>>> qemu may be updated to a new version in between.
> > >>>>
> > >>>> The first set of patches adds the cprsave and cprload commands to
> > >>>> save and restore VM state, and allows the host kernel to be updated
> > >>>> and rebooted in between. The VM must create guest RAM in a persistent
> > >>>> shared memory file, such as /dev/dax0.0 or persistent /dev/shm PKRAM
> > >>>> as proposed in
> > >>>> https://lore.kernel.org/lkml/1588812129-8596-1-git-send-email-anthony.yznaga@oracle.com/
> > >>>>
> > >>>> cprsave stops the VCPUs and saves VM device state in a simple file,
> > >>>> and thus supports any type of guest image and block device. The
> > >>>> caller must not modify the VM's block devices between cprsave and
> > >>>> cprload.
> > >>>>
> > >>>> cprsave and cprload support guests with vfio devices if the caller
> > >>>> first suspends the guest by issuing guest-suspend-ram to the qemu
> > >>>> guest agent. The guest drivers' suspend methods flush outstanding
> > >>>> requests and re-initialize the devices, and thus there is no device
> > >>>> state to save and restore.
> > >>>>
> > >>>> 1 savevm: add vmstate handler iterators
> > >>>> 2 savevm: VM handlers mode mask
> > >>>> 3 savevm: QMP command for cprsave
> > >>>> 4 savevm: HMP Command for cprsave
> > >>>> 5 savevm: QMP command for cprload
> > >>>> 6 savevm: HMP Command for cprload
> > >>>> 7 savevm: QMP command for cprinfo
> > >>>> 8 savevm: HMP command for cprinfo
> > >>>> 9 savevm: prevent cprsave if memory is volatile
> > >>>> 10 kvmclock: restore paused KVM clock
> > >>>> 11 cpu: disable ticks when suspended
> > >>>> 12 vl: pause option
> > >>>> 13 gdbstub: gdb support for suspended state
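[To make the flow concrete, a rough sketch of a live update across a host
kernel reboot, assuming the command names above; the exact argument syntax
is defined by patches 3-8, and the file path here is illustrative:

    # Quiesce guest devices first if vfio is in use (guest-suspend-ram
    # is an existing qemu-guest-agent command):
    {"execute": "guest-suspend-ram"}

    # Stop the VCPUs and save device state (HMP form from patch 4;
    # the file argument is an assumption):
    (qemu) cprsave /tmp/vm.state

    # Update and reboot the host kernel; guest RAM survives in the
    # persistent shared memory file. Start the new qemu paused
    # (patch 12) and restore:
    (qemu) cprload /tmp/vm.state
]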
> > >>>>
> > >>>> The next patches add a restart method that eliminates the persistent
> > >>>> memory constraint, and allows qemu to be updated across the restart,
> > >>>> but does not allow host reboot. Anonymous memory segments used by the
> > >>>> guest are preserved across a re-exec of qemu, mapped at the same VA,
> > >>>> via a proposed madvise(MADV_DOEXEC) option in the Linux kernel. See
> > >>>> https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> > >>>>
> > >>>> 14 savevm: VMS_RESTART and cprsave restart
> > >>>> 15 vl: QEMU_START_FREEZE env var
> > >>>> 16 oslib: add qemu_clr_cloexec
> > >>>> 17 util: env var helpers
> > >>>> 18 osdep: import MADV_DOEXEC
> > >>>> 19 memory: ram_block_add cosmetic changes
> > >>>> 20 vl: add helper to request re-exec
> > >>>> 21 exec, memory: exec(3) to restart
> > >>>> 22 char: qio_channel_socket_accept reuse fd
> > >>>> 23 char: save/restore chardev socket fds
> > >>>> 24 ui: save/restore vnc socket fds
> > >>>> 25 char: save/restore chardev pty fds
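[For illustration, a minimal sketch of the proposed madvise(MADV_DOEXEC)
usage described above; MADV_DOEXEC is not in mainline Linux, so this builds
only against headers from the linked kernel series, and the qemu path is a
placeholder:

    #include <sys/mman.h>
    #include <unistd.h>

    /* Ask the (patched) kernel to preserve this anonymous mapping
     * across exec; it reappears at the same VA in the new image. */
    static void reexec_preserving_ram(void *ram, size_t len, char **argv)
    {
        if (madvise(ram, len, MADV_DOEXEC) < 0) {
            return;   /* kernel lacks MADV_DOEXEC: block the live update */
        }
        execv("/usr/bin/qemu-system-x86_64", argv);
    }
]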
> > >>>
> > >>> Keeping FDs open across re-exec is a nice trick, but how are you
> > >>> dealing with the state associated with them, most especially the TLS
> > >>> encryption state? AFAIK, there's no way to serialize/deserialize the
> > >>> TLS state that GNUTLS maintains, and the patches don't show any sign
> > >>> of dealing with this. IOW it looks like while the FD will be
> > >>> preserved, any TLS session running on it will fail.
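[For reference, the fd itself survives exec() whenever FD_CLOEXEC is clear,
which is presumably what patch 16's qemu_clr_cloexec arranges; a minimal
sketch of that mechanism, with the helper shape assumed:

    #include <fcntl.h>

    /* Clear close-on-exec so the descriptor stays open across the
     * re-exec of qemu. */
    static int clr_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0) {
            return -1;
        }
        return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
    }

The TLS question stands regardless: the descriptor survives, but the
in-library session state does not.]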
> > >>
> > >> I had not considered TLS. If a non-qemu library maintains connection
> > >> state, then we won't be able to support it for live update until the
> > >> library provides interfaces to serialize the state.
> > >>
> > >> For qemu objects, so far vmstate has been adequate to represent the
> > >> devices with descriptors that we preserve.
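[A sketch of that vmstate approach for a preserved descriptor; the struct
and section names below are made up for illustration, while the VMSTATE_*
macros are real QEMU:

    #include "migration/vmstate.h"

    /* Hypothetical backend state: only the fd *number* is serialized;
     * the descriptor itself is kept open across the exec. */
    typedef struct PreservedSocketState {
        int32_t fd;
    } PreservedSocketState;

    static const VMStateDescription vmstate_preserved_socket = {
        .name = "preserved-socket",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_INT32(fd, PreservedSocketState),
            VMSTATE_END_OF_LIST()
        }
    };
]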
> > >
> > > My main concern about this series is that there is an implicit
> > > assumption that QEMU is *not* configured with certain features that are
> > > not handled. If QEMU is using one of the unsupported features, I don't
> > > see anything in the series which attempts to prevent the actions.
> > >
> > > IOW, users can have an arbitrary QEMU config, attempt to use these new
> > > features, and the commands may well succeed, but the user is silently
> > > left with a broken QEMU. Such silent failure modes are really
> > > undesirable as they'll lead to a never-ending stream of hard-to-diagnose
> > > bug reports for QEMU maintainers.
> > >
> > > TLS is one example of this: the live upgrade will "succeed", but the TLS
> > > connections will be totally non-functional.
> >
> > I agree with all your points and would like to do better in this area.
> > Other than hunting for every use of a descriptor and either supporting it
> > or blocking cpr, do you have any suggestions?
> > Thinking out loud, maybe we can gather all the fds that we support, then
> > look for all fds in the process, and block the cpr if we find an
> > unrecognized fd.
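[A sketch of that audit idea, assuming a caller-supplied predicate for the
known-preserved set; enumerating /proc/self/fd is standard Linux:

    #include <dirent.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Return true iff every open descriptor is one we know how to
     * preserve; otherwise cprsave should be refused. */
    static bool all_fds_recognized(bool (*known)(int fd))
    {
        DIR *d = opendir("/proc/self/fd");
        struct dirent *e;
        bool ok = true;

        if (!d) {
            return false;
        }
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.') {
                continue;               /* skip "." and ".." */
            }
            int fd = atoi(e->d_name);
            if (fd == dirfd(d)) {
                continue;               /* the directory stream's own fd */
            }
            if (!known(fd)) {
                ok = false;             /* unrecognized fd: block the cpr */
                break;
            }
        }
        closedir(d);
        return ok;
    }
]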
>
> There's no magic easy answer to this problem. Conceptually it is similar to
> the problem of reliably migrating guest device state, but in this case we're
> primarily concerned about the backends instead.
>
> For migration we've got standardized interfaces that devices must implement
> in order to correctly support migration serialization. There is also support
> for devices to register migration "blockers" which prevent any use of the
> migration feature when the device is present.
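[The device-side blocker pattern referred to looks like this in QEMU of this
era (migrate_add_blocker() is the real API; the error text is illustrative),
and what follows proposes an analogous hook for backends and re-exec:

    #include "qapi/error.h"
    #include "migration/blocker.h"

    static Error *cpr_blocker;

    static int register_blocker(Error **errp)
    {
        error_setg(&cpr_blocker, "this device does not support migration");
        if (migrate_add_blocker(cpr_blocker, errp) < 0) {
            error_free(cpr_blocker);
            cpr_blocker = NULL;
            return -1;
        }
        return 0;
    }
]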
>
> We lack this kind of concept for the backends, and that's what I think
> needs to be tackled in a more thorough way. There are quite a lot of
> backends, but they're grouped into a reasonably small number of sets
> (UIs, chardevs, blockdevs, net devs, etc). We need some standard
> interface that we can plumb into all the backends, along with providing
> backends the ability to block the re-exec. If we plumb the generic
> infrastructure into each of the different types of backend, and make the
> default behaviour be to reject the re-exec, then we can carefully
> consider specific backend impls and allow the re-exec only in the very
> precise cases we can demonstrate to be safe.
>
> IOW, have a presumption that re-exec will *not* be permitted. Over time
> we can make it work for an ever expanding set of use cases.
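[One possible shape for that default-deny backend interface; every name
here is hypothetical, not from the series:

    #include "qapi/error.h"

    /* Hypothetical per-backend hook: an instance is preserved across
     * re-exec only if its backend overrides this and proves safety. */
    typedef struct BackendReexecOps {
        bool (*can_reexec)(void *opaque, Error **errp);
    } BackendReexecOps;

    /* Default: presume re-exec is NOT permitted. */
    static bool backend_can_reexec_default(void *opaque, Error **errp)
    {
        error_setg(errp, "backend does not support live update");
        return false;
    }
]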
Yes, it does feel like an interface that needs to be implemented on the
chardev; then you don't need to worry about handling them all
individually.
Dave
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK