qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 00/13] Add migration support for VFIO device


From: Yan Zhao
Subject: Re: [Qemu-devel] [PATCH v4 00/13] Add migration support for VFIO device
Date: Fri, 21 Jun 2019 06:45:09 -0400
User-agent: Mutt/1.9.4 (2018-02-28)

On Fri, Jun 21, 2019 at 05:22:37PM +0800, Kirti Wankhede wrote:
> 
> 
> On 6/21/2019 2:16 PM, Yan Zhao wrote:
> > On Fri, Jun 21, 2019 at 04:02:50PM +0800, Kirti Wankhede wrote:
> >>
> >>
> >> On 6/21/2019 6:54 AM, Yan Zhao wrote:
> >>> On Fri, Jun 21, 2019 at 08:25:18AM +0800, Yan Zhao wrote:
> >>>> On Thu, Jun 20, 2019 at 10:37:28PM +0800, Kirti Wankhede wrote:
> >>>>> Add migration support for VFIO device
> >>>>>
> >>>>> This Patch set include patches as below:
> >>>>> - Define KABI for VFIO device for migration support.
> >>>>> - Added save and restore functions for PCI configuration space
> >>>>> - Generic migration functionality for VFIO device.
> >>>>>   * This patch set adds functionality only for PCI devices, but can be
> >>>>>     extended to other VFIO devices.
> >>>>>   * Added all the basic functions required for pre-copy, stop-and-copy 
> >>>>> and
> >>>>>     resume phases of migration.
> >>>>>   * Added state change notifier and from that notifier function, VFIO
> >>>>>     device's state changed is conveyed to VFIO device driver.
> >>>>>   * During save setup phase and resume/load setup phase, migration 
> >>>>> region
> >>>>>     is queried and is used to read/write VFIO device data.
> >>>>>   * .save_live_pending and .save_live_iterate are implemented to use 
> >>>>> QEMU's
> >>>>>     functionality of iteration during pre-copy phase.
> >>>>>   * In .save_live_complete_precopy, that is in stop-and-copy phase,
> >>>>>     iteration to read data from VFIO device driver is implemented till 
> >>>>> pending
> >>>>>     bytes returned by driver are not zero.
> >>>>>   * Added function to get dirty pages bitmap for the pages which are 
> >>>>> used by
> >>>>>     driver.
> >>>>> - Add vfio_listerner_log_sync to mark dirty pages.
> >>>>> - Make VFIO PCI device migration capable. If migration region is not 
> >>>>> provided by
> >>>>>   driver, migration is blocked.
> >>>>>
> >>>>> Below is the flow of state change for live migration where states in 
> >>>>> brackets
> >>>>> represent VM state, migration state and VFIO device state as:
> >>>>>     (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
> >>>>>
> >>>>> Live migration save path:
> >>>>>         QEMU normal running state
> >>>>>         (RUNNING, _NONE, _RUNNING)
> >>>>>                         |
> >>>>>     migrate_init spawns migration_thread.
> >>>>>     (RUNNING, _SETUP, _RUNNING|_SAVING)
> >>>>>     Migration thread then calls each device's .save_setup()
> >>>>>                         |
> >>>>>     (RUNNING, _ACTIVE, _RUNNING|_SAVING)
> >>>>>     If device is active, get pending bytes by .save_live_pending()
> >>>>>     if pending bytes >= threshold_size,  call save_live_iterate()
> >>>>>     Data of VFIO device for pre-copy phase is copied.
> >>>>>     Iterate till pending bytes converge and are less than threshold
> >>>>>                         |
> >>>>>     On migration completion, vCPUs stops and calls 
> >>>>> .save_live_complete_precopy
> >>>>>     for each active device. VFIO device is then transitioned in
> >>>>>      _SAVING state.
> >>>>>     (FINISH_MIGRATE, _DEVICE, _SAVING)
> >>>>>     For VFIO device, iterate in  .save_live_complete_precopy  until
> >>>>>     pending data is 0.
> >>>>>     (FINISH_MIGRATE, _DEVICE, _STOPPED)
> >>>>
> >>>> I suggest we also register to VMStateDescription, whose .pre_save
> >>>> handler would get called after .save_live_complete_precopy in pre-copy
> >>>> only case, and will called before .save_live_iterate in post-copy
> >>>> enabled case.
> >>>> In the .pre_save handler, we can save all device state which must be
> >>>> copied after device stop in source vm and before device start in target 
> >>>> vm.
> >>>>
> >>> hi
> >>> to better describe this idea:
> >>>
> >>> in pre-copy only case, the flow is
> >>>
> >>> start migration --> .save_live_iterate (several round) -> stop source vm
> >>> --> .save_live_complete_precopy --> .pre_save  -->start target vm
> >>> -->migration complete
> >>>
> >>>
> >>> in post-copy enabled case, the flow is
> >>>
> >>> start migration --> .save_live_iterate (several round) --> start post 
> >>> copy --> 
> >>> stop source vm --> .pre_save --> start target vm --> .save_live_iterate 
> >>> (several round) 
> >>> -->migration complete
> >>>
> >>> Therefore, we should put saving of device state in .pre_save interface
> >>> rather than in .save_live_complete_precopy. 
> >>> The device state includes pci config data, page tables, register state, 
> >>> etc.
> >>>
> >>> The .save_live_iterate and .save_live_complete_precopy should only deal
> >>> with saving dirty memory.
> >>>
> >>
> >> Vendor driver can decide when to save device state depending on the VFIO
> >> device state set by user. Vendor driver doesn't have to depend on which
> >> callback function QEMU or user application calls. In pre-copy case,
> >> save_live_complete_precopy sets VFIO device state to
> >> VFIO_DEVICE_STATE_SAVING which means vCPUs are stopped and vendor driver
> >> should save all device state.
> >>
> > when post copy stops vCPUs and vfio device, vendor driver only needs to
> > provide device state. but how vendor driver knows that, if no extra
> > interface or no extra device state is provides?
> > 
> 
> .save_live_complete_postcopy interface for post-copy will get called,
> right?
>
yes, but it's too late after postcopy completion

> Thanks,
> Kirti
> 
> >>>
> >>> I know current implementation does not support post-copy. but at least
> >>> it should not require huge change when we decide to enable it in future.
> >>>
> >>
> >> .has_postcopy and .save_live_complete_postcopy need to be implemented to
> >> support post-copy. I think .save_live_complete_postcopy should be
> >> similar to vfio_save_complete_precopy.
> >>
> >> Thanks,
> >> Kirti
> >>
> >>> Thanks
> >>> Yan
> >>>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]