From: Keqian Zhu
Subject: Re: [PATCH] vfio/migrate: Move switch of dirty tracking into vfio_memory_listener
Date: Thu, 28 Jan 2021 23:24:13 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
Hi Paolo and Kirti,
Many thanks for the reply. I am busy today and will reply to you tomorrow. Thanks.
Keqian.
On 2021/1/28 5:03, Kirti Wankhede wrote:
>
>
> On 1/11/2021 1:04 PM, Keqian Zhu wrote:
>> For now the switch of vfio dirty page tracking is integrated into
>> the vfio_save_handler, it causes some problems [1].
>>
>
> Sorry, I missed the [1] mail; somehow it didn't land in my inbox.
>
>> The object of dirty tracking is guest memory, but the object of
>> the vfio_save_handler is device state. This mixed logic produces
>> unnecessary coupling and conflicts:
>>
>> 1. Coupling: Their saving granularity differs (per-VM vs. per-device).
>> vfio will enable dirty_page_tracking for each device, while doing it
>> once is actually enough.
>
> That's correct, enabling dirty page tracking once is enough. But log_start
> and log_stop get called on address space update transactions, region_add() or
> region_del(), and at that point migration may not be active. We don't want to
> allocate bitmap memory in the kernel for the lifetime of the VM without
> knowing whether migration will happen or not. The vfio_iommu_type1 module
> should allocate bitmap memory only while migration is active.
>
> Paolo's suggestion to use the log_global_start and log_global_stop callbacks
> seems correct here. But at that point the vfio device state is not yet changed
> to _SAVING, as you identified in [1]. Maybe we can start tracking the bitmap
> in the iommu_type1 module while the device is not yet _SAVING, but getting the
> dirty bitmap while the device is not yet in the _SAVING|_RUNNING state doesn't
> seem an optimal solution.
>
> Pasting here your question from [1]
>
>> Before start dirty tracking, we will check and ensure that the device
>> is at _SAVING state and return error otherwise. But the question is
>> that what is the rationale? Why does the VFIO_IOMMU_DIRTY_PAGES
>> ioctl have something to do with the device state?
>
> Let's walk through the types of devices we are supporting:
> 1. mdev devices without an IOMMU-backed device
> The vendor driver pins pages as and when required during runtime. We can say
> the vendor driver is smart, in that it identifies the pages to pin. We are
> good here.
>
> 2. mdev device with an IOMMU-backed device
> This is similar to vfio-pci, a directly assigned device, where all pages are
> pinned at VM boot-up. The vendor driver is not smart, so a bitmap query will
> always report all pages dirty. If --auto-converge is not set, the VM gets
> stuck indefinitely in the pre-copy phase. This is known to us.
>
> 3. mdev device with an IOMMU-backed device and a smart vendor driver
> In this case as well, all pages are pinned at VM boot-up, but the vendor
> driver is smart enough to identify the pages and pin them explicitly.
> Pages can be pinned anytime, i.e. during normal VM runtime, on setting the
> _SAVING flag (entering the pre-copy phase), or while in the iterative
> pre-copy phase. There is no restriction based on these phases for calling
> vfio_pin_pages(). The vendor driver can start pinning pages based on its
> device state when the _SAVING flag is set. In that case, if the dirty bitmap
> is queried before that, it will report all of sysmem as dirty, causing an
> unnecessary copy of sysmem.
> As an optimal solution, I think it's better to query the bitmap only after
> all vfio devices are in the pre-copy phase, i.e. after the _SAVING flag is
> set.
>
>> 2. Conflicts: The ram_save_setup() traverses all memory_listeners
>> to execute their log_start() and log_sync() hooks to get the
>> first-round dirty bitmap, which is used by the bulk stage of
>> ram saving. However, it can't get the dirty bitmap from vfio, as
>> @savevm_ram_handlers is registered before @vfio_save_handler.
>>
> Right, but it can get the dirty bitmap from the vfio device in its iterative
> callback:
> ram_save_pending ->
> migration_bitmap_sync_precopy() .. ->
> vfio_listerner_log_sync
>
> Thanks,
> Kirti
>
>> Moving the switch of vfio dirty_page_tracking into vfio_memory_listener
>> can solve the above problems. Besides, do not require devices to be in
>> the SAVING state for vfio_sync_dirty_bitmap().
>>
>> [1] https://www.spinics.net/lists/kvm/msg229967.html
>>
>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>> hw/vfio/common.c | 53 +++++++++++++++++++++++++++++++++++++--------
>> hw/vfio/migration.c | 35 ------------------------------
>> 2 files changed, 44 insertions(+), 44 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 6ff1daa763..9128cd7ee1 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -311,7 +311,7 @@ bool vfio_mig_active(void)
>> return true;
>> }
>> -static bool vfio_devices_all_saving(VFIOContainer *container)
>> +static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>> {
>> VFIOGroup *group;
>> VFIODevice *vbasedev;
>> @@ -329,13 +329,8 @@ static bool vfio_devices_all_saving(VFIOContainer
>> *container)
>> return false;
>> }
>> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>> - if ((vbasedev->pre_copy_dirty_page_tracking ==
>> ON_OFF_AUTO_OFF)
>> - && (migration->device_state &
>> VFIO_DEVICE_STATE_RUNNING)) {
>> - return false;
>> - }
>> - continue;
>> - } else {
>> + if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
>> + && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>> return false;
>> }
>> }
>> @@ -987,6 +982,44 @@ static void vfio_listener_region_del(MemoryListener
>> *listener,
>> }
>> }
>> +static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool
>> start)
>> +{
>> + int ret;
>> + struct vfio_iommu_type1_dirty_bitmap dirty = {
>> + .argsz = sizeof(dirty),
>> + };
>> +
>> + if (start) {
>> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>> + } else {
>> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>> + }
>> +
>> + ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>> + if (ret) {
>> + error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>> + dirty.flags, errno);
>> + }
>> +}
>> +
>> +static void vfio_listener_log_start(MemoryListener *listener,
>> + MemoryRegionSection *section,
>> + int old, int new)
>> +{
>> + VFIOContainer *container = container_of(listener, VFIOContainer,
>> listener);
>> +
>> + vfio_set_dirty_page_tracking(container, true);
>> +}
>> +
>> +static void vfio_listener_log_stop(MemoryListener *listener,
>> + MemoryRegionSection *section,
>> + int old, int new)
>> +{
>> + VFIOContainer *container = container_of(listener, VFIOContainer,
>> listener);
>> +
>> + vfio_set_dirty_page_tracking(container, false);
>> +}
>> +
>> static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>> uint64_t size, ram_addr_t ram_addr)
>> {
>> @@ -1128,7 +1161,7 @@ static void vfio_listerner_log_sync(MemoryListener
>> *listener,
>> return;
>> }
>> - if (vfio_devices_all_saving(container)) {
>> + if (vfio_devices_all_dirty_tracking(container)) {
>> vfio_sync_dirty_bitmap(container, section);
>> }
>> }
>> @@ -1136,6 +1169,8 @@ static void vfio_listerner_log_sync(MemoryListener
>> *listener,
>> static const MemoryListener vfio_memory_listener = {
>> .region_add = vfio_listener_region_add,
>> .region_del = vfio_listener_region_del,
>> + .log_start = vfio_listener_log_start,
>> + .log_stop = vfio_listener_log_stop,
>> .log_sync = vfio_listerner_log_sync,
>> };
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 00daa50ed8..c0f646823a 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -395,40 +395,10 @@ static int vfio_load_device_config_state(QEMUFile *f,
>> void *opaque)
>> return qemu_file_get_error(f);
>> }
>> -static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
>> -{
>> - int ret;
>> - VFIOMigration *migration = vbasedev->migration;
>> - VFIOContainer *container = vbasedev->group->container;
>> - struct vfio_iommu_type1_dirty_bitmap dirty = {
>> - .argsz = sizeof(dirty),
>> - };
>> -
>> - if (start) {
>> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>> - } else {
>> - return -EINVAL;
>> - }
>> - } else {
>> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>> - }
>> -
>> - ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>> - if (ret) {
>> - error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>> - dirty.flags, errno);
>> - return -errno;
>> - }
>> - return ret;
>> -}
>> -
>> static void vfio_migration_cleanup(VFIODevice *vbasedev)
>> {
>> VFIOMigration *migration = vbasedev->migration;
>> - vfio_set_dirty_page_tracking(vbasedev, false);
>> -
>> if (migration->region.mmaps) {
>> vfio_region_unmap(&migration->region);
>> }
>> @@ -469,11 +439,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>> return ret;
>> }
>> - ret = vfio_set_dirty_page_tracking(vbasedev, true);
>> - if (ret) {
>> - return ret;
>> - }
>> -
>> qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>> ret = qemu_file_get_error(f);
>>