Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support


From: Sukrit Bhatnagar
Subject: Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
Date: Sat, 29 Jun 2019 18:15:21 +0530

On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
<address@hidden> wrote:
>
> * Yuval Shaia (address@hidden) wrote:
> > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > Define and register SaveVMHandlers pvrdma_save and
> > > pvrdma_load for saving and loading the device state,
> > > which currently includes only the dma, command slot
> > > and response slot addresses.
> > >
> > > Remap the DSR, command slot and response slot upon
> > > loading the addresses in the pvrdma_load function.
> > >
> > > Cc: Marcel Apfelbaum <address@hidden>
> > > Cc: Yuval Shaia <address@hidden>
> > > Signed-off-by: Sukrit Bhatnagar <address@hidden>
> > > ---
> > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > index adcf79cd63..cd8573173c 100644
> > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > @@ -28,6 +28,7 @@
> > >  #include "sysemu/sysemu.h"
> > >  #include "monitor/monitor.h"
> > >  #include "hw/rdma/rdma.h"
> > > +#include "migration/register.h"
> > >
> > >  #include "../rdma_rm.h"
> > >  #include "../rdma_backend.h"
> > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > >      pvrdma_fini(pci_dev);
> > >  }
> > >
> > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +
> > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > +}
> > > +
> > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > +
> > > +    // Remap DSR
> > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > +    if (!dev->dsr_info.dsr) {
> > > +        rdma_error_report("Failed to map to DSR");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > +
> > > +    // Remap cmd slot
> > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_req));
> > > +    if (!dev->dsr_info.req) {
> >
> > So this is where you hit the error, right?
> > All looks good to me, i wonder why the first pci_dma_map works fine but
> > second fails where the global BounceBuffer is marked as used.
> >
> > Anyone?
>
> I've got a few guesses:
>   a) My reading of address_space_map is that it gives a bounce buffer
> pointer and then it has to be freed; so if it's going wrong and using a
> bounce buffer, then the 1st call would work and only the 2nd would fail.

In the scenario without any migration, the device is initialized and
load_dsr() is called when the guest writes the DSR address to BAR1 of
pvrdma.

Inside load_dsr(), there are similar calls to rdma_pci_dma_map().

The difference is that when address_space_map() is called there, the
!memory_access_is_direct() condition is FALSE, so the BounceBuffer is
never used.
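
For reference, that check looks roughly like this (paraphrased from
include/exec/memory.h around QEMU 4.0, from memory, so treat it as a
sketch rather than the exact code); the mapping is "direct" only when
the translated MemoryRegion is plain RAM:

static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
{
    if (is_write) {
        /* Direct for writes: RAM, not read-only, not a ram-device. */
        return memory_region_is_ram(mr) &&
               !mr->readonly && !memory_region_is_ram_device(mr);
    } else {
        /* Direct for reads: RAM (not ram-device) or ROMD. */
        return (memory_region_is_ram(mr) &&
                !memory_region_is_ram_device(mr)) ||
               memory_region_is_romd(mr);
    }
}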


In the migration scenario, when the device at the destination calls load
and subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
condition is TRUE.
So, the first rdma_pci_dma_map() marks the global BounceBuffer as in use
and succeeds, but the subsequent calls fail at the atomic_xchg() that
tries to claim it again.

The BounceBuffer is only released when address_space_unmap() is called.
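
And the bounce-buffer fallback in address_space_map() is roughly this
(abridged from exec.c around QEMU 4.0, again from memory, so only a
sketch, not the exact code):

void *address_space_map(AddressSpace *as, hwaddr addr, hwaddr *plen,
                        bool is_write, MemTxAttrs attrs)
{
    hwaddr l = *plen, xlat;
    FlatView *fv = address_space_to_flatview(as);
    MemoryRegion *mr = flatview_translate(fv, addr, &xlat, &l,
                                          is_write, attrs);

    if (!memory_access_is_direct(mr, is_write)) {
        /* A single global bounce buffer; only one mapping may use it
         * at a time.  A second map while it is in use fails here,
         * which is the NULL that rdma_pci_dma_map() then reports. */
        if (atomic_xchg(&bounce.in_use, true)) {
            return NULL;
        }
        l = MIN(l, TARGET_PAGE_SIZE);
        bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
        bounce.addr = addr;
        bounce.len = l;
        /* ... for reads, the guest data is copied into bounce.buffer ... */
        *plen = l;
        return bounce.buffer;
    }

    /* Direct case: return a host pointer straight into guest RAM. */
    ...
}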


I am guessing that when the address is translated through the FlatView,
the MemoryRegion returned at the destination is what causes the issue.
To be honest, at this point I am not sure how FlatView translation works.

I tried adding qemu_log to memory_access_is_direct(), but I guess it is
called too many times, so the VM stalls before it even starts.

>   b) Perhaps the dma address read from the stream is bad, and isn't
> pointing into RAM properly - and that's why you're getting a bounce
> buffer.

I have logged the addresses saved in pvrdma_save() and the ones loaded
in pvrdma_load(), and they are the same. So I suppose there is no
problem with the stream itself.

>   c) Do you have some ordering problems? i.e. is this code happening
> before some other device has been loaded/initialised?  If you're relying
> on some other state, then perhaps the right thing is to perform these
> mappings later, at the end of migration, not during the loading itself.

The device saved/loaded just before pvrdma is vmxnet3, on which
pvrdma depends.

I have included the last few lines of my migration traces below; the
pvrdma state is saved/loaded in this last part of the migration.
I added some debug messages at various places to see which parts of
the code are touched; the lines I added are the ones without a timestamp.

Source:
15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3, section_id 46 -> 0
15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
pvrdma_save: dev->dsr_info.dma is 2032963584
pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma, section_id 47 -> 0
15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
15942@1561806657.197420:vmstate_save acpi_build, acpi_build
15942@1561806657.197441:vmstate_save_state_top acpi_build
15942@1561806657.197459:vmstate_n_elems patched: 1
15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
15942@1561806657.197503:vmstate_subsection_save_top acpi_build
15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
15942@1561806657.197545:savevm_section_start globalstate, section_id 49
15942@1561806657.197568:vmstate_save globalstate, globalstate
15942@1561806657.197589:vmstate_save_state_top globalstate
15942@1561806657.197606:migrate_global_state_pre_save saved state: running
15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
15942@1561806657.197645:vmstate_n_elems size: 1
15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
15942@1561806657.197710:vmstate_n_elems runstate: 1
15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
15942@1561806657.197822:vmstate_subsection_save_top globalstate
15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
15942@1561806657.198240:migrate_set_state new state completed
15942@1561806657.198269:migration_thread_after_loop
15797@1561806657.198329:savevm_state_cleanup
15797@1561806657.198377:migrate_fd_cleanup
15797@1561806657.199072:qemu_file_fclose

Destination:
15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
15830@1561806657.667263:qemu_loadvm_state_section 4
15830@1561806657.667293:qemu_loadvm_state_section_startfull 47(0000:00:10.1/pvrdma) 0 1
15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
pvrdma_load: dev->dsr_info.dma is 2032963584
address_space_map: is_write is 0
address_space_map: addr is 2032963584
address_space_map: Inside !memory_access_is_direct
15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
pvrdma_load: successfully remapped to DSR
pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
address_space_map: is_write is 0
address_space_map: addr is 2032660480
address_space_map: Inside !memory_access_is_direct
address_space_map: Inside atomic_xchg
qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
qemu-system-x86_64: rdma: Failed to map to command slot address
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:10.1/pvrdma'
15830@1561806657.667693:qemu_loadvm_state_post_main -1
15830@1561806657.667729:loadvm_state_cleanup
15830@1561806657.668568:process_incoming_migration_co_end ret=-1 postcopy-state=0
qemu-system-x86_64: load of migration failed: Operation not permitted
15830@1561806657.668721:migrate_set_state new state failed
15830@1561806657.668807:qemu_file_fclose
qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256

@Marcel, @Yuval: As David has suggested, what if we just read the dma
addresses in pvrdma_load() and let load_dsr() do the mapping?
In pvrdma_regs_write(), we can check whether dev->dsr_info.dma is
already set, so that its value is not overwritten.
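
The pvrdma_load() side of that would be roughly (untested sketch; it
assumes the guest's DSR page is migrated as part of RAM, so load_dsr()
can re-read the slot addresses from it once it is mapped again):

static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
{
    PVRDMADev *dev = PVRDMA_DEV(opaque);

    /* Only restore the addresses; no mapping here.  load_dsr() will
     * redo the rdma_pci_dma_map() calls for the DSR, cmd slot and
     * rsp slot later. */
    dev->dsr_info.dma = qemu_get_be64(f);
    qemu_get_be64(f);   /* cmd_slot_dma, re-read from the DSR later */
    qemu_get_be64(f);   /* resp_slot_dma, re-read from the DSR later */

    return 0;
}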


> Other notes:
>   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> calls.

Yes, I am trying that out currently.
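
For the dma address, something like this is what I have in mind
(untested sketch; it assumes dsr_info.dma in PVRDMADev can be saved as
a uint64_t and that the remapping moves into a post_load hook):

static int pvrdma_post_load(void *opaque, int version_id)
{
    /* Redo the DSR/cmd/rsp mappings here, once the addresses and the
     * guest RAM have been restored. */
    return 0;
}

static const VMStateDescription vmstate_pvrdma = {
    .name = "pvrdma",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = pvrdma_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
        VMSTATE_END_OF_LIST()
    }
};

with dc->vmsd = &vmstate_pvrdma in the class init replacing the
register_savevm_live() call, I think.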

>   e) QEMU normally uses /* comments */ rather than //

I have corrected this mistake in my code. patchew notified me about this
and the line length issues minutes after I had sent this patch.


> Dave
>
> > > +        rdma_error_report("Failed to map to command slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > +
> > > +    // Remap rsp slot
> >
> > btw, this is RFC so we are fine but try to avoid such way of comments.
> >
> > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_resp));
> > > +    if (!dev->dsr_info.rsp) {
> > > +        rdma_error_report("Failed to map to response slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static SaveVMHandlers savevm_pvrdma = {
> > > +    .save_state = pvrdma_save,
> > > +    .load_state = pvrdma_load,
> > > +};
> > > +
> > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      int rc = 0;
> > > +    DeviceState *s = DEVICE(pdev);
> > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > >      Object *memdev_root;
> > >      bool ram_shared = false;
> > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > >
> > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > +
> > >  out:
> > >      if (rc) {
> > >          pvrdma_fini(pdev);
> > > --
> > > 2.21.0
> > >
> > >
> >
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK


