Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user


From: Raphael Norwitz
Subject: Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user
Date: Thu, 20 Feb 2020 02:03:08 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

Ping

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
> 
> On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > > The current vhost-user implementation in QEMU imposes a limit on the
> > > maximum number of memory slots exposed to a VM using a vhost-user
> > > device. This change provides a new protocol feature
> > > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > > allows a VM with a vhost-user device to expose a configurable number of
> > > memory slots, up to the ACPI-defined maximum. Existing backends which
> > > do not support this protocol feature are unaffected.
> > 
> > Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> > single message?  So can't we just increase the number (after negotiating
> > with remote) and be done with it, instead of add/remove?  Or is there
> > another reason to prefer add/remove?
> >
> 
> As mentioned in my cover letter, we experimented with simply increasing the
> message size and it didn’t work on our setup. We debugged down to the socket
> layer and found that on the receiving end the messages were truncated at
> around 512 bytes, or around 16 memory regions. To support 512 memory regions
> we would need a message size of around 32 <bytes per region> * 512 <regions>
> + 8 <bytes for padding and region count> ~= 16k packet size. That would be 64
> times larger than the next largest message size. We thought it would be
> cleaner and more in line with the rest of the protocol to keep the message
> sizes smaller. In particular, we thought memory regions should be treated
> like the rings, which are sent over one message at a time instead of in one
> large message. Whether or not such a large message size can be made to work
> in our case, separate messages will always work on Linux, and most likely all
> other UNIX platforms QEMU is used on.
> 
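For illustration, a standalone sketch of that arithmetic (the 32-byte region
layout below matches the "32 bytes per region" figure above, i.e. four 64-bit
fields per region descriptor; the 512-slot count is the ACPI maximum being
discussed, not a value taken from the patch):

    #include <stdint.h>
    #include <stdio.h>

    /* One region descriptor on the wire: four 64-bit fields = 32 bytes. */
    typedef struct {
        uint64_t guest_phys_addr;
        uint64_t memory_size;
        uint64_t userspace_addr;
        uint64_t mmap_offset;
    } VhostUserMemoryRegion;

    int main(void)
    {
        const size_t max_slots = 512;   /* hypothetical ACPI-style maximum */

        /* 8 bytes for region count plus padding, then one descriptor per region. */
        size_t payload = 8 + max_slots * sizeof(VhostUserMemoryRegion);

        printf("payload for %zu regions in a single message: %zu bytes (~16 KiB)\n",
               max_slots, payload);
        return 0;
    }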

> > > 
> > > This feature works by using three new messages,
> > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > > number of memory slots the backend is willing to accept when the
> > > backend is initialized. Then, when the memory tables are set or updated,
> > > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > > are sent to transmit the regions to map and/or unmap instead of trying
> > > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > > message.
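Roughly, the exchange described here looks like the following (message and
feature names are from the patch; the diagram itself is only an illustration):

    front-end (QEMU)                          back-end
    ----------------                          --------
    protocol feature negotiation  <-->  VHOST_USER_F_CONFIGURE_SLOTS enabled
    VHOST_USER_GET_MAX_MEM_SLOTS  --->
                                  <---  number of slots the back-end accepts

    ... then, each time the memory table is set or updated ...
    VHOST_USER_REM_MEM_REG  --->  (one message per region to unmap)
    VHOST_USER_ADD_MEM_REG  --->  (one message per region to map)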
> > > 
> > > The vhost_user struct maintains a shadow state of the VM’s memory
> > > regions. When the memory tables are modified, the
> > > vhost_user_set_mem_table() function compares the new device memory state
> > > to the shadow state and only sends regions which need to be unmapped or
> > > mapped in. The regions which must be unmapped are sent first, followed
> > > by the new regions to be mapped in. After all the messages have been
> > > sent, the shadow state is set to the current virtual device state.
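A minimal, self-contained sketch of that diffing step (the Region type and the
region_equal(), send_one() and sync_regions() helpers below are simplified
placeholders for illustration, not the patch's actual types or functions):

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-in for one guest memory region. */
    typedef struct {
        uint64_t guest_phys_addr;
        uint64_t memory_size;
        uint64_t userspace_addr;
    } Region;

    static bool region_equal(const Region *a, const Region *b)
    {
        return a->guest_phys_addr == b->guest_phys_addr &&
               a->memory_size == b->memory_size &&
               a->userspace_addr == b->userspace_addr;
    }

    /* Placeholder for sending a single REM_MEM_REG / ADD_MEM_REG message. */
    static void send_one(const char *op, const Region *r)
    {
        printf("%s gpa=0x%" PRIx64 " size=0x%" PRIx64 "\n",
               op, r->guest_phys_addr, r->memory_size);
    }

    static void sync_regions(Region *shadow, size_t *nshadow,
                             const Region *dev, size_t ndev)
    {
        size_t i, j;
        bool found;

        /* First pass: remove shadow regions no longer in the device state. */
        for (i = 0; i < *nshadow; i++) {
            found = false;
            for (j = 0; j < ndev; j++) {
                if (region_equal(&shadow[i], &dev[j])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                send_one("REM_MEM_REG", &shadow[i]);
            }
        }

        /* Second pass: add device regions not yet in the shadow state. */
        for (j = 0; j < ndev; j++) {
            found = false;
            for (i = 0; i < *nshadow; i++) {
                if (region_equal(&shadow[i], &dev[j])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                send_one("ADD_MEM_REG", &dev[j]);
            }
        }

        /* Finally, the shadow state becomes the current device state. */
        memcpy(shadow, dev, ndev * sizeof(*dev));
        *nshadow = ndev;
    }

    int main(void)
    {
        Region shadow[8] = { { 0x0, 0x1000, 0x7f0000000000 } };
        size_t nshadow = 1;
        const Region dev[] = { { 0x100000, 0x1000, 0x7f0000100000 } };

        sync_regions(shadow, &nshadow, dev, 1);
        return 0;
    }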
> > > 
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
> > 
> > Hmm, what would it take to lift the restrictions?
> > Conflicting features like this make it very hard for users to make
> > an informed choice about what to support.
> >
> 
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.
>  

> > > Signed-off-by: Raphael Norwitz <address@hidden>
> > > Signed-off-by: Peter Turschmid <address@hidden>
> > > Suggested-by: Mike Cui <address@hidden>
> > > ---
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index af83fdd..fed6d02 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -35,11 +35,29 @@
> > >  #include <linux/userfaultfd.h>
> > >  #endif
> > >  
> > > -#define VHOST_MEMORY_MAX_NREGIONS    8
> > > +#define VHOST_MEMORY_LEGACY_NREGIONS    8
> > 
> > Hardly legacy when this is intended to always be used e.g. with
> > postcopy, right?
> >
> 
> How about 'BASELINE'?

> > > +    msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > > +    msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > > +
> > > +    /*
> > > +     * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > > +     * which are not found not in the device's memory state.
> > 
> > double negation - could not parse this.
> >
> 
> Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
> regions in our shadow state which are not found in the device's memory
> state.” i.e. send messages to remove regions in the shadow state but not in
> the updated device state.
>
> > > +     */
> > > +    for (i = 0; i < u->num_shadow_regions; ++i) {
> > > +        reg = dev->mem->regions;
> > > +
> > > +        for (j = 0; j < dev->mem->nregions; j++) {
> > > +            reg = dev->mem->regions + j;
> > > +
> > > +            assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > +            mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > +                                         &offset);
> > > +            fd = memory_region_get_fd(mr);
> > > +
> > > +            if (reg_equal(&u->shadow_regions[i], reg)) {
> > > +                matching = true;
> > > +                found[j] = true;
> > > +                break;
> > > +            }
> > > +        }
> > > +
> > > +        if (fd > 0 && !matching) {
> > > +            msg->hdr.request = VHOST_USER_REM_MEM_REG;
> > > +            msg->payload.mem_reg.region.userspace_addr = reg->userspace_addr;
> > > +            msg->payload.mem_reg.region.memory_size = reg->memory_size;
> > > +            msg->payload.mem_reg.region.guest_phys_addr =
> > > +                reg->guest_phys_addr;
> > > +            msg->payload.mem_reg.region.mmap_offset = offset;
> > > +
> > > +            if (vhost_user_write(dev, msg, &fd, 1) < 0) {
> > > +                return -1;
> > > +            }
> > > +        }
> > > +    }
> > > +


