qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
Date: Tue, 27 Jan 2015 15:33:12 +1100
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Jan 05, 2015 at 05:13:50PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (address@hidden) wrote:
> > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > 
> > > userfaultfd is a Linux syscall that gives an fd that receives a stream
> > > of notifications of accesses to pages marked as MADV_USERFAULT, and
> > > allows the program to acknowledge those stalls and tell the accessing
> > > thread to carry on.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> > 
> > [snip]
> > >  /*
> > > + * Tell the kernel that we've now got some memory it previously asked 
> > > for.
> > > + * Note: We're not allowed to ack a page which wasn't requested.
> > > + */
> > > +static int ack_userfault(MigrationIncomingState *mis, void *start, 
> > > size_t len)
> > > +{
> > > +    uint64_t tmp[2];
> > > +
> > > +    /*
> > > +     * Kernel wants the range that's now safe to access
> > > +     * Note it always takes 64bit values, even on a 32bit host.
> > > +     */
> > > +    tmp[0] = (uint64_t)(uintptr_t)start;
> > > +    tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
> > > +
> > > +    if (write(mis->userfault_fd, tmp, 16) != 16) {
> > > +        int e = errno;
> > 
> > Is an EOF (i.e. write() returns 0) ever possible here?  If so errno
> > may not have a meaningful value.
> 
> I don't think so; I think any !=16 case is an error; however if I understand
> correctly the safe thing to do is for me to do an 
> 
> errno = 0
> 
> before the call.

Either that, or handle unexpected EOF / short write as a different case.

> 
> > 
> > > +        if (e == ENOENT) {
> > > +            /* Kernel said it wasn't waiting - one case where this can
> > > +             * happen is where two threads triggered the userfault
> > > +             * and we receive the page and ack it just after we received
> > > +             * the 2nd request and that ends up deciding it should ack it
> > > +             * We could optimise it out, but it's rare.
> > > +             */
> > > +            /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, 
> > > len); */
> > > +            return 0;
> > > +        }
> > > +        error_report("postcopy_ram: Failed to notify kernel for %p/%zx 
> > > (%d)",
> > > +                     start, len, e);
> > > +        return -errno;
> 
> Hmm, and made that    return -e

Ah, yes, otherwise it's very likely that error_report() will clobber
the value.

> > > +/*
> > >   * Handle faults detected by the USERFAULT markings
> > >   */
> > >  static void *postcopy_ram_fault_thread(void *opaque)
> > >  {
> > >      MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> > > +    void *hostaddr;
> > > +    int ret;
> > > +    size_t hostpagesize = getpagesize();
> > > +    RAMBlock *rb = NULL;
> > > +    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> > >  
> > > -    fprintf(stderr, "postcopy_ram_fault_thread\n");
> > > -    /* TODO: In later patch */
> > > +    DPRINTF("%s", __func__);
> > >      qemu_sem_post(&mis->fault_thread_sem);
> > > -    while (1) {
> > > -        /* TODO: In later patch */
> > > -    }
> > > +    while (true) {
> > > +        PostcopyPMIState old_state, tmp_state;
> > > +        ram_addr_t rb_offset;
> > > +        ram_addr_t in_raspace;
> > > +        unsigned long bitmap_index;
> > > +        struct pollfd pfd[2];
> > > +
> > > +        /*
> > > +         * We're mainly waiting for the kernel to give us a faulting HVA,
> > > +         * however we can be told to quit via userfault_quit_fd which is
> > > +         * an eventfd
> > > +         */
> > > +        pfd[0].fd = mis->userfault_fd;
> > > +        pfd[0].events = POLLIN;
> > > +        pfd[0].revents = 0;
> > > +        pfd[1].fd = mis->userfault_quit_fd;
> > > +        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> > > +        pfd[1].revents = 0;
> > > +
> > > +        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> > > +            perror("userfault poll");
> > > +            break;
> > > +        }
> > >  
> > > +        if (pfd[1].revents) {
> > > +            DPRINTF("%s got quit event", __func__);
> > > +            break;
> > 
> > I don't see any cleanup path in the userfault thread.  So wouldn't it
> > be simpler to just pthread_cancel() it rather than using an extra fd
> > for quit notifications.
> 
> But it does call functions that take locks (both the pmi and the
> return path qemu-file), so I don't feel comfortable just cancelling the
> thread.

Ah, good point.  Use of an event restrict the points at which the
thread can exit, which is significant.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: pgp0fDmMArFp6.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]