Re: [Qemu-devel] [PATCH v2 2/2] vfio : add aer process


From: Zhou Jie
Subject: Re: [Qemu-devel] [PATCH v2 2/2] vfio : add aer process
Date: Tue, 2 Aug 2016 09:22:35 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Hi, Alex

Clearly this has only been tested for a single instance of an AER error
event and resume per device.  Are the things you're intending to block
actually blocked for subsequent events?  Note how complete_all() fills
the done field to let all current and future waiters go through and
nowhere is there a call to reinit_completion() to drain that path.
Thanks,

Alex

Do you mean this condition?

For device 1:
error1 occurs ---- error1 resumes
     error2 occurs ---- error2 resumes
         error3 occurs ---- error3 resumes

In the current code, I call complete_all() when error1 resumes.
This unblocks the device
while error2 and error3 are still being processed.

So walk me through how this works.  On vfio_pci_open() we call
init_completion(), which sets aer_error_completion.done equal to zero
(BTW, a user can open the device file descriptor multiple times, so
there's already a bug here).

I will call init_completion() in vfio_pci_probe() instead.

Let's assume that an error occurs and the
user stalls a single access on wait_for_completion_interruptible().
The bulk of this function happens here:

static inline long __sched
do_wait_for_common(struct completion *x,
                   long (*action)(long), long timeout, int state)
{
        if (!x->done) {
                DECLARE_WAITQUEUE(wait, current);

                __add_wait_queue_tail_exclusive(&x->wait, &wait);
                do {
                        if (signal_pending_state(state, current)) {
                                timeout = -ERESTARTSYS;
                                break;
                        }
                        __set_current_state(state);
                        spin_unlock_irq(&x->wait.lock);
                        timeout = action(timeout);
                        spin_lock_irq(&x->wait.lock);
                } while (!x->done && timeout);
                __remove_wait_queue(&x->wait, &wait);
                if (!x->done)
                        return timeout;
        }
        x->done--;
        return timeout ?: 1;
}

So it waits within that do{}while loop for a completion, interruption,
or timeout.  Then we have:

void complete_all(struct completion *x)
{
        unsigned long flags;

        spin_lock_irqsave(&x->wait.lock, flags);
        x->done += UINT_MAX/2;
        __wake_up_locked(&x->wait, TASK_NORMAL, 0);
        spin_unlock_irqrestore(&x->wait.lock, flags);
}

So aer_error_completion.done gets incremented to let a couple billion
completion waiters through...  Show me how another call to
wait_for_completion_interruptible() will ever block again within our
lifetime when the actual wait of do_wait_for_common() is only entered
when 'done' count is equal to zero.  This seems to be why
reinit_completion() exists, but it's not used here.  Thanks,

Alex
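
The counter behaviour described above can be checked with a small userspace
model. This is only a sketch: it keeps the 'done' counter and nothing else
(no wait queue, no locking, no sleeping), and every model_* name is
hypothetical, not a kernel API. The arithmetic follows the complete_all()
quoted above (done += UINT_MAX/2).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct completion: only the 'done' counter. */
struct model_completion {
        unsigned int done;
};

static void model_init_completion(struct model_completion *x)
{
        assert(x != NULL);
        x->done = 0;
}

/* Mirrors what reinit_completion() does to the counter: reset it to zero. */
static void model_reinit_completion(struct model_completion *x)
{
        assert(x != NULL);
        x->done = 0;
}

/* Mirrors the counter update in the complete_all() quoted above. */
static void model_complete_all(struct model_completion *x)
{
        assert(x != NULL);
        x->done += UINT_MAX / 2;
}

/*
 * Returns true if a waiter would have to sleep (done == 0).
 * Otherwise consumes one 'done' token, as the tail of
 * do_wait_for_common() does, and returns false.
 */
static bool model_wait_would_block(struct model_completion *x)
{
        assert(x != NULL);
        if (!x->done)
                return true;
        x->done--;
        return false;
}
```

In this model, a wait blocks right after model_init_completion(), stops
blocking after one model_complete_all(), and blocks again only once
model_reinit_completion() has reset the counter.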

I will call reinit_completion() in vfio_pci_aer_err_detected() when
an AER error is detected.
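
A minimal userspace sketch of that ordering, assuming the completion is
initialised once at probe time, reset on each detected error, and completed
on resume. The model_* names and struct model_vdev are hypothetical
stand-ins, not the actual vfio-pci code, and the model tracks only the
'done' counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; 'done' models
 * aer_error_completion.done. */
struct model_vdev {
        unsigned int done;
};

/* Probe: initialise the completion once per device. */
static void model_probe(struct model_vdev *v)
{
        assert(v != NULL);
        v->done = 0;            /* init_completion() */
}

/* Open: may run many times per device, so it leaves the completion alone. */
static void model_open(struct model_vdev *v)
{
        assert(v != NULL);
}

/* Error detected: drain the counter left by an earlier complete_all(). */
static void model_err_detected(struct model_vdev *v)
{
        assert(v != NULL);
        v->done = 0;            /* reinit_completion() */
}

/* Resume: release every current and future waiter. */
static void model_resume(struct model_vdev *v)
{
        assert(v != NULL);
        v->done += UINT_MAX / 2;        /* complete_all() */
}

/* True if a wait on the completion would sleep at this point. */
static bool model_wait_would_block(struct model_vdev *v)
{
        assert(v != NULL);
        if (!v->done)
                return true;
        v->done--;
        return false;
}
```

With this ordering, a second error that arrives after the first resume
blocks waiters again. In this model, a resume for error1 would still release
waiters while an overlapping error2 is outstanding; handling that overlap
is outside the sketch.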
Thank you very much.

Sincerely
ZhouJie
