From: Fam Zheng
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH RFC 4/4] aio-posix: Use epoll in aio_poll
Date: Wed, 8 Jul 2015 09:01:27 +0800
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, 07/07 16:08, Stefan Hajnoczi wrote:
> > @@ -44,6 +47,12 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
> >  
> >  void aio_context_setup(AioContext *ctx, Error **errp)
> >  {
> > +#ifdef CONFIG_EPOLL
> > +    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
> > +    if (ctx->epollfd < 0) {
> > +        error_setg(errp, "Failed to create epoll fd: %s", strerror(errno));
> 
> Slightly more concise:
> error_setg_errno(errp, errno, "Failed to create epoll fd")

Okay.
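
For reference, with that change the check quoted above would read roughly like this (sketch only):

    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
    if (ctx->epollfd < 0) {
        error_setg_errno(errp, errno, "Failed to create epoll fd");
    }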

> 
> > -/* These thread-local variables are used only in a small part of aio_poll
> > +#ifdef CONFIG_EPOLL
> > +QEMU_BUILD_BUG_ON((int)G_IO_IN != EPOLLIN);
> > +QEMU_BUILD_BUG_ON((int)G_IO_OUT != EPOLLOUT);
> > +QEMU_BUILD_BUG_ON((int)G_IO_PRI != EPOLLPRI);
> > +QEMU_BUILD_BUG_ON((int)G_IO_ERR != EPOLLERR);
> > +QEMU_BUILD_BUG_ON((int)G_IO_HUP != EPOLLHUP);
> 
> I guess this assumption is okay but maybe the compiler optimizes:
> 
>   event.events = (node->pfd.events & G_IO_IN ? EPOLLIN : 0) |
>                  (node->pfd.events & G_IO_OUT ? EPOLLOUT : 0) |
>                  (node->pfd.events & G_IO_PRI ? EPOLLPRI : 0) |
>                  (node->pfd.events & G_IO_ERR ? EPOLLERR : 0) |
>                  (node->pfd.events & G_IO_HUP ? EPOLLHUP : 0);
> 
> into:
> 
>   event.events = node->pfd.events & (EPOLLIN | EPOLLOUT | EPOLLPRI |
>                                      EPOLLERR | EPOLLHUP);
> 
> which is just an AND instruction so it's effectively free and doesn't
> assume that these constants have the same values.

Okay, it'll take a bit more typing (converting to and back), but it's more
straightforward and self-documenting.
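
Roughly like the following pair of helpers (the names here are only illustrative, not taken from the patch), which would also let the QEMU_BUILD_BUG_ON asserts on the flag values go away:

    static uint32_t epoll_events_from_pfd(int pfd_events)
    {
        return (pfd_events & G_IO_IN  ? EPOLLIN  : 0) |
               (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
               (pfd_events & G_IO_PRI ? EPOLLPRI : 0) |
               (pfd_events & G_IO_ERR ? EPOLLERR : 0) |
               (pfd_events & G_IO_HUP ? EPOLLHUP : 0);
    }

    static int pfd_events_from_epoll(uint32_t events)
    {
        return (events & EPOLLIN  ? G_IO_IN  : 0) |
               (events & EPOLLOUT ? G_IO_OUT : 0) |
               (events & EPOLLPRI ? G_IO_PRI : 0) |
               (events & EPOLLERR ? G_IO_ERR : 0) |
               (events & EPOLLHUP ? G_IO_HUP : 0);
    }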

> 
> > +
> > +#define EPOLL_BATCH 128
> > +static bool aio_poll_epoll(AioContext *ctx, bool blocking)
> > +{
> > +    AioHandler *node;
> > +    bool was_dispatching;
> > +    int i, ret;
> > +    bool progress;
> > +    int64_t timeout;
> > +    struct epoll_event events[EPOLL_BATCH];
> > +
> > +    aio_context_acquire(ctx);
> > +    was_dispatching = ctx->dispatching;
> > +    progress = false;
> > +
> > +    /* aio_notify can avoid the expensive event_notifier_set if
> > +     * everything (file descriptors, bottom halves, timers) will
> > +     * be re-evaluated before the next blocking poll().  This is
> > +     * already true when aio_poll is called with blocking == false;
> > +     * if blocking == true, it is only true after poll() returns.
> > +     *
> > +     * If we're in a nested event loop, ctx->dispatching might be true.
> > +     * In that case we can restore it just before returning, but we
> > +     * have to clear it now.
> > +     */
> > +    aio_set_dispatching(ctx, !blocking);
> > +
> > +    ctx->walking_handlers++;
> > +
> > +    timeout = blocking ? aio_compute_timeout(ctx) : 0;
> > +
> > +    if (timeout > 0) {
> > +        timeout = DIV_ROUND_UP(timeout, 1000000);
> > +    }
> 
> I think you already posted the timerfd code in an earlier series.  Why
> degrade to millisecond precision?  It needs to be fixed up anyway if the
> main loop uses aio_poll() in the future.

Because of a little complication: the timeout here is always -1 for the iothread, and
what is interesting is that -1 actually requires an explicit

    timerfd_settime(timerfd, flags, &(struct itimerspec){{0, 0}}, NULL)

to disable the timerfd for this aio_poll(), which costs something. Passing -1 to
epoll_wait() without doing this doesn't work, because the timerfd has already been
added to the epollfd and may still have an unexpected timeout set from before.
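
Concretely, the arm/disarm around epoll_wait() would look something like this (a sketch only: ctx->timerfd is assumed from the earlier timerfd series, timeout is the nanosecond value from aio_compute_timeout(), and <sys/timerfd.h> is needed):

    if (timeout > 0) {
        /* Arm the timerfd with the nanosecond deadline computed above. */
        struct itimerspec its = {
            .it_value.tv_sec  = timeout / 1000000000LL,
            .it_value.tv_nsec = timeout % 1000000000LL,
        };
        timerfd_settime(ctx->timerfd, 0, &its, NULL);
    } else {
        /* timeout is 0 or -1: disarm explicitly, otherwise a deadline set by
         * a previous aio_poll() could still fire during epoll_wait(). */
        timerfd_settime(ctx->timerfd, 0, &(struct itimerspec){{0, 0}}, NULL);
    }
    /* epoll_wait() is then called with 0 for non-blocking, -1 otherwise. */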

Of course we could cache the state and optimize, but I haven't reasoned through what
happens if another thread calls aio_poll() while we're in epoll_wait(), for example
when the first aio_poll() has a positive timeout but the second one has -1.

Fam



