qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v3 02/10] main-loop: switch to g_poll() on POSIX hosts
Date: Mon, 04 Feb 2013 15:37:39 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130108 Thunderbird/10.0.12

Comments in-line.

On 02/04/13 13:12, Stefan Hajnoczi wrote:
> Use g_poll(3) instead of select(2).  Well, this is kind of a cheat.
> It's true that we're now using g_poll(3) on POSIX hosts but the *_fill()
> and *_poll() functions are still using rfds/wfds/xfds.
>
> We've set the scene to start converting *_fill() and *_poll() functions
> step-by-step until no more rfds/wfds/xfds users remain.  Then we'll drop
> the temporary gpollfds_from_select() and gpollfds_to_select() functions
> and be left with native g_poll(2).
>
> On Windows things are a little crazy: convert from rfds/wfds/xfds to
> GPollFDs, back to rfds/wfds/xfds, call select(2), rfds/wfds/xfds back to
> GPollFDs, and finally back to rfds/wfds/xfds again.  This is only
> temporary and keeps the Windows build working through the following
> patches.  We'll drop this excessive conversion later and be left with a
> single GPollFDs -> select(2) -> GPollFDs sequence that allows Windows to
> use select(2) while the rest of QEMU only knows about GPollFD.
>
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
>  main-loop.c | 135 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 127 insertions(+), 8 deletions(-)

I assume this is complex because the pre-patch situation is overly
complex as well:

                             slirp -> fd_set
                         iohandler -> fd_set
             unix                                   windows
---------------------------------  --------------------------------------------
    before            after               before                after
--------------  -----------------  ---------------------  ---------------------
glib -> fd_set  glib -> fd_set     glib    -> GPollFD     glib    -> GPollFD
                fd_set -> GPollFD  WaitObj -> GPollFD     WaitObj -> GPollFD

SELECT          G_POLL             G_POLL (glib/WaitObj)  G_POLL (glib/WaitObj)

                                                          fd_set -> GPollFD
                                                          GPollFD -> fd_set

                                   SELECT (slirp/ioh.)    SELECT (slirp/ioh.)

(I'm ignoring the after-select / after-poll operations; they (should)
mirror the pre-select / pre-poll ones.)

No idea why the windows version has been mixing g_poll() and select().
I'd hope that this series kills select() for uniformity's sake, but the
00/10 subject and the commit msg for this patch indicate otherwise.

"main-loop.c" is full of global variables which makes it a *pain* to read.

>
> diff --git a/main-loop.c b/main-loop.c
> index d0d8fe4..f1dcd14 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -117,6 +117,8 @@ void qemu_notify_event(void)
>      aio_notify(qemu_aio_context);
>  }
>
> +static GArray *gpollfds;
> +
>  int qemu_init_main_loop(void)
>  {
>      int ret;
> @@ -133,6 +135,7 @@ int qemu_init_main_loop(void)
>          return ret;
>      }
>
> +    gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
>      qemu_aio_context = aio_context_new();
>      src = aio_get_g_source(qemu_aio_context);
>      g_source_attach(src, NULL);
> @@ -146,6 +149,62 @@ static GPollFD poll_fds[1024 * 2]; /* this is probably 
> overkill */
>  static int n_poll_fds;
>  static int max_priority;
>
> +/* Load rfds/wfds/xfds into gpollfds.  Will be removed a few commits later. 
> */
> +static void gpollfds_from_select(void)
> +{
> +    int fd;
> +    for (fd = 0; fd <= nfds; fd++) {

"nfds" is an example of a global variable being global ("file scope")
for no good reason. Anyway it does seem to have inclusive meaning
("highest file descriptor in all sets").

> +        int events = 0;
> +        if (FD_ISSET(fd, &rfds)) {
> +            events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
> +        }

("=" would have sufficed)

> +        if (FD_ISSET(fd, &wfds)) {
> +            events |= G_IO_OUT | G_IO_ERR;
> +        }
> +        if (FD_ISSET(fd, &xfds)) {
> +            events |= G_IO_PRI;
> +        }
> +        if (events) {
> +            GPollFD pfd = {
> +                .fd = fd,
> +                .events = events,
> +            };
> +            g_array_append_val(gpollfds, pfd);
> +        }
> +    }
> +}

(I don't like "gpollfds" being global, but) this seems OK. It matches
the glib docs and also How the Mapping Should Be Done (TM).

This function corresponds to the "unix / after / fd_set -> GPollFD"
entry of the diagram, and as such it is composed with "glib -> fd_set"
(glib_select_fill()). I'll note that glib_select_fill() sets "xfds" for
G_IO_ERR but not G_IO_PRI. (According to SUSv4, in our case xfds is the
logical-or of "error pending" and "OOB/urgent pending".)

We can assume that each entry stored by g_main_context_query() will have
at least one of IN and OUT set in "events". Further assuming that glib
follows its own documentation, that implies that G_IO_ERR will be set
unconditionally (because it always accompanies each of IN and OUT).
glib_select_fill() will map that to xfds, which we then map here to
G_IO_PRI. The total effect is that G_IO_PRI is set on all file
descriptors, always, even if we only try to write.

I think ultimately support for OOB data should be killed throughout, as
a policy, including xfds & G_IO_PRI. Pending errors are returned by
read()/write()/etc just fine; see eg. libvirt commit d19149dd.

> +
> +/* Store gpollfds revents into rfds/wfds/xfds.  Will be removed a few commits
> + * later.
> + */
> +static void gpollfds_to_select(int ret)
> +{
> +    int i;
> +
> +    FD_ZERO(&rfds);
> +    FD_ZERO(&wfds);
> +    FD_ZERO(&xfds);
> +
> +    if (ret <= 0) {
> +        return;
> +    }

ie. for g_poll() timeouts, errors & signal interrupts, we clear the sets
and that's it. OK.

> +
> +    for (i = 0; i < gpollfds->len; i++) {
> +        int fd = g_array_index(gpollfds, GPollFD, i).fd;
> +        int revents = g_array_index(gpollfds, GPollFD, i).revents;
> +
> +        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
> +            FD_SET(fd, &rfds);
> +        }
> +        if (revents & (G_IO_OUT | G_IO_ERR)) {
> +            FD_SET(fd, &wfds);
> +        }
> +        if (revents & G_IO_PRI) {
> +            FD_SET(fd, &xfds);
> +        }
> +    }
> +}

Looks good.

> +
>  #ifndef _WIN32
>  static void glib_select_fill(int *max_fd, fd_set *rfds, fd_set *wfds,
>                               fd_set *xfds, uint32_t *cur_timeout)
> @@ -212,22 +271,22 @@ static void glib_select_poll(fd_set *rfds, fd_set 
> *wfds, fd_set *xfds,
>
>  static int os_host_main_loop_wait(uint32_t timeout)
>  {
> -    struct timeval tv, *tvarg = NULL;
>      int ret;
>
>      glib_select_fill(&nfds, &rfds, &wfds, &xfds, &timeout);
>
> -    if (timeout < UINT32_MAX) {
> -        tvarg = &tv;
> -        tv.tv_sec = timeout / 1000;
> -        tv.tv_usec = (timeout % 1000) * 1000;
> -    }
> -
>      if (timeout > 0) {
>          qemu_mutex_unlock_iothread();
>      }
>
> -    ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg);
> +    /* We'll eventually drop fd_set completely.  But for now we still have
> +     * *_fill() and *_poll() functions that use rfds/wfds/xfds.
> +     */
> +    gpollfds_from_select();
> +
> +    ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);

This exploits for the timeout parameter that (int)UINT32_MAX equals -1
on all supported platforms.

Timeout values in [ INT_MAX+1, UINT_MAX32-1 ] lose their meaning, but I
guess that's no problem in practice.

> +
> +    gpollfds_to_select(ret);
>
>      if (timeout > 0) {
>          qemu_mutex_lock_iothread();
> @@ -327,6 +386,55 @@ void qemu_fd_register(int fd)
>                     FD_CONNECT | FD_WRITE | FD_OOB);
>  }
>
> +static int pollfds_fill(GArray *pollfds, fd_set *rfds, fd_set *wfds,
> +                        fd_set *xfds)
> +{
> +    int nfds = -1;

"nfds" shadows the global "nfds". The return value of pollfds_fill() is
then assigned to the global "nfds". Not very elegant (but it's all
rooted in the existense of the global nfds).

> +    int i;
> +
> +    for (i = 0; i < pollfds->len; i++) {
> +        GPollFD *pfd = &g_array_index(pollfds, GPollFD, i);
> +        int fd = pfd->fd;
> +        int events = pfd->events;
> +        if (events & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
> +            FD_SET(fd, rfds);
> +            nfds = MAX(nfds, fd);
> +        }
> +        if (events & (G_IO_OUT | G_IO_ERR)) {
> +            FD_SET(fd, wfds);
> +            nfds = MAX(nfds, fd);
> +        }
> +        if (events & G_IO_PRI) {
> +            FD_SET(fd, xfds);
> +            nfds = MAX(nfds, fd);
> +        }
> +    }
> +    return nfds;
> +}
> +
> +static void pollfds_poll(GArray *pollfds, int nfds, fd_set *rfds,
> +                         fd_set *wfds, fd_set *xfds)
> +{

The "nfds" parameter both shadows the global variable and is unused in
the function.

> +    int i;
> +
> +    for (i = 0; i < pollfds->len; i++) {
> +        GPollFD *pfd = &g_array_index(pollfds, GPollFD, i);
> +        int fd = pfd->fd;
> +        int revents = 0;
> +
> +        if (FD_ISSET(fd, rfds)) {
> +            revents |= G_IO_IN | G_IO_HUP | G_IO_ERR;
> +        }

("=" would have sufficed)

> +        if (FD_ISSET(fd, wfds)) {
> +            revents |= G_IO_OUT | G_IO_ERR;
> +        }
> +        if (FD_ISSET(fd, xfds)) {
> +            revents |= G_IO_PRI;
> +        }
> +        pfd->revents |= revents & pfd->events;

I find this |= vs. = more confusing than the other two. "pfd->revents"
is zero here.

> +    }
> +}
> +
>  static int os_host_main_loop_wait(uint32_t timeout)
>  {
>      GMainContext *context = g_main_context_default();
> @@ -382,12 +490,22 @@ static int os_host_main_loop_wait(uint32_t timeout)
>       * improve socket latency.
>       */
>
> +    /* This back-and-forth between GPollFDs and select(2) is temporary.  
> We'll
> +     * drop it in a couple of patches, I promise :).
> +     */
> +    gpollfds_from_select();
> +    FD_ZERO(&rfds);
> +    FD_ZERO(&wfds);
> +    FD_ZERO(&xfds);
> +    nfds = pollfds_fill(gpollfds, &rfds, &wfds, &xfds);
>      if (nfds >= 0) {
>          select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
>          if (select_ret != 0) {
>              timeout = 0;
> +            pollfds_poll(gpollfds, nfds, &rfds, &wfds, &xfds);

This function shouldn't be invoked for "select_ret == -1" (in case
that's possible here); "revents" will end up a copy of "events":

"On failure, the objects pointed to by the readfds, writefds, and
errorfds arguments shall not be modified."

>          }
>      }
> +    gpollfds_to_select(select_ret || g_poll_ret);

I can see this logical-or mimics the return value below; however I think
it's not robust. If one of the operands is -1, and the other is
non-positive, then gpollfds_to_select() should not traverse the
GPollFDs, but it will.

... Actually, "g_poll_ret" should have absolutely no effect on
gpollfds_to_select(): "g_poll_ret" corresponds to the "poll_fds" array,
which covers the glib-internal file descriptors and the WaitObjects.
Here (= in the windows case), "gpollfds" covers only slirp & iohandler,
thus only "select_ret" should be passed in.

>
>      return select_ret || g_poll_ret;

I find this return value unfathomable (how is (-1||0) equivalent to
(1||1)?), but it's out of scope for now.

>  }
> @@ -403,6 +521,7 @@ int main_loop_wait(int nonblocking)
>      }
>
>      /* poll any events */
> +    g_array_set_size(gpollfds, 0); /* reset for new iteration */
>      /* XXX: separate device handlers from system ones */
>      nfds = -1;
>      FD_ZERO(&rfds);

I can't decide (yet) if any of the above is important; probably not.
>From a quick look at the tree at the series' end, most of it seems to be
gone by then. I'll let you decide.

Reviewed-by: Laszlo Ersek <address@hidden>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]