qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v3 05/10] slirp: switch to GPollFD
Date: Wed, 06 Feb 2013 01:59:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130108 Thunderbird/10.0.12

comments in-line

On 02/04/13 13:12, Stefan Hajnoczi wrote:
> Slirp uses rfds/wfds/xfds more extensively than other QEMU components.
> 
> The rarely-used out-of-band TCP data feature is used.  That means we
> need the full table of select(2) to g_poll(3) events:
> 
>   rfds -> G_IO_IN | G_IO_HUP | G_IO_ERR
>   wfds -> G_IO_OUT | G_IO_ERR
>   xfds -> G_IO_PRI
> 
> I came up with this table by looking at Linux fs/select.c which maps
> select(2) to poll(2) internally.
> 
> Another detail to watch out for are the global variables that reference
> rfds/wfds/xfds during slirp_select_poll().  sofcantrcvmore() and
> sofcantsendmore() use these globals to clear fd_set bits.  When
> sofcantrcvmore() is called, the wfds bit is cleared so that the write
> handler will no longer be run for this iteration of the event loop.
> 
> This actually seems buggy to me since TCP connections can be half-closed
> and we'd still want to handle data in half-duplex fashion.  I think the
> real intention is to avoid running the read/write handler when the
> socket has been fully closed.  This is indicated with the SS_NOFDREF
> state bit so we now check for it before invoking the TCP write handler.

I agree with how you interpret SS_NOFDREF.

> Note that UDP/ICMP code paths don't care because they are
> connectionless.

I guess they rather don't care because they are unreliable -- it doesn't
matter if we lose an outgoing packet somewhere in the tubes or right at
our own full socket buffer, so we don't care about write readiness. (But
I may be misunderstanding it all.)


>                  if (so->so_state & SS_NOFDREF || so->s == -1) {
>                      continue;
>                  }
> @@ -475,15 +490,15 @@ void slirp_select_poll(fd_set *readfds, fd_set 
> *writefds, fd_set *xfds,
>                  /*
>                   * Check for URG data
>                   * This will soread as well, so no need to
> -                 * test for readfds below if this succeeds
> +                 * test for G_IO_IN below if this succeeds
>                   */
> -                if (FD_ISSET(so->s, xfds)) {
> +                if (revents & G_IO_PRI) {
>                      sorecvoob(so);
>                  }

According to SUSv3+, a bit/fd set in xfds means "socket level error
pending, or socket-specific exceptional condition". In other words,
G_IO_ERR|G_IO_PRI. Therefore this change looks a bit suspicious, because
a pending socket-level error would trigger this branch before the patch,
but trickle down after the patch.

To test this, I've written the attached small test program. It hangs on
RHEL-6.3 (2.6.32-279.19.1.el6.x86_64), which means that
- RHEL-6.3 select() is not SUSv3/SUSv4-conformant, and
- the mapping you mention in the commit message, and implement here, is
correct in practice.

(As a sanity check for the test program, if you pass in the fdset as
"writefds", the pending error is signalled & fetched OK.)

>                  /*
>                   * Check sockets for reading
>                   */
> -                else if (FD_ISSET(so->s, readfds)) {
> +                else if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
>                      /*
>                       * Check for incoming connections
>                       */
> @@ -502,7 +517,8 @@ void slirp_select_poll(fd_set *readfds, fd_set *writefds, 
> fd_set *xfds,
>                  /*
>                   * Check sockets for writing
>                   */
> -                if (FD_ISSET(so->s, writefds)) {
> +                if (!(so->so_state & SS_NOFDREF) &&
> +                        (revents & (G_IO_OUT | G_IO_ERR))) {
>                      /*
>                       * Check for non-blocking, still-connecting sockets
>                       */

Looking at nothing but the loop, SS_NOFDREF appears impossible here;
however soread() --> sofcantrcvmore() can set it after we pass the
initial check.

Ugh. This is where sofcantrcvmore() / sofcantsendmore() show their split
personalities.

The second part of each seems OK. They both try to set the
SS_FCANTxxxxMORE bit corresponding to their names (meaning, that channel
is shut down). If they find that the *other* direction has been already
shut down, they set SS_NOFDREF instead. Fine.

The shutdown() direction is also correct in each, SHUT_RD==0 in
sofcantrcvmore(), SHUT_WR==1 in the other. Fine again.

What is baffling is why the other direction's fd_set(s) is (are) pruned!
git-blame doesn't help, it dates back to slirp's initial commit in qemu.
I'm intrigued.

Anyway, if we'd like to remain bug-compatible, then

                if (!(so->so_state & (SS_NOFDREF | SS_FCANTRCVMORE)) &&
                        (revents & (G_IO_OUT | G_IO_ERR))) {

would be necessary here. This drags the bug into sunlight (why shouldn't
we try to send if we read EOF?), but I believe this is what covers both
of the "so_state" branches in sofcantrcvmore().

... Except, of course, this would permanently kill our ability to send
back on the half-closed socket. sofcantrcvmore() clears the write-set
bit only for the current iteration; the next iteration will set it up
again. Whereas SS_FCANTRCVMORE is permanent, once set.

Seems like we can't copy the original behavior here and have no choice
but fixing this slirp bug. Shudder!

> @@ -588,9 +604,18 @@ void slirp_select_poll(fd_set *readfds, fd_set 
> *writefds, fd_set *xfds,
>               */
>              for (so = slirp->udb.so_next; so != &slirp->udb;
>                      so = so_next) {
> +                int revents;
> +
>                  so_next = so->so_next;
>  
> -                if (so->s != -1 && FD_ISSET(so->s, readfds)) {
> +                revents = 0;
> +                if (so->pollfds_idx != -1) {
> +                    revents = g_array_index(pollfds, GPollFD,
> +                            so->pollfds_idx).revents;
> +                }
> +
> +                if (so->s != -1 &&
> +                    (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
>                      sorecvfrom(so);
>                  }
>              }

I think (so->s != -1) is constant true here *if* the "revents"
expression evals to non-zero.
- We know that "revents" has at least one interesting bit set,
- that means (pollfds_idx != -1)
- that means -- see slirp_select_fill() conversion -- that
  (pfd.fd == so->s && so->s != -1)

But it doesn't hurt.


> @@ -600,9 +625,18 @@ void slirp_select_poll(fd_set *readfds, fd_set 
> *writefds, fd_set *xfds,
>               */
>              for (so = slirp->icmp.so_next; so != &slirp->icmp;
>                      so = so_next) {
> -                so_next = so->so_next;
> +                    int revents;
> +
> +                    so_next = so->so_next;
> +
> +                    revents = 0;
> +                    if (so->pollfds_idx != -1) {
> +                        revents = g_array_index(pollfds, GPollFD,
> +                                                so->pollfds_idx).revents;
> +                    }
>  
> -                if (so->s != -1 && FD_ISSET(so->s, readfds)) {
> +                    if (so->s != -1 &&
> +                        (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR))) {
>                      icmp_receive(so);
>                  }
>              }

Same here.

> @@ -610,15 +644,6 @@ void slirp_select_poll(fd_set *readfds, fd_set 
> *writefds, fd_set *xfds,
>  
>          if_start(slirp);
>      }
> -
> -    /* clear global file descriptor sets.
> -     * these reside on the stack in vl.c
> -     * so they're unusable if we're not in
> -     * slirp_select_fill or slirp_select_poll.
> -     */
> -    global_readfds = NULL;
> -    global_writefds = NULL;
> -    global_xfds = NULL;
>  }
>  
>  static void arp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len)
> diff --git a/slirp/socket.c b/slirp/socket.c
> index 77b0c98..a7ab933 100644
> --- a/slirp/socket.c
> +++ b/slirp/socket.c
> @@ -680,9 +680,6 @@ sofcantrcvmore(struct socket *so)
>  {
>       if ((so->so_state & SS_NOFDREF) == 0) {
>               shutdown(so->s,0);
> -             if(global_writefds) {
> -               FD_CLR(so->s,global_writefds);
> -             }

Yep.

>       }
>       so->so_state &= ~(SS_ISFCONNECTING);
>       if (so->so_state & SS_FCANTSENDMORE) {
> @@ -698,12 +695,6 @@ sofcantsendmore(struct socket *so)
>  {
>       if ((so->so_state & SS_NOFDREF) == 0) {
>              shutdown(so->s,1);           /* send FIN to fhost */
> -            if (global_readfds) {
> -                FD_CLR(so->s,global_readfds);
> -            }
> -            if (global_xfds) {
> -                FD_CLR(so->s,global_xfds);
> -            }

Right.

>       }
>       so->so_state &= ~(SS_ISFCONNECTING);
>       if (so->so_state & SS_FCANTRCVMORE) {
> diff --git a/slirp/socket.h b/slirp/socket.h
> index 857b0da..57e0407 100644
> --- a/slirp/socket.h
> +++ b/slirp/socket.h
> @@ -20,6 +20,8 @@ struct socket {
>  
>    int s;                           /* The actual socket */
>  
> +  int pollfds_idx;                 /* GPollFD GArray index */
> +
>    Slirp *slirp;                         /* managing slirp instance */
>  
>                       /* XXX union these with not-yet-used sbuf params */

(I generate hunks for header files before those for C files in my
patches, see the "-O<orderfile>" option of git-format-patch.)

> diff --git a/stubs/slirp.c b/stubs/slirp.c
> index 9a3309a..f1fc833 100644
> --- a/stubs/slirp.c
> +++ b/stubs/slirp.c
> @@ -5,13 +5,11 @@ void slirp_update_timeout(uint32_t *timeout)
>  {
>  }
>  
> -void slirp_select_fill(int *pnfds, fd_set *readfds,
> -                       fd_set *writefds, fd_set *xfds)
> +void slirp_pollfds_fill(GArray *pollfds)
>  {
>  }
>  
> -void slirp_select_poll(fd_set *readfds, fd_set *writefds,
> -                       fd_set *xfds, int select_error)
> +void slirp_pollfds_poll(GArray *pollfds, int select_error)
>  {
>  }
>  

Reviewed-by: Laszlo Ersek <address@hidden>

Attachment: xfds_test.c
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]