From: Ouyang, Changchun
Subject: Re: [Qemu-devel] [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue support
Date: Mon, 31 Aug 2015 08:29:39 +0000


> -----Original Message-----
> From: Michael S. Tsirkin [mailto:address@hidden
> Sent: Sunday, August 30, 2015 2:16 PM
> To: Ouyang, Changchun
> Cc: address@hidden; address@hidden;
> address@hidden; address@hidden;
> address@hidden; Long, Thomas; Liu, Yuanhan
> Subject: Re: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue
> support
> 
> On Fri, Aug 28, 2015 at 01:53:39AM +0000, Ouyang, Changchun wrote:
> > Hi Michael,
> >
> > Thanks for your response!
> >
> > > -----Original Message-----
> > > From: address@hidden [mailto:snabb-
> > > address@hidden On Behalf Of Michael S. Tsirkin
> > > Sent: Thursday, August 27, 2015 9:05 PM
> > > To: Ouyang, Changchun
> > > Cc: address@hidden; address@hidden;
> > > address@hidden; address@hidden;
> > > address@hidden; Long, Thomas; Liu, Yuanhan
> > > Subject: Re: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi
> > > queue support
> > >
> > > On Tue, Aug 25, 2015 at 03:25:54AM +0000, Ouyang, Changchun wrote:
> > > > Hi Michael,
> > > >
> > > > > -----Original Message-----
> > > > > From: address@hidden [mailto:snabb-
> > > > > address@hidden On Behalf Of Michael S. Tsirkin
> > > > > Sent: Thursday, August 13, 2015 5:19 PM
> > > > > To: Ouyang, Changchun
> > > > > Cc: address@hidden; address@hidden;
> > > > > address@hidden; address@hidden;
> > > > > address@hidden; Long, Thomas
> > > > > Subject: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi
> > > > > queue support
> > > > >
> > > > > On Wed, Aug 12, 2015 at 02:25:41PM +0800, Ouyang Changchun wrote:
> > > > > > Based on patch by Nikolay Nikolaev:
> > > > > > Vhost-user will implement the multi queue support in a similar
> > > > > > way to what vhost already has - a separate thread for each queue.
> > > > > > To enable the multi queue functionality - a new command line
> > > > > > parameter "queues" is introduced for the vhost-user netdev.
> > > > > >
> > > > > > The RESET_OWNER change is based on commit:
> > > > > >    294ce717e0f212ed0763307f3eab72b4a1bdf4d0
> > > > > > If it is reverted, this patch needs to be updated accordingly.
> > > > > >
> > > > > > Signed-off-by: Nikolay Nikolaev
> > > > > > <address@hidden>
> > > > > > Signed-off-by: Changchun Ouyang <address@hidden>
> > > > > > ---
> > > > > > Changes since v5:
> > > > > >  - fix the message description for VHOST_RESET_OWNER in
> > > > > > vhost-user txt
> > > > > >
> > > > > > Changes since v4:
> > > > > >  - remove the unnecessary trailing '\n'
> > > > > >
> > > > > > Changes since v3:
> > > > > >  - fix one typo and wrap one long line
> > > > > >
> > > > > > Changes since v2:
> > > > > >  - fix vq index issue for set_vring_call
> > > > > >    In the case of VHOST_SET_VRING_CALL, the vq_index is not
> > > > > >    initialized before it is used, thus it could be a random value.
> > > > > >    The random value leads to a crash in vhost after being passed
> > > > > >    down, as vhost uses it as an array index.
> > > > > >  - fix the typo in the doc and description
> > > > > >  - address vq index for reset_owner
> > > > > >
> > > > > > Changes since v1:
> > > > > >  - use s->nc.info_str when bringing up/down the backend
> > > > > >
> > > > > >  docs/specs/vhost-user.txt |  7 ++++++-
> > > > > >  hw/net/vhost_net.c        |  3 ++-
> > > > > >  hw/virtio/vhost-user.c    | 11 ++++++++++-
> > > > > >  net/vhost-user.c          | 37 ++++++++++++++++++++++++-------------
> > > > > >  qapi-schema.json          |  6 +++++-
> > > > > >  qemu-options.hx           |  5 +++--
> > > > > >  6 files changed, 50 insertions(+), 19 deletions(-)
> > > > > >
> > > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > > > > index 70da3b1..9390f89 100644
> > > > > > --- a/docs/specs/vhost-user.txt
> > > > > > +++ b/docs/specs/vhost-user.txt
> > > > > > @@ -135,6 +135,11 @@ As older slaves don't support negotiating protocol features,
> > > > > >  a feature bit was dedicated for this purpose:
> > > > > >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> > > > > >
> > > > > > +Multi queue support
> > > > > > +-------------------
> > > > > > +The protocol supports multiple queues by setting all index
> > > > > > +fields in the sent messages to a properly calculated value.
> > > > > > +
> > > > > >  Message types
> > > > > >  -------------
> > > > > >
> > > > > > @@ -198,7 +203,7 @@ Message types
> > > > > >
> > > > > >        Id: 4
> > > > > >        Equivalent ioctl: VHOST_RESET_OWNER
> > > > > > -      Master payload: N/A
> > > > > > +      Master payload: vring state description
> > > > > >
> > > > > >        Issued when a new connection is about to be closed. The Master will no
> > > > > >        longer own this connection (and will usually close it).
> > > > >
> > > > > This is an interface change, isn't it?
> > > > > We can't make it unconditional; it needs to be dependent on a
> > > > > protocol flag.
> > >
> > >
> > > Pls remember to fix this one.
> > >
> > > > >
> > > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > > > index 1f25cb3..9cd6c05 100644
> > > > > > --- a/hw/net/vhost_net.c
> > > > > > +++ b/hw/net/vhost_net.c
> > > > > > @@ -159,6 +159,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
> > > > > >
> > > > > >      net->dev.nvqs = 2;
> > > > > >      net->dev.vqs = net->vqs;
> > > > > > +    net->dev.vq_index = net->nc->queue_index;
> > > > > >
> > > > > >      r = vhost_dev_init(&net->dev, options->opaque,
> > > > > >                         options->backend_type, options->force);
> > > > > > @@ -269,7 +270,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
> > > > > >          for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
> > > > > >              const VhostOps *vhost_ops = net->dev.vhost_ops;
> > > > > >              int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
> > > > > > -                                          NULL);
> > > > > > +                                          &file);
> > > > > >              assert(r >= 0);
> > > > > >          }
> > > > > >      }
> > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > > > index 27ba035..fb11d4c 100644
> > > > > > --- a/hw/virtio/vhost-user.c
> > > > > > +++ b/hw/virtio/vhost-user.c
> > > > > > @@ -219,7 +219,12 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > > > >          break;
> > > > > >
> > > > > >      case VHOST_USER_SET_OWNER:
> > > > > > +        break;
> > > > > > +
> > > > > >      case VHOST_USER_RESET_OWNER:
> > > > > > +        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > > > +        msg.state.index += dev->vq_index;
> > > > > > +        msg.size = sizeof(m.state);
> > > > > >          break;
> > > > > >
> > > > > >      case VHOST_USER_SET_MEM_TABLE:
> > > > > > @@ -262,17 +267,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > > > >      case VHOST_USER_SET_VRING_NUM:
> > > > > >      case VHOST_USER_SET_VRING_BASE:
> > > > > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > > > +        msg.state.index += dev->vq_index;
> > > > > >          msg.size = sizeof(m.state);
> > > > > >          break;
> > > > > >
> > > > > >      case VHOST_USER_GET_VRING_BASE:
> > > > > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > > > > +        msg.state.index += dev->vq_index;
> > > > > >          msg.size = sizeof(m.state);
> > > > > >          need_reply = 1;
> > > > > >          break;
> > > > > >
> > > > > >      case VHOST_USER_SET_VRING_ADDR:
> > > > > >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > > > > > +        msg.addr.index += dev->vq_index;
> > > > > >          msg.size = sizeof(m.addr);
> > > > > >          break;
> > > > > >
> > > > > > @@ -280,7 +288,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > > > >      case VHOST_USER_SET_VRING_CALL:
> > > > > >      case VHOST_USER_SET_VRING_ERR:
> > > > > >          file = arg;
> > > > > > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > > > > > +        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
> > > > > >          msg.size = sizeof(m.u64);
> > > > > >          if (ioeventfd_enabled() && file->fd > 0) {
> > > > > >              fds[fd_num++] = file->fd;
> > > > > > @@ -322,6 +330,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > > > > >                  error_report("Received bad msg size.");
> > > > > >                  return -1;
> > > > > >              }
> > > > > > +            msg.state.index -= dev->vq_index;
> > > > > >              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> > > > > >              break;
> > > > > >          default:
> > > > > > diff --git a/net/vhost-user.c b/net/vhost-user.c
> > > > > > index 1d86a2b..904d8af 100644
> > > > > > --- a/net/vhost-user.c
> > > > > > +++ b/net/vhost-user.c
> > > > > > @@ -121,35 +121,39 @@ static void net_vhost_user_event(void *opaque, int event)
> > > > > >      case CHR_EVENT_OPENED:
> > > > > >          vhost_user_start(s);
> > > > > >          net_vhost_link_down(s, false);
> > > > > > -        error_report("chardev \"%s\" went up", s->chr->label);
> > > > > > +        error_report("chardev \"%s\" went up", s->nc.info_str);
> > > > > >          break;
> > > > > >      case CHR_EVENT_CLOSED:
> > > > > >          net_vhost_link_down(s, true);
> > > > > >          vhost_user_stop(s);
> > > > > > -        error_report("chardev \"%s\" went down", s->chr->label);
> > > > > > +        error_report("chardev \"%s\" went down", s->nc.info_str);
> > > > > >          break;
> > > > > >      }
> > > > > >  }
> > >
> > > BTW this seems pretty hacky: you get multiple messages when one
> > > client connects. Why add multiple event listeners to the same char device?
> > >
> > >
> > > > > >
> > > > > >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > > > > -                               const char *name, CharDriverState *chr)
> > > > > > +                               const char *name, CharDriverState *chr,
> > > > > > +                               uint32_t queues)
> > > > > >  {
> > > > > >      NetClientState *nc;
> > > > > >      VhostUserState *s;
> > > > > > +    int i;
> > > > > >
> > > > > > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > > > > > +    for (i = 0; i < queues; i++) {
> > > > > > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
> > > > > >
> > > > > > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > > > > > -             chr->label);
> > > > > > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
> > > > > > +                 i, chr->label);
> > > > > >
> > > > > > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > > > > > +        s = DO_UPCAST(VhostUserState, nc, nc);
> > > > > >
> > > > > > -    /* We don't provide a receive callback */
> > > > > > -    s->nc.receive_disabled = 1;
> > > > > > -    s->chr = chr;
> > > > > > -
> > > > > > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > > > > > +        /* We don't provide a receive callback */
> > > > > > +        s->nc.receive_disabled = 1;
> > > > > > +        s->chr = chr;
> > > > > > +        s->nc.queue_index = i;
> > > > > >
> > > > > > +        qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
> > > > > > +    }
> > > > > >      return 0;
> > > > > >  }
> > > > > >
> > > > > > @@ -225,6 +229,7 @@ static int net_vhost_check_net(QemuOpts *opts, void *opaque)
> > > > >
> > > > >
> > > > > There are two problems here:
> > > > >
> > > > > 1. we don't really know that the backend
> > > > >    is able to support the requested number of queues.
> > > > >    If not, everything will fail, silently.
> > > > >    A new message to query the # of queues could help, though
> > > > >    I'm not sure what can be done on failure. Fail connection?
> > > > >
> > > > > 2. each message (e.g. set memory table) is sent multiple times,
> > > > >    on the same socket.
> > > > >
> > > > I think it is tough to resolve these 2 comments, as the current
> > > > messages are either vhost-dev based or virt-queue based, and the
> > > > multi-queue (pair) feature is implemented with multiple vhost-devs.
> > > > For #1:
> > > > The queue number is something that should be visible above the
> > > > vhost-dev level rather than inside the vhost-dev.
> > > > For each vhost-net there are 2 virt-queues, one for Rx and the other
> > > > for Tx.
> > > > Should we introduce the virt-queue pair number into the vhost-dev?
> > > > I don't think that is good, as each vhost-dev has only one virt-queue
> > > > pair.
> > > >
> > > > Where should I put the virt-queue pair number then? I don't have the
> > > > perfect answer yet. Any suggestion is welcome.
> > > >
> > > > Could we assume the vhost backend has the ability to create enough
> > > > virt-queue pairs (e.g. 0x8000 is the max) if qemu requires the vhost
> > > > backend to do so? If that is acceptable, we don't need to get the
> > > > virt-queue pair number from the vhost backend, as the vhost backend
> > > > can create all the virt-queue pairs required by qemu.
> > > > The virtio frontend (in the guest) has the flexibility to enable
> > > > whichever virt-queues match its own capability; qemu can do that by
> > > > using the set_vring_flag message to notify the vhost backend.
> > >
> > > I'm reluctant to agree to this. Implementations tend to get this wrong, 
> > > e.g.
> > > they would only test with 2 queues and assume everything is OK.
> > > With an explicit message, this seems more robust.
> > >
> > > Why is it so hard to implement?  User specifies queues=X.
> > > Can't we simply validate that backend supports this # of queues?
> >
> > Yes, exactly. In qemu we use queues=X to specify the virt-queue pair
> > number.
> > In my previous comments, I am talking about how to resolve this scenario:
> > the vhost backend specifies queues=Y (Y != X, e.g. X = 8, Y = 4). How
> > does the vhost backend tell qemu that its max virt-queue pair count is
> > only 4, so that qemu can update its max queue number according to both
> > its own parameter and the max value from the vhost backend?
> >
> > Do you think this scenario makes sense or not?
> 
> No, I think that if we can't support the number requested by user, QEMU can
> simply exit.

Well, both values are input from the user, and for some reason they may
differ.
When starting the vhost backend, the user specifies 4 queues, e.g.:
Vhost-user ...  -queues 4      # Y=4
When starting the guest, the user specifies 8 queues, e.g.:
Qemu-kvm ... -queues 8      # X=8

Here the vhost backend can't support as many queues as the guest/qemu
requires; I think you mean we should let qemu exit in such a case, right?

Then how about the other case:
When starting the vhost backend, the user specifies 8 queues, e.g.:
Vhost-user ...  -queues 8      # Y=8
When starting the guest, the user specifies 4 queues, e.g.:
Qemu-kvm ... -queues 4      # X=4

Here the vhost backend CAN support as many queues as the guest/qemu
requires, but qemu doesn't want to enable all of them (8 in this case),
only part of them (4 out of 8).
Will qemu exit or go forward (with 4 queues, of course)?
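
To make the two cases concrete, here is a rough sketch of the check I have
in mind. The VHOST_USER_GET_QUEUE_NUM message and the
vhost_user_get_max_queues() helper are hypothetical names, not part of this
patch, and the new message would be guarded by the multiqueue protocol
feature flag:

/* Hypothetical sketch, not part of this patch: validate the user's
 * queues=X against the maximum the vhost-user backend reports. */
static int vhost_user_check_queues(struct vhost_dev *dev, uint32_t queues)
{
    uint64_t max_queues = 0;

    /* assumed new request, e.g. VHOST_USER_GET_QUEUE_NUM */
    if (vhost_user_get_max_queues(dev, &max_queues) < 0) {
        return -1;                    /* backend can't tell us: fail */
    }
    if (queues > max_queues) {        /* case 1: X=8, Y=4 */
        error_report("vhost-user backend supports only %" PRIu64
                     " queue pairs, %u requested", max_queues, queues);
        return -1;                    /* qemu would simply exit */
    }
    return 0;                         /* case 2: X=4, Y=8: go forward with X */
}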

> 
> > If yes, then we should consider extending a new message to allow the
> > vhost backend to communicate its virt-queue number to qemu.
> 
> Yes, conditional on the mq protocol flag.
> 
> > More than that, either we need to change the code to allow each
> > net_client to have more than 1 virt-queue pair (== 2 virt-queues, one
> > for Rx, the other for Tx), or is there a better solution for that?
> 
> I'm not sure why that is required for the above.
> 
> > >
> > >
> > > > For #2
> > > > The memory table message is also vhost-dev based; it wouldn't hurt if
> > > > we send it a few times. The vhost backend could keep it vhost-dev
> > > > based too, or keep it only once (take it the first time and ignore it
> > > > in the rest of the messages from the same connected fd). Any other
> > > > good suggestion is welcome too :-)
> > >
> > > Add code in vhost-user to skip sending the useless messages?  Yes
> > > they seem harmless but implementations tend to develop dependencies
> > > on such bugs, then we get to maintain them forever.
> >
> > Considering we use multiple vhost_nets (multiple net_clients) to fulfill
> > multiple virt-queue pairs, only the first virt-queue pair would send the
> > set-memory-table message; the others wouldn't.
> > Is that OK? Apparently each net client would then differ in its message
> > count: the first net client has the set-memory-table message, the others
> > don't.
> >
> > Thanks
> > Changchun
> 
> But all these clients use the same unix domain socket, correct?
Yes, correct

> If so, I don't think it matters that internally in qemu they use different net
> clients. From the external point of view, it looks like a single message is 
> sent
> on a single socket.
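
Agreed. To keep it a single message on the wire, what I have in mind is
roughly the following sketch (the helper name is hypothetical, not part of
this patch): only the net client that owns queue pair 0 sends the
device-wide messages, the others skip them.

/* Hypothetical sketch, not part of this patch: device-wide messages such
 * as VHOST_USER_SET_MEM_TABLE go out only once on the shared socket,
 * from the vhost-dev that owns queue pair 0. */
static bool vhost_user_should_send(const struct vhost_dev *dev,
                                   unsigned long int request)
{
    bool device_wide = (request == VHOST_USER_SET_MEM_TABLE);

    return !device_wide || dev->vq_index == 0;
}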
> 
> > >
> > > > >
> > > > >
> > > > > >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> > > > > >                          NetClientState *peer)
> > > > > >  {
> > > > > > +    uint32_t queues;
> > > > > >      const NetdevVhostUserOptions *vhost_user_opts;
> > > > > >      CharDriverState *chr;
> > > > > >
> > > > > > @@ -243,6 +248,12 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> > > > > >          return -1;
> > > > > >      }
> > > > > >
> > > > > > +    /* number of queues for multiqueue */
> > > > > > +    if (vhost_user_opts->has_queues) {
> > > > > > +        queues = vhost_user_opts->queues;
> > > > > > +    } else {
> > > > > > +        queues = 1;
> > > > > > +    }
> > > > > >
> > > > > > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > > > > > +    return net_vhost_user_init(peer, "vhost_user", name, chr, queues);
> > > > > >  }
> > > > > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > > > > index f97ffa1..51e40ce 100644
> > > > > > --- a/qapi-schema.json
> > > > > > +++ b/qapi-schema.json
> > > > > > @@ -2444,12 +2444,16 @@
> > > > > >  #
> > > > > >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
> > > > > >  #
> > > > > > +# @queues: #optional number of queues to be created for multiqueue vhost-user
> > > > > > +#          (default: 1) (Since 2.5)
> > > > > > +#
> > > > > >  # Since 2.1
> > > > > >  ##
> > > > > >  { 'struct': 'NetdevVhostUserOptions',
> > > > > >    'data': {
> > > > > >      'chardev':        'str',
> > > > > > -    '*vhostforce':    'bool' } }
> > > > > > +    '*vhostforce':    'bool',
> > > > > > +    '*queues':        'uint32' } }
> > > > > >
> > > > > >  ##
> > > > > >  # @NetClientOptions
> > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > index ec356f6..dad035e 100644
> > > > > > --- a/qemu-options.hx
> > > > > > +++ b/qemu-options.hx
> > > > > > @@ -1942,13 +1942,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
> > > > > >  netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
> > > > > >  required hub automatically.
> > > > > >
> > > > > > address@hidden -netdev vhost-user,address@hidden,vhostforce=on|off]
> > > > > > address@hidden -netdev vhost-user,address@hidden,vhostforce=on|off][,queues=n]
> > > > > >
> > > > > >  Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
> > > > > >  be a unix domain socket backed one. The vhost-user uses a specifically defined
> > > > > >  protocol to pass vhost ioctl replacement messages to an application on the other
> > > > > >  end of the socket. On non-MSIX guests, the feature can be forced with
> > > > > > -@var{vhostforce}.
> > > > > > address@hidden Use 'address@hidden' to specify the number
> > > > > > +of queues to be created for multiqueue vhost-user.
> > > > > >
> > > > > >  Example:
> > > > > >  @example
> > > > > > --
> > > > > > 1.8.4.2
> > > > > >
> > > > >


