Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
From: Neo Jia
Subject: Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
Date: Wed, 28 Sep 2016 13:47:34 -0700
User-agent: Mutt/1.6.2 (2016-07-01)

On Wed, Sep 28, 2016 at 04:31:25PM -0400, Laine Stump wrote:
> On 09/28/2016 03:59 PM, Neo Jia wrote:
> > On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:address@hidden]
> > > > Sent: Thursday, September 29, 2016 3:23 AM
> > > >
> > > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > > Kirti Wankhede <address@hidden> wrote:
> > > > > >
> > > > > > > > > > > > My concern is that a type id seems arbitrary but we're
> > > > > > > > > > > > specifying that
> > > > > > > > > > > > it be unique. We already have something unique, the
> > > > > > > > > > > > name. So why try
> > > > > > > > > > > > to make the type id unique as well? A vendor can
> > > > > > > > > > > > accidentally create
> > > > > > > > > > > > their vendor driver so that a given name means
> > > > > > > > > > > > something very
> > > > > > > > > > > > specific. On the other hand they need to be extremely
> > > > > > > > > > > > deliberate to
> > > > > > > > > > > > coordinate that a type id means a unique thing across
> > > > > > > > > > > > all their product
> > > > > > > > > > > > lines.
> > > > > > > > > > > >
> > > > > > > > > > > Let me clarify, type id should be unique in the list of
> > > > > > > > > > > mdev_supported_types. You can't have 2 directories in
> > > > > > > > > > > with same name.
> > > > > > > > > > Of course, but does that mean it's only unique to the
> > > > > > > > > > machine I'm
> > > > > > > > > > currently running on? Let's say I have a Tesla P100 on my
> > > > > > > > > > system and
> > > > > > > > > > type-id 11 is named "GRID-M60-0B". At some point in the
> > > > > > > > > > future I
> > > > > > > > > > replace the Tesla P100 with a Q1000 (made up). Is type-id
> > > > > > > > > > 11 on that
> > > > > > > > > > new card still going to be a "GRID-M60-0B"? If not then
> > > > > > > > > > we've based
> > > > > > > > > > our XML on the wrong attribute. If the new device does not
> > > > > > > > > > support
> > > > > > > > > > "GRID-M60-0B" then we should generate an error, not simply
> > > > > > > > > > initialize
> > > > > > > > > > whatever type-id 11 happens to be on this new card.
> > > > > > > > > >
> > > > > > > > > If there are 2 M60 in the system then you would find '11'
> > > > > > > > > type directory
> > > > > > > > > in mdev_supported_types of both M60. If you have P100, '11'
> > > > > > > > > type would
> > > > > > > > > not be there in its mdev_supported_types, it will have
> > > > > > > > > different types.
> > > > > > > > >
> > > > > > > > > For example, if you replace M60 with P100, but XML is not
> > > > > > > > > updated. XML
> > > > > > > > > have type '11'. When libvirt would try to create mdev device,
> > > > > > > > > libvirt
> > > > > > > > > would have to find 'create' file in sysfs in following
> > > > > > > > > directory format:
> > > > > > > > >
> > > > > > > > > --- mdev_supported_types
> > > > > > > > > |-- 11
> > > > > > > > > | |-- create
> > > > > > > > >
> > > > > > > > > but now for P100, '11' directory is not there, so libvirt
> > > > > > > > > should throw
> > > > > > > > > error on not able to find '11' directory.
> > > > > > > > This really seems like an accident waiting to happen. What
> > > > > > > > happens
> > > > > > > > when the user replaces their M60 with an Intel XYZ device that
> > > > > > > > happens
> > > > > > > > to expose a type 11 mdev class gpu device? How is libvirt
> > > > > > > > supposed to
> > > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and
> > > > > > > > removing
> > > > > > > > yet another arbitrary requirement that we have some sort of
> > > > > > > > globally
> > > > > > > > unique type-id database make a lot of sense? The same issue
> > > > > > > > applies
> > > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain
> > > > > > > > and the
> > > > > > > > name is the primary index for the mdev device, I know what it
> > > > > > > > is.
> > > > > > > > Seeing type-id='11' is meaningless.
> > > > > > > >
> > > > > > > Let me clarify again, type '11' is a string that vendor driver
> > > > > > > would
> > > > > > > define (see my previous reply below) it could be "11" or
> > > > > > > "GRID-M60-0B".
> > > > > > > If 2 vendors used same string we can't control that. right?
> > > > > > >
> > > > > > >
> > > > > > > > > > > Lets remove 'id' from type id in XML if that is the
> > > > > > > > > > > concern. Supported
> > > > > > > > > > > types is going to be defined by vendor driver, so let
> > > > > > > > > > > vendor driver
> > > > > > > > > > > decide what to use for directory name and same should be
> > > > > > > > > > > used in device
> > > > > > > > > > > xml file, it could be '11' or "GRID M60-0B":
> > > > > > > > > > >
> > > > > > > > > > > <device>
> > > > > > > > > > > <name>my-vgpu</name>
> > > > > > > > > > > <parent>pci_0000_86_00_0</parent>
> > > > > > > > > > > <capability type='mdev'>
> > > > > > > > > > > <type='11'/>
> > > > > > > > > > > ...
> > > > > > > > > > > </capability>
> > > > > > > > > > > </device>
> > > > > > Then let's get rid of the 'name' attribute and let the sysfs
> > > > > > directory
> > > > > > simply be the name. Then we can get rid of 'type' altogether so we
> > > > > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
> > > > > That sounds nice to me - we don't need two unique identifiers if
> > > > > one will do.
> > > > Hi Alex and Daniel,
> > > >
> > > > I just had some internal discussions here within NVIDIA and found out
> > > > that actually the name/label potentially might not be unique and the
> > > > "id" will be.
>
> My comment below follows the above statement^^^^.
>
> > > > So I think we still would like to keep both so the id is the
> > > > programmatic id and the name/label is a human readable string for it,
> > > > which might get changed to be non-unique by outside of engineering.
> > > >
> > > > Sorry for the change.
> > > >
> > > > Thanks,
> > > > Neo
> > > >
> > > A curious question. How do we expect such a descriptive name/label used
> > > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > > flavor just using ID (GRID-type11) or using both ID/name
> > > (GRID-type11-M60-0B) for end customer to choose? If it's only for human
> > > information, does it make sense e.g. providing only unique ID in sysfs
> > > while relying on vendor specific documentation to describe what the ID
> > > actually means?
> > Hi Kevin,
> >
> > The id is not visible to the upper-level stack, only the name / label
> > will be shown to the end customer to choose, such as "GRID-M60-0B", as we
> > might expose the same virtual device (name/label) with some internal
> > difference which will be tracked by the different unique id.
>
> If the upper layer will only see the descriptive name/label, then that label
> must be unique. It's not acceptable for a particular key used by management
> software to sometimes lead to one flavor of device and sometimes another (no
> matter how small the differences may be). So if only the ID is unique, then
> the ID is what must be used in any configuration at any level.

I should probably clarify what I meant by "upper layer": I was replying to
Kevin about what the end user will see in their interface.

The "key" will be unique throughout all management stacks, and you should use
the "key / ID" for all configurations. But when you want to show a
human-readable description of a virtual device, it will come from the
"description" field.

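As a rough illustration only (this is not libvirt code), the lookup could go
something like the Python sketch below. It assumes the sysfs layout discussed
earlier in this thread, a per-type directory under mdev_supported_types that
holds a 'create' file, plus a per-type 'description' file, which is still only
a proposal at this point. The helper name resolve_vgpu_type and the fallback to
the id when no description exists are made up for this example.

```python
import os


def resolve_vgpu_type(parent_dir, type_id):
    """Look up a vGPU type by its unique id under a parent device directory.

    Returns (create_path, label). Raises LookupError if the parent device
    does not expose this id, instead of silently picking another type.
    """
    type_dir = os.path.join(parent_dir, "mdev_supported_types", type_id)
    create_path = os.path.join(type_dir, "create")
    if not os.path.isfile(create_path):
        raise LookupError(
            "type %r is not supported by %s" % (type_id, parent_dir))

    # Assumption: the human-readable text lives in a 'description' file.
    # It is for display only and is never used as a configuration key.
    label = type_id
    desc_path = os.path.join(type_dir, "description")
    if os.path.isfile(desc_path):
        with open(desc_path) as f:
            label = f.read().strip() or type_id
    return create_path, label
```

The configuration is keyed on the id alone, so if a card is swapped for one
that does not expose that id, the lookup fails with an error rather than
creating whatever type happens to be there. The label is only what a
management UI would print next to the id.
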
Thanks,
Neo
>
> >
> > I think having the ability to allow libvirt or upper-level stack to
> > display a human readable string for a given type of vgpu will make the
> > user life easier.
> >
> > Thanks,
> > Neo
> >
> > > Thanks,
> > > Kevin
> >
> >
> >
> >
> > --
> > libvir-list mailing list
> > address@hidden
> > https://www.redhat.com/mailman/listinfo/libvir-list
> >
>