Re: device compatibility interface for live migration with assigned devi
From: Yan Zhao
Subject: Re: device compatibility interface for live migration with assigned devices
Date: Mon, 17 Aug 2020 09:52:43 +0800
User-agent: Mutt/1.9.4 (2018-02-28)
On Fri, Aug 14, 2020 at 01:30:00PM +0100, Sean Mooney wrote:
> On Fri, 2020-08-14 at 13:16 +0800, Yan Zhao wrote:
> > On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
> > >
> > > On 2020/8/10 3:46 PM, Yan Zhao wrote:
> > > > > driver is it handled by?
> > > >
> > > > It looks like devlink is specific to network devices, and
> > > > devlink.h says
> > > > include/uapi/linux/devlink.h - Network physical device Netlink
> > > > interface,
> > >
> > >
> > > Actually not. I think there was some discussion last year, and the
> > > conclusion was to remove this comment.
> > >
> > > It supports IB and probably vDPA in the future.
> > >
> >
> > hmm... sorry, I didn't find the discussion you referred to, only the
> > discussion below about why devlink was added.
> >
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg95801.html
> > >This doesn't seem to be too much related to networking? Why can't something
> > >like this be in sysfs?
> >
> > It is related to networking quite a bit. There have been a couple of
> > iterations of this, including sysfs and configfs implementations. There
> > has been a consensus reached that this should be done by netlink. I
> > believe netlink is really the best for this purpose. Sysfs is not a good
> > idea
> >
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg96102.html
> > >there is already a way to change eth/ib via
> > >echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1
> > >
> > >sounds like this is another way to achieve the same?
> >
> > It is. However the current way is driver-specific, not correct.
> > For mlx5, we need the same, it cannot be done in this way. So devlink is
> > the correct way to go.
> I'm not sure I agree with that.
> standardising a filesystem based api that is used across all vendors is also
> a valid option. that said, if devlink is the right choice from a kernel
> perspective, by all means use it, but I have not heard a convincing argument
> for why it is actually better.
> with that said, we have been using tools like ethtool to manage aspects of
> nics for decades, so it's not that strange an idea to use a tool and a binary
> protocol rather than a text based interface for this, but there are
> advantages to both approaches.
> >
Yes, I agree with you.
> > https://lwn.net/Articles/674867/
> > There is a need for some userspace API that would allow to expose things
> > that are not directly related to any device class like net_device or
> > ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> >
> > Use cases:
> > 1) get/set of port type (Ethernet/InfiniBand)
> > 2) monitoring of hardware messages to and from chip
> > 3) setting up port splitters - split port into multiple ones and squash
> >    again, enables usage of splitter cable
> > 4) setting up shared buffers - shared among multiple ports within one
> >    chip
> >
> >
> >
> > we can actually also retrieve the same information through sysfs, e.g.
> >
> > - [path to device]
> >   |--- migration
> >   |     |--- self
> >   |     |     |--- device_api
> >   |     |     |--- mdev_type
> >   |     |     |--- software_version
> >   |     |     |--- device_id
> >   |     |     |--- aggregator
> >   |     |--- compatible
> >   |     |     |--- device_api
> >   |     |     |--- mdev_type
> >   |     |     |--- software_version
> >   |     |     |--- device_id
> >   |     |     |--- aggregator
> >
> >
> >
> > >
> > > > I feel like it's not very appropriate for a GPU driver to use
> > > > this interface. Is that right?
> > >
> > >
> > > I think not, though most of the users are switch or ethernet devices. It
> > > doesn't prevent you from inventing new abstractions.
> >
> > so we would need to patch devlink core and the userspace devlink tool?
> > e.g. devlink migration
> and devlink python libs if openstack was to use it directly.
> we do have cases where we just fork a process and execute a command in a shell
> with or without elevated privilege, but we really don't like doing that due to
> the performance impact and security implications, so where we can use python
> bindings over c apis we do. pyroute2 is the only python lib I know of off the
> top of my head that supports devlink, so we would need to enhance it to
> support this new devlink api.
> there may be others; I have not really looked in the past since we don't need
> to use devlink at all today.
> >
> > > Note that devlink is based on netlink, netlink has been widely used by
> > > various subsystems other than networking.
> >
> > the advantage of netlink I see is that it can monitor device status and
> > notify the upper layer that the migration database needs to be updated.
> > But I'm not sure whether openstack would like to use this capability.
> > As Sean said, it's heavy for openstack. it's heavy for the vendor driver
> > as well :)
> >
> > And devlink monitor currently listens for notifications and dumps the state
> > changes. If we want to use it, we need to let it forward the notifications
> > and dumped info to openstack, right?
> I don't think we would use direct devlink monitoring in nova even if it was
> available.
> we could, but we already poll libvirt and the system for other resources
> periodically.
so, if we use a file system based approach, could openstack periodically check
and update the migration info?
e.g.
every minute, read /sys/<path to device>/migration/self/*, and if any file
has disappeared or appeared, or its content has changed, just let the
placement know.
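
A minimal sketch of that polling step, only to make the idea concrete. The
function names read_attrs and diff_attrs are made up for illustration, and
it assumes each attribute under migration/self/ is a small text file; how
the result is reported to the placement is left out.

```python
import os


def read_attrs(directory):
    """Return {file name: stripped content} for each regular file in directory.

    'directory' stands for /sys/<path to device>/migration/self (assumed layout).
    A missing directory yields an empty dict.
    """
    attrs = {}
    if not os.path.isdir(directory):
        return attrs
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                attrs[name] = f.read().strip()
        except OSError:
            # The attribute vanished or is unreadable; treat it as absent.
            continue
    return attrs


def diff_attrs(old, new):
    """Compare two snapshots and return (added, removed, changed)."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed
```

The periodic task would keep the previous snapshot, call read_attrs() again
every minute, and update the placement only when diff_attrs() returns
something non-empty.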
Then, when about to start migration, check the source device's
/sys/<path to src device>/migration/compatible/* and search the
placement for an existing device matching it;
if there is one, create the vm with that device and migrate to it;
if not, and if it's an mdev, try to create a matching one and migrate to
it.
(to create a matching mdev, I guess openstack can follow the sequence below:
1. find a target device with the same device id (e.g. parent pci id)
2. create an mdev with a matching mdev type
3. adjust other vendor specific attributes
4. if 2 or 3 fails, go to 1 again
)
is this approach feasible?
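
To illustrate the lookup, here is a rough sketch under some loud assumptions:
the attributes are compared by exact string equality (the real compatible
files may need richer matching), and find_target, is_compatible and the
create_mdev callback are hypothetical names, not an existing openstack or
kernel interface.

```python
def is_compatible(compatible, self_attrs):
    """True if every attribute the source requires matches the target.

    Assumption: plain string equality per attribute.
    """
    return all(self_attrs.get(key) == value
               for key, value in compatible.items())


def find_target(compatible, existing_devices, parents, create_mdev):
    """Pick a migration target, or return None if there is none.

    compatible:       attrs read from the source's migration/compatible/*
    existing_devices: {device name: attrs from its migration/self/*}
    parents:          {parent device name: device id}
    create_mdev:      hypothetical callable(parent, compatible) that creates
                      an mdev and adjusts vendor attributes (steps 2 and 3),
                      returning the new device name or None on failure
    """
    # First look for an existing device that already matches.
    for name, attrs in existing_devices.items():
        if is_compatible(compatible, attrs):
            return name
    # Otherwise walk the parents with the same device id (step 1) and move
    # on to the next one whenever creation fails (step 4).
    for parent, device_id in parents.items():
        if device_id != compatible.get("device_id"):
            continue
        created = create_mdev(parent, compatible)
        if created is not None:
            return created
    return None
```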
> we likely would just add monitoring via devlink to that periodic task.
> we certainly would not use it to detect a migration or a need to update a
> migration database (not sure what that is)
by migration database, I meant the traits in the placement. :)
if periodic monitoring of devlink is required, then periodically
monitoring sysfs is also viable, right?
>
> in reality, if we can consume this info indirectly via a libvirt api, that will
> be the approach we will take, at least for the libvirt driver in nova. for cyborg
> they may take a different approach. we already use pyroute2 in 2 projects,
> os-vif and neutron, and it does have devlink support, so the burden of using
> devlink is not that high for openstack, but it's a less friendly interface for
> configuration tools like ansible vs a filesystem based approach.
> >