

From: Eduardo Habkost
Subject: Re: [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes
Date: Fri, 3 Mar 2017 14:10:50 -0300
User-agent: Mutt/1.7.1 (2016-10-04)

On Fri, Mar 03, 2017 at 04:52:18PM +0000, Daniel P. Berrange wrote:
> On Fri, Mar 03, 2017 at 01:47:51PM -0300, Eduardo Habkost wrote:
> > On Fri, Mar 03, 2017 at 04:26:12PM +0000, Daniel P. Berrange wrote:
> > > On Fri, Mar 03, 2017 at 10:09:22AM -0600, Eric Blake wrote:
> > > > On 03/03/2017 07:57 AM, Eduardo Habkost wrote:
> > > > 
> > > > >> With this patch, when a user wants to create a guest that contains
> > > > >> several vNUMA nodes and also wants to set distance among those nodes,
> > > > > >> the QEMU command would look like:
> > > > >>
> > > > >> ```
> > > > > >> -object memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0 \
> > > > > >> -numa node,nodeid=0,cpus=0,memdev=node0,distance=10,distance=21,distance=31,distance=41 \
> > > > 
> > > > > 
> > > > > It would be nice to have a more intuitive syntax to represent
> > > > > ordered lists in QemuOpts. But this is what we have today.
> > > > > 
> > > > 
> > > > Markus has the discussion on representing arrays via the command line;
> > > > particularly since this array is very tightly coupled to the order in
> > > > which values are presented, it may be worth having:
> > > > 
> > > > > -numa node,nodeid=0,cpus=0,memdev=node0,distance.0=10,distance.1=21,distance.2=31,distance.3=41
> > > > 
> > > > with the explicit distance.0= suffixes to distance making it more
> > > > obvious that we are dealing with an array.
> > > > 
> > > > > > I think the proposal makes sense. I would like the semantics of the new
> > > > > > option to be documented at qapi-schema.json and qemu-options.hx.
> > > > > 
> > > > > I would call the new NumaNodeOptions field "distances", as it is
> > > > > a list of distances.
> > > > 
> > > > Indeed, Markus is trying (with his work on -blockdev for 2.9) to get the
> > > > command line to a point where it is identical to the QMP code, by
> > > > reusing qapi-schema.json, so we should very much keep that in mind with
> > > > whatever we add to -numa in 2.10.
> > > > 
> > > > 
> > > > > but in the future we could support something like:
> > > > > 
> > > > >   -numa node,nodeid=0,cpus=0,memdev=node0 \
> > > > >   -numa node,nodeid=1,cpus=1,memdev=node1 \
> > > > >   -numa node,nodeid=2,cpus=2,memdev=node2 \
> > > > >   -numa node,nodeid=3,cpus=3,memdev=node3 \
> > > > > >   -numa distances,distances[0][0]=10,distances[0][1]=21,distances[0][2]=31,distances[0][3]=41,\
> > > > > >                   distances[1][0]=21,distances[1][1]=10,distances[1][2]=21,distances[1][3]=31,\
> > > > > >                   distances[2][0]=31,distances[2][1]=21,distances[2][2]=10,distances[2][3]=21,\
> > > > > >                   distances[3][0]=41,distances[3][1]=31,distances[3][2]=21,distances[3][3]=10
> > > > 
> > > > Except that [] requires special shell quoting, so the proposal would be
> > > > more like:
> > > > 
> > > > -numa distances.0.0=10,distances.0.1=21
> > > > 
> > > > Right now, QMP doesn't support 2-D arrays (although this may be a good
> > > > reason to introduce support), so that's also something to think about
> > > > (not insurmountable, but makes the task more complex).
> > > 
> > > What I don't like about this syntax is that it is duplicating information
> > > twice. IIUC the NUMA distance information is unidirectional, so specifying
> > > > the same data for both directions (node 0 -> node 3, and node 3 -> node 0)
> > > > looks like overkill. Also the self-node distance is defined to always be
> > > > 10 IIUC, so specifying that is not required. IOW, we could cut down the data
> > > > we need to provide to just
> > > 
> > >    -numa distances,nodea=0,nodeb=1,value=20
> > >    -numa distances,nodea=0,nodeb=2,value=20
> > >    -numa distances,nodea=0,nodeb=3,value=20
> > >    -numa distances,nodea=1,nodeb=2,value=20
> > >    -numa distances,nodea=1,nodeb=3,value=20
> > >    -numa distances,nodea=2,nodeb=3,value=20
> > 
> > The ACPI spec (I'm looking at revision 5.0) explicitly mentions
> > that A->B distance may be different from B->A distance:
> > 
> > "The entry value is a one-byte unsigned integer. The relative
> > distance from System Locality i to System Locality j is the
> > i*N + j entry in the matrix, where N is the number of System
> > Localities.  Except for the relative distance from a System
> > Locality to itself, each relative distance is stored twice in the
> > matrix. This provides the capability to describe the scenario
> > where the relative distances for the two directions between
> > System Localities is different."
> 
> Ah interesting, learn something new every day ? I've only made
> that unidirectional assumption for the last 10 years ;-P
> 
> > But I agree we could figure out a more compact syntax for more
> > common cases where self-node distance is 10 and distance is the
> > same both ways.
> 
> QAPI would need a specialized numeric matrix type, which we could
> efficiently map into some CLI syntax, in order to avoid needing to
> tickle the rather verbose general purpose list syntax. Probably
> not worth the hassle though - rather than just picking shorter
> variable names eg
> 
>   -numa dist,a=0,b=1,val=3
> 
> instead of
> 
>   -numa distances,nodea=0,nodeb=1,value=20

Whatever syntax/names we choose, we could have reasonable
defaults for omitted values:

* If A->B is set and B->A is omitted, use the same value for both
  A->B and B->A
* If A->A is omitted, use min(10, configured_distances)

This way, the previous example:

   -numa distances,distances.0.0=10,distances.0.1=21,distances.0.2=31,distances.0.3=41,\
                   distances.1.0=21,distances.1.1=10,distances.1.2=21,distances.1.3=31,\
                   distances.2.0=31,distances.2.1=21,distances.2.2=10,distances.2.3=21,\
                   distances.3.0=41,distances.3.1=31,distances.3.2=21,distances.3.3=10

could be written as:

   -numa distances,distances.0.1=21,distances.0.2=31,distances.0.3=41,\
                                    distances.1.2=21,distances.1.3=31,\
                                                     distances.2.3=21
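
To make the two defaulting rules concrete, here is a rough Python sketch (not QEMU code) that expands the compact form into the full matrix. The function names `parse_distances`, `fill_defaults` and `flatten_slit` are hypothetical, the `distances.A.B=V` syntax is only the one proposed in this thread, and reading "min(10, configured_distances)" as "10 or the smallest configured value, whichever is lower" is an assumption.

```python
def parse_distances(optstr):
    """Parse 'distances,distances.A.B=V,...' into {(a, b): v}.

    The option syntax is the one sketched in this thread, not an
    existing QEMU option.
    """
    explicit = {}
    for item in optstr.split(","):
        item = item.strip()
        if "=" not in item:
            continue  # skip the leading 'distances' type tag
        key, value = item.split("=", 1)
        _, a, b = key.split(".")
        explicit[(int(a), int(b))] = int(value)
    return explicit


def fill_defaults(explicit, nb_nodes):
    """Build an nb_nodes x nb_nodes matrix, applying the two defaults."""
    matrix = [[None] * nb_nodes for _ in range(nb_nodes)]
    for (a, b), v in explicit.items():
        matrix[a][b] = v

    # Assumed reading of min(10, configured_distances): use 10 unless
    # some configured distance is smaller.
    self_dist = min([10] + list(explicit.values()))

    for a in range(nb_nodes):
        for b in range(nb_nodes):
            if matrix[a][b] is not None:
                continue  # explicitly configured, keep as-is
            if a == b:
                matrix[a][b] = self_dist      # A->A omitted
            elif matrix[b][a] is not None:
                matrix[a][b] = matrix[b][a]   # mirror B->A into A->B
    return matrix  # entries left as None were never configured


def flatten_slit(matrix):
    """Row-major layout: the distance i -> j lands at index i*N + j."""
    return [d for row in matrix for d in row]


compact = ("distances,distances.0.1=21,distances.0.2=31,distances.0.3=41,"
           "distances.1.2=21,distances.1.3=31,distances.2.3=21")
full = fill_defaults(parse_distances(compact), 4)
```

Any pair that is configured in neither direction stays `None` here; whether that should be an error or get some other default is left open.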

-- 
Eduardo


