Re: [Qemu-ppc] qemu-ppc and NUMA topology
From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] qemu-ppc and NUMA topology
Date: Thu, 29 May 2014 11:57:01 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
On 05/29/2014 05:57 AM, Nishanth Aravamudan wrote:
> On 27.05.2014 [14:59:03 -0700], Nishanth Aravamudan wrote:
>> On 20.05.2014 [12:44:15 +1000], Alexey Kardashevskiy wrote:
>>> On 05/20/2014 10:06 AM, Nishanth Aravamudan wrote:
>>>> On 19.05.2014 [15:37:52 -0700], Nishanth Aravamudan wrote:
>>>>> Hi Alexey,
>>>>>
>>>>> I've been looking at hw/ppc/spapr.c::spapr_populate_memory() and ran
>>>>> into a few questions:
>>>>>
>>>>> 1) The values from 1 to nb_numa_nodes are used as indices into the
>>>>> node_mem array, but that array is not necessarily populated linearly:
>>>>> vl.c::add_node() uses the nodeid parameter as the index into node_mem,
>>>>> if it is specified (a sketch of both code paths follows point 2).
>>>>>
>>>>> 2) The node ID is based upon the index into the array, but it seems like
>>>>> it should actually be based upon the nodeid specified, if any. That is,
>>>>> we set the value at index 4 (which is statically the reference point in
>>>>> 'ibm,associativity-reference-points') of 'ibm,associativity' for each
>>>>> memory@<addr> node to the index we are currently at. But as mentioned
>>>>> in 1) above, that index isn't necessarily the nodeid specified on the
>>>>> command-line.
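>>>>>
>>>>> To make the mismatch concrete, here is a minimal standalone sketch
>>>>> (not the actual qemu code -- a paraphrase of the two code paths;
>>>>> MAX_NODES and the pared-down add_node() signature are mine):
>>>>>
>>>>>     #include <stdint.h>
>>>>>     #include <stdio.h>
>>>>>
>>>>>     #define MAX_NODES 128
>>>>>
>>>>>     static uint64_t node_mem[MAX_NODES];  /* indexed by nodeid */
>>>>>     static int nb_numa_nodes;
>>>>>
>>>>>     /* vl.c style: honours the nodeid given on the command line */
>>>>>     static void add_node(int nodeid, uint64_t mem_mb)
>>>>>     {
>>>>>         node_mem[nodeid] = mem_mb;
>>>>>         nb_numa_nodes++;   /* counts nodes, not the highest nodeid */
>>>>>     }
>>>>>
>>>>>     int main(void)
>>>>>     {
>>>>>         add_node(1, 2048);  /* -numa node,nodeid=1,mem=2048 */
>>>>>         add_node(5, 0);     /* -numa node,nodeid=5,mem=0    */
>>>>>         add_node(9, 2048);  /* -numa node,nodeid=9,mem=2048 */
>>>>>
>>>>>         /* spapr.c style consumer: the loop index becomes the node
>>>>>          * ID in 'ibm,associativity', so the guest sees nodes 0..2
>>>>>          * and the sparse IDs 1, 5, 9 are silently renumbered */
>>>>>         for (int i = 0; i < nb_numa_nodes; i++)
>>>>>             printf("guest node %d: %llu MB\n", i,
>>>>>                    (unsigned long long)node_mem[i]);
>>>>>         return 0;
>>>>>     }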
>>>>>
>>>>> What this all means is that if I specify something like:
>>>>>
>>>>> -numa node,nodeid=1,cpus=0-7,mem=2048 \
>>>>> -numa node,nodeid=5,cpus=8-15,mem=0 \
>>>>> -numa node,nodeid=9,mem=2048
>>>>>
>>>>> Linux sees:
>>>>>
>>>>> numactl --hardware
>>>>> available: 3 nodes (0-2)
>>>>> node 0 cpus: 8 9 10 11 12 13 14 15
>>>>> node 0 size: 0 MB
>>>>> node 0 free: 0 MB
>>>>> node 1 cpus: 0 1 2 3 4 5 6 7
>>>>> node 1 size: 2024 MB
>>>>> node 1 free: 1560 MB
>>>>> node 2 cpus:
>>>>> node 2 size: 0 MB
>>>>> node 2 free: 0 MB
>>>>>
>>>>> Maybe we don't really care about this, but I just noticed it when trying
>>>>> to reproduce some really weird topologies from PowerVM.
>>>>
>>>> Upon further investigation into node_mem, it seems like this assumption
>>>> is present throughout the qemu code, e.g., the qemu monitor 'info numa'
>>>> command. Will just document it for myself as a weird way to make
>>>> memoryless nodes show up :)
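>>>>
>>>> For instance, from memory the monitor handler looks roughly like this
>>>> (a paraphrase, not the literal code, and I may be misremembering
>>>> details; the point is just that the loop index doubles as the
>>>> reported node ID):
>>>>
>>>>     static void hmp_info_numa(Monitor *mon, const QDict *qdict)
>>>>     {
>>>>         int i;
>>>>
>>>>         monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>>>>         for (i = 0; i < nb_numa_nodes; i++) {
>>>>             /* slots for unused sparse nodeids print as "0 MB" */
>>>>             monitor_printf(mon, "node %d size: %" PRIu64 " MB\n",
>>>>                            i, node_mem[i] >> 20);
>>>>         }
>>>>     }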
>>>
>>> I never looked closely at this NUMA business, so I know as much as you do :)
>>> You seem to be right: vl.c seems to get things right (it uses nodeid as an
>>> index), but spapr.c is broken and we probably should fix it, though it does
>>> not sound very urgent to me...
>>
>> Well, looking at it more, it feels like perhaps none of the
>> qemu code is particularly careful about this -- and since you can
>> explicitly assign 0 memory to a node, you can't simply check for 0 in
>> node_mem for an unassigned node (node_mem is an unsigned array); see
>> the sketch below.
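>>
>> Something like this would be needed to tell the two cases apart -- a
>> hypothetical sketch, all the names (NodeInfo, numa_info, present) are
>> made up:
>>
>>     #include <stdbool.h>
>>     #include <stdint.h>
>>
>>     #define MAX_NODES 128
>>
>>     /* node_mem alone can't distinguish "mem=0 given explicitly" from
>>      * "nodeid never mentioned": it is unsigned and zero in both cases. */
>>     typedef struct NodeInfo {
>>         uint64_t mem;      /* bytes assigned via mem= */
>>         bool     present;  /* nodeid appeared on the command line */
>>     } NodeInfo;
>>
>>     static NodeInfo numa_info[MAX_NODES];
>>
>>     static void add_node(int nodeid, uint64_t mem)
>>     {
>>         numa_info[nodeid].mem = mem;
>>         numa_info[nodeid].present = true;  /* even for explicit mem=0 */
>>     }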
>>
>> I'll look at the behavior on x86 and get back to you.
>
> Well, it looks like ppc is no worse off than x86 here -- passing a
> similar command-line to qemu-system-x86_64, I get the same result in the
> VM (nodes numbered starting at 0, etc).
>
> Perhaps it makes sense to disallow non-sequential NUMA node numbering,
> since it isn't really supported anyway? I'm not entirely sure I see why
> it'd be necessary for a guest in any case.
How urgent is that thing? I do not have much time for experiments right now
(but I am still planning to get to it), and I do not really think we need to
add a new limit here (even if x86 does the same thing). If phyp can do
non-sequential nodes, then guests most probably support it, and all we have
to do is cook up a correct device tree, something like the sketch below...
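
An untested sketch of what I mean; it assumes we also keep the
user-specified nodeid around somewhere (a numa_info[]-style table,
hypothetical name), and 'off' is the offset of the memory@... node
being populated:

    /* Emit 'ibm,associativity' with the user-specified nodeid rather
     * than the loop index, so sparse IDs survive into the guest. */
    uint32_t associativity[] = {cpu_to_be32(0x4), cpu_to_be32(0x0),
                                cpu_to_be32(0x0), cpu_to_be32(0x0),
                                cpu_to_be32(nodeid) /* was: loop index i */};
    _FDT(fdt_setprop(fdt, off, "ibm,associativity",
                     associativity, sizeof(associativity)));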
--
Alexey