qemu-devel

Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory()


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory()
Date: Tue, 24 Jun 2014 16:14:11 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 06/24/2014 01:08 PM, Nishanth Aravamudan wrote:
> On 21.06.2014 [13:06:53 +1000], Alexey Kardashevskiy wrote:
>> On 06/21/2014 08:55 AM, Nishanth Aravamudan wrote:
>>> On 16.06.2014 [17:53:49 +1000], Alexey Kardashevskiy wrote:
>>>> Current QEMU does not support memoryless NUMA nodes.
>>>> This prepares SPAPR for that.
>>>>
>>>> This moves 2 calls of spapr_populate_memory_node() into
>>>> the existing loop which handles nodes other than
>>>> the first one.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>> ---
>>>>  hw/ppc/spapr.c | 31 +++++++++++--------------------
>>>>  1 file changed, 11 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index cb3a10a..666b676 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -689,28 +689,13 @@ static void spapr_populate_memory_node(void *fdt, int nodeid, hwaddr start,
>>>>
>>>>  static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
>>>>  {
>>>> -    hwaddr node0_size, mem_start, node_size;
>>>> +    hwaddr mem_start, node_size;
>>>>      int i;
>>>>
>>>> -    /* memory node(s) */
>>>> -    if (nb_numa_nodes > 1 && node_mem[0] < ram_size) {
>>>> -        node0_size = node_mem[0];
>>>> -    } else {
>>>> -        node0_size = ram_size;
>>>> -    }
>>>> -
>>>> -    /* RMA */
>>>> -    spapr_populate_memory_node(fdt, 0, 0, spapr->rma_size);
>>>> -
>>>> -    /* RAM: Node 0 */
>>>> -    if (node0_size > spapr->rma_size) {
>>>> -        spapr_populate_memory_node(fdt, 0, spapr->rma_size,
>>>> -                                   node0_size - spapr->rma_size);
>>>> -    }
>>>> -
>>>> -    /* RAM: Node 1 and beyond */
>>>> -    mem_start = node0_size;
>>>> -    for (i = 1; i < nb_numa_nodes; i++) {
>>>> +    for (i = 0, mem_start = 0; i < nb_numa_nodes; ++i) {
>>>> +        if (!node_mem[i]) {
>>>> +            continue;
>>>> +        }
>>>
>>> Doesn't this skip memoryless nodes? What actually puts the memoryless
>>> node in the device-tree?
>>
>> It does skip.
>>
>>> And if you were to put them in, wouldn't spapr_populate_memory_node()
>>> fail because we'd be creating two nodes with memory@XXX where XXX is the
>>> same (starting address) for both?
>>
>> I cannot do this now - there is no way to tell from the command line where
>> I want NUMA node memory start from so I'll end up with multiple nodes with
>> the same name and QEMU won't start. When NUMA fixes reach upstream, I'll
>> try to work out something on top of that.
> 
> Ah I got something here. With the patches I just sent to enable sparse
> NUMA nodes, plus your series rebased on top, here's what I see in a
> Linux LPAR:
> 
> qemu-system-ppc64 -machine pseries,accel=kvm,usb=off -m 4096 \
>     -realtime mlock=off \
>     -numa node,nodeid=3,mem=4096,cpus=2-3 \
>     -numa node,nodeid=2,mem=0,cpus=0-1 -smp 4
> 
> info numa
> 2 nodes
> node 2 cpus: 0 1
> node 2 size: 0 MB
> node 3 cpus: 2 3
> node 3 size: 4096 MB
> 
> numactl --hardware
> available: 3 nodes (0,2-3)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus: 0 1
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 2 3
> node 3 size: 4073 MB
> node 3 free: 3830 MB
> node distances:
> node   0   2   3 
>   0:  10  40  40 
>   2:  40  10  40 
>   3:  40  40  10 
> 
> The trick, it seems, is if you have a memoryless node, it needs to
> have CPUs assigned to it.

Yep. The device tree does not have NUMA nodes, it only has CPUs and
memory@XXX nodes (memory banks?) and the guest kernel has to parse
ibm,associativity and reconstruct the NUMA topology. If some node is not
mentioned in any ibm,associativity, it does not exist.
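
[As an illustration of the point above, a guest-visible memory bank in the flattened device tree might look like the hypothetical fragment below; the addresses, sizes and associativity cells are made up, and the real layout depends on the machine configuration and on ibm,associativity-reference-points.]

```dts
/* Hypothetical sPAPR device-tree fragment, for illustration only. */
memory@0 {
        device_type = "memory";
        reg = <0x0 0x0 0x0 0x40000000>;            /* 1 GiB starting at 0 */
        ibm,associativity = <0x4 0x0 0x0 0x0 0x0>; /* last cell: NUMA node 0 */
};
```

[The guest derives node ids only from such properties, so a node id that never appears in any ibm,associativity simply does not exist from the guest's point of view.]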


> The CPU's "ibm,associativity" property will
> make Linux set up the proper NUMA topology.
> 
> Thoughts? Should there be a check that every "present" NUMA node at
> least either has CPUs or memory?

Maybe. I'll wait for the NUMA stuff in upstream, apply your patch(es) and my
patches, and see what I get :)


> It seems like if neither is present,
> we can just hotplug them later?

hotplug what? NUMA nodes?

> Does qemu support topology for PCI devices?

Nope.



-- 
Alexey


