qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes


From: Nishanth Aravamudan
Subject: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Date: Mon, 16 Jun 2014 11:49:46 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote:
> On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote:
> > Currently NUMA nodes must go consequently and QEMU ignores nodes
> > with @nodeid bigger than the number of NUMA nodes in the command line.
> 
> Why would somebody need a NUMA node with nodeid bigger than the number
> of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range.

That is not how the code works currently.

vl.c::numa_add()
...
        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
            nodenr = nb_numa_nodes;
        } else {
            if (parse_uint_full(option, &nodenr, 10) < 0) {
                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
                exit(1);
            }
        }
...
        if (get_param_value(option, 128, "mem", optarg) == 0) {
            node_mem[nodenr] = 0;
        } else {
            int64_t sval;
            sval = strtosz(option, &endptr);
            if (sval < 0 || *endptr) {
                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
                exit(1);
            }
            node_mem[nodenr] = sval;
        }
...
        nb_numa_nodes++;
...

So if a user passes nodeid= to the NUMA node definition, that entry in
node_mem is set to the appropriate value, but nb_numa_nodes, which is
used to bound the iteration of that array is not bumped appropriately.
So we end up looking at arbitrary indices in the node_mem array, which
are often 0.

Note also that means that we can't generically differentiate here
between a user-defined memoryless node and one that happens to be 0
because the particular nodeid was not specified on the command-line.
Alexey, how do you differentiate between these two cases after your
patches? In patch 3, I see you check (and skip in the loop) explicitly
if !node_mem[nodeid], but I'm not sure how that check can differentiate
between the statically 0 (from main's intialization loop) and when a
user says a node's memory is 0. Probably something obvious I'm missing
(it is Monday after all)...

> > This prevents us from creating memory-less nodes which is possible
> > situation in POWERPC under pHyp or Sapphire.
> 
> Why? If I recall correctly, nodes without any CPUs or any memory are
> already allowed.

They are allowed, but it seems like the code throughout qemu (where it's
relevant) assumes that NUMA nodes are sequential and continuous, but
that's not required (nor is it enforced on the command-line).

> How exactly would this patch help you? How do you expect the
> command-line to look like for your use case?

Alexey has replied with that, it looks like.

> > This makes nb_numa_nodes a total number of nodes or the biggest node
> > number + 1 whichever is greater.
> > 
> > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > ---
> >  vl.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index ac0e3d7..f1b75cb 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg)
> >          if (get_param_value(option, 128, "cpus", optarg) != 0) {
> >              numa_node_parse_cpus(nodenr, option);
> >          }
> > -        nb_numa_nodes++;
> > +        nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1);
> 
> I would instead suggest that if any node in the [0, max_node_id] range
> is not present on the command-line, QEMU should instead reject the
> command-line.

We already check two things:

Too many nodes (meaning we've filled the array):
        if (nb_numa_nodes >= MAX_NODES) {
            fprintf(stderr, "qemu: too many NUMA nodes\n");
            exit(1);
        }

Node ID itself is out of range (due to the use of an array):
        if (nodenr >= MAX_NODES) {
            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
            exit(1);
        }

But that doesn't prevent one from having *sparse* NUMA node IDs. And, as
far as I can tell, this is allowed by the spec, but isn't properly
supported by qemu.

Thanks,
Nish




reply via email to

[Prev in Thread] Current Thread [Next in Thread]