qemu-devel

Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Wed, 5 Jun 2013 12:54:09 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jun 05, 2013 at 07:57:42AM -0500, Anthony Liguori wrote:
> Wanlong Gao <address@hidden> writes:
> 
> > Add monitor command mem-nodes to show the huge mapped
> > memory nodes locations.
> >
> > (qemu) info mem-nodes
> > /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
> > /proc/14132/fd/13: 00002aaaeac00000-00002aab2ac00000: node1
> > /proc/14132/fd/14: 00002aab2ac00000-00002aab2b000000: node0
> > /proc/14132/fd/14: 00002aab2b000000-00002aab2b400000: node1
> 
> This creates an ABI that we don't currently support.  Memory hotplug or
> a variety of things can break this mapping and then we'd have to provide
> an interface to describe that the mapping was broken.

What do you mean by "breaking this mapping", exactly? Would the backing
file of existing guest RAM ever change? (It would require a memory copy
from one file to another, why would QEMU ever do that?)

> 
> Also, it only works with hugetlbfs which is probably not widely used
> given the existence of THP.

Quoting yourself at
http://article.gmane.org/gmane.comp.emulators.kvm.devel/58227:

>> It's extremely likely that if you're doing NUMA pinning, you're also 
>> doing large pages via hugetlbfs.  numactl can already set policies for 
>> files in hugetlbfs so all you need to do is have a separate hugetlbfs 
>> file for each numa node.
>> 
>> Then you have all the flexibility of numactl and you can implement node 
>> migration external to QEMU if you so desire.

And if we simply report where the backing files and offsets used for
guest RAM are, one could just use
'numactl --file --offset --length', so we don't even need separate
files/mem-paths for each node.
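The numactl approach above can be sketched as follows. The backing-file
path is made up for illustration, and the command needs numactl plus real
NUMA hardware, so it is guarded rather than run unconditionally:

```shell
# Bind the first 1 GiB of a (hypothetical) hugetlbfs backing file to
# NUMA node 1 from outside QEMU, using numactl's file placement options.
backing=/dev/hugepages/qemu-guest-ram   # hypothetical backing file path
if command -v numactl >/dev/null && [ -e "$backing" ]; then
    # --touch faults the range in so the policy takes effect immediately
    numactl --membind=1 --file "$backing" --offset=0 --length=1G --touch
else
    echo "skipping: numactl or $backing not available"
fi
```

With one backing file per RAM block, the same command with different
--offset/--length pairs covers each guest NUMA node in turn.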

Does THP already work with tmpfs? If it does, people who don't want
hugetlbfs but want NUMA tuning to work with THP could just use tmpfs for
-mem-path.

> 
> I had hoped that we would get proper userspace interfaces for describing
> memory groups but that appears to have stalled out.

I would love to have it. But while we don't have it, sharing the
tmpfs/hugetlbfs backing files seems to work just fine as a mechanism to
let other tools manipulate guest memory policy. We just need to let
external tools know where the backing files are.
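A toy demonstration of why exposing the backing files is enough: any open
descriptor under /proc/<pid>/fd resolves back to the file behind it, so an
external tool only needs the PID and fd number that "info mem-nodes"
prints. The temporary file here is a stand-in for a tmpfs/hugetlbfs RAM
backing file:

```shell
# Open a file on fd 9 of the current shell (stand-in for QEMU's
# guest RAM backing file), then resolve the fd back to its path.
tmp=$(mktemp)
exec 9<"$tmp"
readlink /proc/$$/fd/9    # prints the temp file's path
exec 9<&-
rm -f "$tmp"
```

An external tool would do the same with QEMU's PID and the fd numbers
reported by the monitor, then feed the resulting path to numactl.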

> 
> Does anyone know if this is still on the table?
> 
> If we can't get a proper kernel interface, then perhaps we need to add
> full libnuma support but that would really be unfortunate...

Why isn't the "info mem-nodes" solution (I mean: not this version, but a
proper QMP version that exposes all the information we need) an option?


> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > Refer to the proposal of Eduardo and Daniel.
> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/93476
> >
> > Signed-off-by: Wanlong Gao <address@hidden>
> > ---
> >  monitor.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 45 insertions(+)
> >
> > diff --git a/monitor.c b/monitor.c
> > index eefc7f0..85c865f 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -74,6 +74,10 @@
> >  #endif
> >  #include "hw/lm32/lm32_pic.h"
> >  
> > +#if defined(CONFIG_NUMA)
> > +#include <numaif.h>
> > +#endif
> > +
> >  //#define DEBUG
> >  //#define DEBUG_COMPLETION
> >  
> > @@ -1759,6 +1763,38 @@ static void mem_info(Monitor *mon, const QDict *qdict)
> >  }
> >  #endif
> >  
> > +#if defined(CONFIG_NUMA)
> > +static void mem_nodes(Monitor *mon, const QDict *qdict)
> > +{
> > +    RAMBlock *block;
> > +    int prevnode, node;
> > +    unsigned long long c, start, area;
> > +    int fd;
> > +    int pid = getpid();
> > +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> > +        if (!(fd = block->fd))
> > +            continue;
> > +        prevnode = -1;
> > +        start = 0;
> > +        area = (unsigned long long)block->host;
> > +        for (c = 0; c < block->length; c += TARGET_PAGE_SIZE) {
> > +            if (get_mempolicy(&node, NULL, 0, c + block->host,
> > +                              MPOL_F_ADDR | MPOL_F_NODE) < 0)
> > +                continue;
> > +            if (node == prevnode)
> > +                continue;
> > +            if (prevnode != -1)
> > +                monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
> > +                               pid, fd, start + area, c + area, prevnode);
> > +            prevnode = node;
> > +            start = c;
> > +        }
> > +        monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
> > +                       pid, fd, start + area, c + area, prevnode);
> > +    }
> > +}
> > +#endif
> > +
> >  #if defined(TARGET_SH4)
> >  
> >  static void print_tlb(Monitor *mon, int idx, tlb_t *tlb)
> > @@ -2567,6 +2603,15 @@ static mon_cmd_t info_cmds[] = {
> >          .mhandler.cmd = mem_info,
> >      },
> >  #endif
> > +#if defined(CONFIG_NUMA)
> > +    {
> > +        .name       = "mem-nodes",
> > +        .args_type  = "",
> > +        .params     = "",
> > +        .help       = "show the huge mapped memory nodes location",
> > +        .mhandler.cmd = mem_nodes,
> > +    },
> > +#endif
> >      {
> >          .name       = "mtree",
> >          .args_type  = "",
> > -- 
> > 1.8.3.rc2.10.g0c2b1cf
> 

-- 
Eduardo


