Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Tue, 11 Jun 2013 10:40:17 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jun 11, 2013 at 03:22:13PM +0800, Wanlong Gao wrote:
> On 06/05/2013 09:46 PM, Eduardo Habkost wrote:
> > On Wed, Jun 05, 2013 at 11:58:25AM +0800, Wanlong Gao wrote:
> >> Add monitor command mem-nodes to show the huge mapped
> >> memory nodes locations.
> >>
> > 
> > This is for machine consumption, so we need a QMP command.
> > 
> >> (qemu) info mem-nodes
> >> /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
> >> /proc/14132/fd/13: 00002aaaeac00000-00002aab2ac00000: node1
> >> /proc/14132/fd/14: 00002aab2ac00000-00002aab2b000000: node0
> >> /proc/14132/fd/14: 00002aab2b000000-00002aab2b400000: node1
> > 
> > Are node0/node1 _host_ nodes?
> > 
> > How do I know what's the _guest_ address/node corresponding to each
> > file/range above?
> > 
> > What I am really looking for is:
> > 
> >  * The correspondence between guest (virtual) NUMA nodes and guest
> >    physical address ranges (it could be provided by the QMP version of
> >    "info numa")
> 
> AFAIK, the guest NUMA nodes and guest physical address ranges are set
> by seabios, we can't get this information from QEMU,

QEMU _has_ to know about it, otherwise we would never be able to know
which virtual addresses inside the QEMU process (or offsets inside the
backing files) belong to which virtual NUMA node.

(After all, the NUMA wiring is a hardware feature, not something that
the BIOS can decide.)
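
As an illustration of the kind of table QEMU would have to hold (and that a QMP version of "info numa" could expose), here is a minimal sketch. The type GuestNodeRange and both functions are hypothetical names, not QEMU code, and the layout assumes nodes are packed back to back from guest physical address 0, ignoring any holes a real machine type may insert:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest NUMA node's RAM (not a QEMU type). */
typedef struct GuestNodeRange {
    int      node;   /* guest (virtual) NUMA node id */
    uint64_t start;  /* first guest physical address of the node */
    uint64_t end;    /* one past the last guest physical address */
} GuestNodeRange;

/*
 * Fill 'out' with one range per node. ASSUMPTION: nodes are laid out
 * contiguously starting at guest physical address 0; real machine types
 * may insert holes, which this sketch does not model.
 */
void layout_guest_nodes(const uint64_t *node_size, size_t nr_nodes,
                        GuestNodeRange *out)
{
    uint64_t base = 0;

    assert(node_size != NULL && out != NULL);
    for (size_t i = 0; i < nr_nodes; i++) {
        out[i].node = (int)i;
        out[i].start = base;
        out[i].end = base + node_size[i];
        base = out[i].end;
    }
}

/* Return the guest node that owns guest physical address 'gpa', or -1. */
int guest_node_of(const GuestNodeRange *ranges, size_t nr_nodes, uint64_t gpa)
{
    for (size_t i = 0; i < nr_nodes; i++) {
        if (gpa >= ranges[i].start && gpa < ranges[i].end) {
            return ranges[i].node;
        }
    }
    return -1;
}
```

With such a table, a management tool could ask "which guest node does this guest physical address belong to?" without knowing anything about addresses inside the QEMU process.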


> and I think this
> information is useless for pinning memory range to host.

Well, we have to somehow identify each region of guest memory when
deciding how to pin it. How would you identify it without using guest
physical addresses? Guest physical addresses are more meaningful than
the QEMU virtual addresses your patch exposes (which are meaningless
outside the QEMU process).
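
To make the distinction concrete, here is a minimal sketch of turning a QEMU-internal virtual address into identifiers an external tool can use: a guest physical address and an offset inside the backing file. BackedRegion and translate_host_va are hypothetical names, not QEMU code, and the guest physical base and file offset of each region are assumed to be known to QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one file-backed region of guest RAM. */
typedef struct BackedRegion {
    uint64_t host_va;     /* where QEMU mapped it (process-local only) */
    uint64_t guest_pa;    /* guest physical base address (assumed known) */
    uint64_t file_offset; /* offset of the mapping in the backing file */
    uint64_t length;      /* size of the region in bytes */
} BackedRegion;

/*
 * Translate a host virtual address into a guest physical address and a
 * backing-file offset. Returns the index of the matching region, or -1
 * if 'va' does not fall inside any region.
 */
int translate_host_va(const BackedRegion *regions, size_t nr_regions,
                      uint64_t va, uint64_t *guest_pa, uint64_t *file_offset)
{
    assert(regions != NULL && guest_pa != NULL && file_offset != NULL);
    for (size_t i = 0; i < nr_regions; i++) {
        const BackedRegion *r = &regions[i];

        if (va >= r->host_va && va - r->host_va < r->length) {
            uint64_t delta = va - r->host_va;

            *guest_pa = r->guest_pa + delta;
            *file_offset = r->file_offset + delta;
            return (int)i;
        }
    }
    return -1;
}
```

For example, if the 2 GiB mapping that starts at host address 0x2aaaaac00000 in the output quoted above is assumed to back guest physical address 0 from file offset 0, then host address 0x2aaaeac00000 corresponds to guest physical address 0x40000000 and file offset 0x40000000.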



> >  * The correspondence between guest physical address ranges and ranges
> >    inside the mapped files (so external tools could set the policy on
> >    those files instead of requiring QEMU to set it directly)
> > 
> > I understand that your use case may require additional information and
> > additional interfaces. But if we provide the information above we will
> > allow external components set the policy on the hugetlbfs files before
> > we add new interfaces required for your use case.
> 
> But the file backed memory is not good for the host which has many
> virtual machines, in this situation, we can't handle anon THP yet.

I don't understand what you mean here. What prevents someone from using
file-backed memory with multiple virtual machines?

> 
> And as I mentioned, the cross numa node access performance regression
> is caused by pci-passthrough, it's a very long time bug, we should
> back port the host memory pinning patch to old QEMU to resolve this
> performance problem, too.

If it's a regression, what's the last version of QEMU where the bug
wasn't present?


> 
> Thanks,
> Wanlong Gao
> 
> > 
> > Also, what about making it conditional to OSes where we really know
> > "/proc/<pid>/fd/<fd>" is available?
> > 
> > 
> >>
> >> Refer to the proposal of Eduardo and Daniel.
> >> http://article.gmane.org/gmane.comp.emulators.kvm.devel/93476
> > 
> >>
> >> Signed-off-by: Wanlong Gao <address@hidden>
> >> ---
> >>  monitor.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 45 insertions(+)
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index eefc7f0..85c865f 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -74,6 +74,10 @@
> >>  #endif
> >>  #include "hw/lm32/lm32_pic.h"
> >>  
> >> +#if defined(CONFIG_NUMA)
> >> +#include <numaif.h>
> >> +#endif
> >> +
> >>  //#define DEBUG
> >>  //#define DEBUG_COMPLETION
> >>  
> >> @@ -1759,6 +1763,38 @@ static void mem_info(Monitor *mon, const QDict *qdict)
> >>  }
> >>  #endif
> >>  
> >> +#if defined(CONFIG_NUMA)
> >> +static void mem_nodes(Monitor *mon, const QDict *qdict)
> >> +{
> >> +    RAMBlock *block;
> >> +    int prevnode, node;
> >> +    unsigned long long c, start, area;
> >> +    int fd;
> >> +    int pid = getpid();
> >> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> >> +        if (!(fd = block->fd))
> >> +            continue;
> >> +        prevnode = -1;
> >> +        start = 0;
> >> +        area = (unsigned long long)block->host;
> >> +        for (c = 0; c < block->length; c += TARGET_PAGE_SIZE) {
> >> +            if (get_mempolicy(&node, NULL, 0, c + block->host,
> >> +                              MPOL_F_ADDR | MPOL_F_NODE) < 0)
> >> +                continue;
> >> +            if (node == prevnode)
> >> +                continue;
> >> +            if (prevnode != -1)
> >> +                monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
> >> +                               pid, fd, start + area, c + area, prevnode);
> >> +            prevnode = node;
> >> +            start = c;
> >> +         }
> >> +         monitor_printf(mon, "/proc/%d/fd/%d: %016Lx-%016Lx: node%d\n",
> >> +                        pid, fd, start + area, c + area, prevnode);
> >> +    }
> >> +}
> >> +#endif
> >> +
> >>  #if defined(TARGET_SH4)
> >>  
> >>  static void print_tlb(Monitor *mon, int idx, tlb_t *tlb)
> >> @@ -2567,6 +2603,15 @@ static mon_cmd_t info_cmds[] = {
> >>          .mhandler.cmd = mem_info,
> >>      },
> >>  #endif
> >> +#if defined(CONFIG_NUMA)
> >> +    {
> >> +        .name       = "mem-nodes",
> >> +        .args_type  = "",
> >> +        .params     = "",
> >> +        .help       = "show the huge mapped memory nodes location",
> >> +        .mhandler.cmd = mem_nodes,
> >> +    },
> >> +#endif
> >>      {
> >>          .name       = "mtree",
> >>          .args_type  = "",
> >> -- 
> >> 1.8.3.rc2.10.g0c2b1cf
> >>
> > 
> 

-- 
Eduardo


