Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding


From: Vaidyanathan Srinivasan
Subject: Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding
Date: Sun, 30 Oct 2011 15:02:49 +0530
User-agent: Mutt/1.5.21 (2010-09-15)

* Alexander Graf <address@hidden> [2011-10-29 21:57:38]:

> 
> On 29.10.2011, at 20:45, Bharata B Rao wrote:
> 
> > Hi,
> > 
> > As guests become NUMA aware, it becomes important for the guests to
> > have correct NUMA policies when they run on NUMA-aware hosts.
> > Currently, limited support for NUMA binding is available via libvirt,
> > where it is possible to apply a NUMA policy to the guest as a whole.
> > However, multi-node guests would benefit if guest memory belonging to
> > different guest nodes were mapped appropriately to different host NUMA nodes.
> > 
> > To achieve this we would need QEMU to expose information about
> > guest RAM ranges (Guest Physical Address - GPA) and their host virtual
> > address mappings (Host Virtual Address - HVA). Using GPA and HVA, any
> > external tool like libvirt would be able to divide the guest RAM as per
> > the guest NUMA node geometry and bind guest memory nodes to the
> > corresponding host memory nodes using HVA. This needs both QEMU (and
> > libvirt) changes as well as changes in the kernel.
> 
> Ok, let's take a step back here. You are basically growing libvirt into a
> memory resource manager that knows how much memory is available on which
> nodes and how these nodes would possibly fit into the host's memory layout.

Yes, the motivation is to get libvirt to manage memory and NUMA
related policies more effectively, just like we do today for vcpu
pinning and allocations.  We would like libvirt to know about the host
NUMA configuration and map it to the guest memory layout to minimize
cross-node references within the guest.

> Shouldn't that be the kernel's job? It seems to me that architecturally the 
> kernel is the place I would want my memory resource controls to be in.

The kernel is the one implementing the policy.  The kernel cannot know
the guest memory layout or the expectations for that VM.  Today the
kernel sees the guest as a single process that obeys numactl bindings.
What this patch is trying to do is to make the policy recommendations
more fine-grained and effective for a multi-node guest.

QEMU knows the layouts and can tell the kernel, or issue the mbind()
calls itself to set up the NUMA affinity; however, QEMU's assumptions
could change if libvirt enforces policies on it using cgroups and
cpusets.  Hence, in the proposed approach we would let libvirt be the
policy owner, get the required information from QEMU and set the
policy by informing the kernel, just like we do for vcpus today.
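
To make that concrete, once the HVA range backing one guest node is
known (from the GPA-to-HVA information QEMU would export), the
per-range binding reduces to something like the sketch below.  This is
illustration only, not the proposed patch; the function name and node
numbers are made up, and note that mbind() acts on the caller's own
address space, which is part of why kernel changes are mentioned above
if libvirt is to apply the policy from outside QEMU:

    /* Illustration only: bind the HVA range backing one guest NUMA node
     * to one host node.  Needs <numaif.h> from numactl; link with -lnuma. */
    #include <numaif.h>     /* mbind(), MPOL_BIND, MPOL_MF_MOVE */

    static long bind_guest_node_ram(void *hva, unsigned long len,
                                    int host_node)
    {
        unsigned long nodemask = 1UL << host_node;

        /* MPOL_MF_MOVE asks the kernel to migrate pages that were
         * already faulted in on some other node. */
        return mbind(hva, len, MPOL_BIND, &nodemask,
                     sizeof(nodemask) * 8, MPOL_MF_MOVE);
    }

One such call per guest node, with the host node chosen by the
placement policy, is all the binding step amounts to.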

> Imagine QEMU could tell the kernel that different parts of its virtual memory 
> address space are supposed to be fast on different host vcpu threads. Then 
> the kernel has all the information it needs. It could even potentially 
> migrate memory towards a thread, whenever the scheduler determines that it's 
> better to run a thread somewhere else.

Migrating memory closer to the vcpu, or scheduling vcpus closer to the
memory node, is a good approach, as proposed by Andrea Arcangeli's
autonuma work.  That could be one of the policies that libvirt can
choose for a given scenario.

> That said, I don't disagree with your approach per se. It just sounds way too 
> static to me and tries to overcome shortcomings we have in the Linux mm 
> system by replacing it with hardcoded pinning logic in user space.
 
Thanks for the review.  I agree that fine-grained control over memory
and cpu pinning needs to be used carefully to get the desired positive
effect.  This proposal is a good first step towards handling
multi-node guests effectively, compared to the default policies that
are available today.

--Vaidy




