From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH 33/35] pc: ACPI BIOS: reserve SRAT entry for hotplug mem hole
Date: Tue, 15 Apr 2014 17:55:22 +0200

On Tue, 15 Apr 2014 14:37:01 +0800
Hu Tao <address@hidden> wrote:

> On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> > On Mon, 14 Apr 2014 15:25:01 +0800
> > Hu Tao <address@hidden> wrote:
> > 
> > > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> > > > Needed for Windows to use a hotplugged memory device; otherwise
> > > > it complains that the server is not configured for memory hotplug.
> > > > Tests show that afterwards it uses the dynamically provided
> > > > proximity value from the _PXM() method, if available.
> > > > 
> > > > Signed-off-by: Igor Mammedov <address@hidden>
> > > > ---
> > > >  hw/i386/acpi-build.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index ef89e99..012b100 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -1197,6 +1197,8 @@ build_srat(GArray *table_data, GArray *linker,
> > > >      uint64_t curnode;
> > > >      int srat_start, numa_start, slots;
> > > >      uint64_t mem_len, mem_base, next_base;
> > > > +    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > > +    ram_addr_t hotplug_as_size = memory_region_size(&pcms->hotplug_memory);
> > > >  
> > > >      srat_start = table_data->len;
> > > >  
> > > > @@ -1261,6 +1263,18 @@ build_srat(GArray *table_data, GArray *linker,
> > > >          acpi_build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> > > >      }
> > > >  
> > > > +    /*
> > > > +     * Fake entry required by Windows to enable memory hotplug in the OS.
> > > > +     * Individual DIMM devices override the proximity set here via the
> > > > +     * _PXM method, which returns the NUMA node id associated with them.
> > > > +     */
> > > > +    if (hotplug_as_size) {
> > > > +        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > > +        acpi_build_srat_memory(numamem, pcms->hotplug_memory_base,
> > > > +                               hotplug_as_size, 0,
> > > > +                               MEM_AFFINITY_HOTPLUGGABLE |
> > > > +                               MEM_AFFINITY_ENABLED);
> > > > +    }
> > > > +
> > > 
> > > Hi Igor,
> > > 
> > > With the faked entry, memory unplug doesn't work. Entries should be set
> > > up for each node with correct flags(enable, hotpluggable) to make memory
> > > unplug work.
> > Could you be more specific: what doesn't work, how, and why is there a
> > need for SRAT entries per DIMM?
> > I've briefly tested with your unplug patches and Linux seemed to be OK
> > with unplug, i.e. the device node was removed from /sys after receiving
> > the remove notification.
> 
> 
> Following are fail cases:
> 
I did some testing using an upstream kernel with hot-remove enabled;
I tested only the "this patch" case.

> ------------------------------------------------------------------------+-----------------------------+------------
> guest commands                                                          | this patch                  | hacked SRAT
> ------------------------------------------------------------------------+-----------------------------+------------
> echo 'online' > /sys/devices/system/memory/memory32/state && \          |                             |
> echo 'offline' > /sys/devices/system/memory/memory32/state              | fail                        | success
It works for me, but offlining might fail (and is allowed to), since page
migration may fail if the memory section or part of it is not movable.

> ------------------------------------------------------------------------+-----------------------------+------------
> echo 'online' > /sys/devices/system/memory/memory32/state && \          |                             |
> echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | fail                        | success
the same as #1

> ------------------------------------------------------------------------+-----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory32/state       | fail [first memory block]   | fail
That is Linux implementation specific; it should be fixed in the guest and
has nothing to do with the QEMU side.
PS: all hot-added memory sections can be onlined with 'online_movable' if
it is done in reverse order.
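A minimal dry-run sketch of that reverse-order onlining (assuming, as in the
table above, that sections memory32..memory35 were the hot-added ones; it
prints the commands instead of writing to /sys, so it can be inspected first
and then piped to sh as root on the guest):

```shell
# Online hot-added sections in reverse order (memory35 first, memory32 last).
# Dry run: prints one "echo online_movable > .../state" line per section.
for i in $(seq 35 -1 32); do
  printf "echo online_movable > /sys/devices/system/memory/memory%s/state\n" "$i"
done
```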

> ------------------------------------------------------------------------+-----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory35/state && \  |                             |
> echo 'offline' > /sys/devices/system/memory/memory35/state              | success [last memory block] | success
> ------------------------------------------------------------------------+-----------------------------+------------
> echo 'online_movable' > /sys/devices/system/memory/memory32/state && \  |                             |
> echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | success [last memory block] | success
> ------------------------------------------------------------------------+-----------------------------+------------
Offlining a movable memory section is guaranteed to succeed, hence no issue.

Reading the upstream kernel code, it honors the PNP0C80._PXM value and
overrides anything that was provided in SRAT, so I don't see why a hacked
SRAT would make any difference.

Could you verify with the latest upstream kernel?
PS: do not forget to check the "removable" attribute before marking a case as failed.

One time I've seen a guest panic on a "successful" eject of a ZONE_NORMAL
memory section because the guest was still using it (so there are still
hot-remove bugs in the kernel), and "removable" doesn't guarantee anything
for ZONE_NORMAL memory sections.

> 
> Hacked SRAT memory entries:
> 
> PXM: 0
> range: 4G ~ 4G + 512M
> flags: Enabled Hot-Pluggable
> 
> PXM: 1
> range: 4G + 512M ~ 5G
> flags: Enabled Hot-Pluggable
> 
> So I think we should add maxmem to -numa and build SRAT accordingly.
> But there is something I'm not sure about: I added a dimm in node 1, but
> its memory range fell in node 0. Users can always cause such a mismatch
> with dimm,start,node.
> 
> 
> 
> This is the relevent part in command line:
> 
> qemu command line: -m 512M,slots=4,maxmem=2G \
>                    -object memory-ram,id=foo,size=512M \
>                    -numa node,id=n0,mem=256M -numa node,id=n1,mem=256M 
> 
> (qemu monitor) device_add dimm,id=d0,memdev=foo,node=1
> 
> > 
> > > 
> > > Windows has not been tested yet. I encountered a problem where there is
> > > no SRAT in Windows, so even memory hotplug doesn't work (but there is
> > > one in Linux with the same configuration).
> > For Windows to work, one needs to add the "-numa node" CLI option so that
> > an SRAT would be exposed to the guest.
> 
> Thanks. I need to double-check.
> 
> > Paolo suggested to enable -numa node by default, I guess we can do it
> > once NUMA re-factoring is merged.
> > 
> > That said, I haven't found any information that Windows supports
> > memory hot-remove. Google says that only hot-add is supported,
> > up to WS2008R2. I've tested WS2012R2 and it doesn't work either,
> > i.e. it sees but ignores the Notify request.
> > 
> > > 
> > > Regards,
> > > Hu Tao
> > > 


-- 
Regards,
  Igor


