[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EP
From: |
Babu Moger |
Subject: |
Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC |
Date: |
Tue, 11 Aug 2020 16:03:58 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 |
On 8/7/20 2:11 PM, Igor Mammedov wrote:
> On Fri, 7 Aug 2020 17:52:22 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
>
>> On Fri, Aug 07, 2020 at 11:32:51AM -0500, Babu Moger wrote:
>>> Adding a new check to warn the users to configure 'dies' when
>>> topology is numa configured. It makes it easy to build the
>>> topology for EPYC models.
>>
>> This says you're adding a warning....
>>
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>> hw/i386/x86.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>>> index 67bee1bcb8..2a6ce56ef1 100644
>>> --- a/hw/i386/x86.c
>>> +++ b/hw/i386/x86.c
>>> @@ -138,6 +138,13 @@ void x86_cpus_init(X86MachineState *x86ms, int
>>> default_cpu_version)
>>>
>>> /* Check for apicid encoding */
>>> if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)) {
>>> + if ((ms->numa_state->num_nodes > 0) &&
>>> + ms->numa_state->num_nodes != (ms->smp.sockets *
>>> x86ms->smp_dies)) {
>>> + error_setg(&error_fatal, "Numa configuration requires smp
>>> 'dies' "
>>> + "parameter. Configure the cpu topology properly
>>> with "
>>> + "max_cpus = sockets * dies * cores * threads");
>>
>> ...but you're actually making this a fatal error, not a warning.
>>
>> I'm not sure this is really OK. Wouldn't this mean that existing VMs
>> deployed today, risk triggering this fatal error next time they
>> are booted, or live migrated. If it is possible someone is using
>> such a config today, I don't think we can break it.
>
> to begin with, users shouldn't have used 'dies' with initial impl. at all.
> (it was Intel introduced option and EPYC's added very similar internal node_id
> (removed by the next patch)).
> Now we are trying to consolidate this mess and reuse dies for EPYC.
>
> EPYC was out in the since with 5.0 (though broken), users could start a VM
> with
> such config but that would not be correct EPYC from apicid and cpuid point of
> view.
> Guest OS might run if it doesn't know about EPYCs or behave wierdly (sub
> optimal|crash|whatever)
> on seeing unexpected values.
>
> If we are hell bound on keeping bugs of initial impl, then we should keep it
> to 5.1<=
> machine version and do the right thing for newer ones.
> Though I'm not sure we should keep broken variant around (all we would get
> from it is
> bug reports*/complains from users with end result of their config anyways).
> I'd rather error out with clear error message so user could fix their broken
> config.
>
> *) there is at least one thread/bz on qemu-devel where users are trying to run
> with EPYC and pick up options combination so it would produce sensible
> topology.
I am still not sure what is the right approach here. I can think of
couple of options.
1. If smp_dies != num_nodes then go ahead create the configuration with as
many smp_dies and warn(but not error out) users about the mis-configuration.
2. Introduce it as a fix based on machine version(5.1 >) like Igor
mentioned. I am not sure how to achieve that. I can look into that.
Thanks
Babu
>
>
>> Regards,
>> Daniel
>
Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC, Igor Mammedov, 2020/08/07