From: Markus Armbruster
Subject: Re: [Qemu-devel] qmp: dump-guest-memory: -p option has issues, fix it or drop it?
Date: Tue, 18 Sep 2012 15:36:52 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Jan Kiszka <address@hidden> writes:

> On 2012-09-18 14:23, Markus Armbruster wrote:
>> Jan Kiszka <address@hidden> writes:
>> 
>>> On 2012-09-18 03:52, Wen Congyang wrote:
>>>> At 09/18/2012 01:56 AM, Luiz Capitulino Wrote:
>>>>> Hi Wen,
>>>>>
>>>>> We've re-reviewed the dump-guest-memory command and found some
>>>>> possible issues with the -p option.
>>>>>
>>>>> The main issue is that it seems possible for a malicious guest to set
>>>>> page tables in a way that we allocate a MemoryMapping structure for
>>>>> each possible PTE. If IA-32e paging is used, this could lead to the
>>>>> allocation of dozens of gigabytes by qemu.
>>>>>
>>>>> Of course that this is not expected for the regular case, where a
>>>>> MemoryMapping allocation can be skipped for several reasons (I/O memory,
>>>>> page not present, contiguous/in same range addresses etc), but the
>>>>> point is what a malicious guest can do.
>>>>>
>>>>> Another problem is that the -p option seems to be broken for SMP guests.
>>>>> The problem is in qemu_get_guest_memory_mapping():
>>>>>
>>>>>     first_paging_enabled_cpu = find_paging_enabled_cpu(first_cpu);
>>>>>     if (first_paging_enabled_cpu) {
>>>>>         for (env = first_paging_enabled_cpu; env != NULL; env = env->next_cpu) {
>>>>>             ret = cpu_get_memory_mapping(list, env);
>>>>>             if (ret < 0) {
>>>>>                 return -1;
>>>>>             }
>>>>>         }
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> This looks for the first vCPU with paging enabled, and then assumes
>>>>> that all the following vCPUs also have paging enabled. How does this
>>>>> hold?
>>>
>>> cpu_get_memory_mapping re-validates that paging is on. In fact,
>>> cpu_get_memory_mapping should handle both cases so that the generic code
>>> need not worry about paging on/off.
>> 
>> The loop Luiz quoted is confusing.
>> 
>> Actually, the whole function is confusing.  Here's how I understand it:
>> 
>>     if there is a CPU that has paging enabled
>>         there is a proper prefix of env whose members don't have paging
>>         enabled; ignore them all [WTF#1]
>>         for all members of env not in that prefix ("the suffix"):
>>             get memory mapping for a CPU with paging enabled [WTF#2],
>>             and merge it into list
>>     else
>>         get memory mapping for ram_list,
>>         and merge it into list
>> 
>> WTF#1: Why is it okay to ignore leading CPUs with paging disabled, but
>> only if there's at least one CPU with paging enabled?
>> 
>> WTF#2: What if a CPU in the suffix doesn't have paging enabled?  Oh, the
>> arch-specific function to get its memory map is expected to do nothing
>> then.
>> 
>> Bonus WTF#3: What if a guest enables/disables paging between
>> find_paging_enabled_cpu() and the loop?  What if it changes page tables
>> while we walk them?
>
> In fact, the dump should be taken in a consistent state, meaning it
> should run synchronously w.r.t. at least the CPU we refer to. So we
> need to run the dump over the VCPU thread or with that VCPU stopped.

Makes sense.  Unfortunately, it's not what the code does.
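
Roughly what I'd expect instead (a simplified stand-in, not the actual
QEMU code: VCpu, pause_all() and resume_all() here are made-up
illustrations of CPUState and pause_all_vcpus()/resume_all_vcpus()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's vCPU list. */
typedef struct VCpu {
    bool running;
    struct VCpu *next;
} VCpu;

static void pause_all(VCpu *first)
{
    for (VCpu *c = first; c; c = c->next) {
        c->running = false;
    }
}

static void resume_all(VCpu *first)
{
    for (VCpu *c = first; c; c = c->next) {
        c->running = true;
    }
}

/* The page-table walk happens only inside the paused window, so the
 * guest cannot toggle paging or rewrite its tables mid-walk. */
static int dump_mappings_consistently(VCpu *first)
{
    int walked = 0;

    pause_all(first);
    for (VCpu *c = first; c; c = c->next) {
        assert(!c->running);  /* state is guaranteed stable here */
        walked++;             /* cpu_get_memory_mapping() would run here */
    }
    resume_all(first);
    return walked;
}
```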

>> WTF is this function supposed to do?
>
> Associate virtual and physical addresses for the whole machine at a
> given time. The picture is not fully consistent as we cannot express yet
> that different CPUs have different views on memory. IIRC, the first view
> is taken, the rest are dropped - correct me if I'm wrong, Wen.

As far as I can tell, the code merges the views of all CPUs that have
paging enabled, which makes no sense to me.

>>>>> Assuming that this last issue is fixable (ie. we can make the -p
>>>>> option work well with SMP guests), we should at least document that
>>>>> -p can make QEMU allocate lots of memory and end up being killed
>>>>> by the OS.
>>>>>
>>>>> However, I also think that we should consider if having the -p
>>>>> feature is really worth it. It's a complex feature and has a number
>>>>> of limitations*. If libvirt doesn't use this, dropping it shouldn't
>>>>> be a big deal (we can return an error when -p is used).
>>>
>>> libvirt should surely not be the only reference for debugging features.
>> 
>> No, it's just a user, albeit an important one.  We don't break known
>> users cavalierly.
>
> That is not what is being discussed here. It's about dropping a feature
> because that one user doesn't expose it.

Let's not get into that fruitless discussion again.  I'm sure Luiz
didn't mean to suggest that other users (including you) don't matter.

>>>>>  * The issues discussed in this email plus the fact that the guest
>>>>>    memory may be corrupted, and the guest may be in real-mode even
>>>>>    when paging is enabled
>>>>>
>>>>
>>>> Yes, there are some limitations with this option. Jan said that he
>>>> always use gdb to deal with vmcore, so he needs such information.
>>>
>>> The point is to overcome the focus on Linux-only dump processing tools.
>> 
>> While I don't care for supporting alternate dump processing tools
>> myself, I certainly don't mind supporting them, as long as the code
>> satisfies basic safety and reliability requirements.
>> 
>> This code doesn't, as far as I can tell.
>
> It works, though not under all circumstances.

I don't doubt it works often enough to be useful to somebody.  But basic
safety and reliability requirements are a bit more than that.  They
include "don't explode in ways a reasonable user can't be expected to
foresee".  I don't think a reasonable user can be expected to see that
-p is safe only for trusted guests.

>> If that's correct, we should either rip it out until a satisfactory
>> replacement is available, or at least document -p as an unsafe and
>> unreliable debugging feature (master & stable).
>> 
>>> I'm sure the memory allocation can be avoided by writing out any found
>>> virt->phys mapping directly to the vmcore file. We know where physical
>>> RAM will be, we only need the corresponding virtual addresses - IIUC. So
>>> first prepare the section according to the guest's RAM size and then,
>>> once we identified a page while walking the tables carefully, seek to
>>> that file position and write to it.
>> 
>> Sounds like a non-trivial change from the current code.  Makes me lean
>> towards ripping it out.
>
> We are in early 1.3 cycle. There is surely no need to rip something
> useful out based on these arguments.

It's an easy way to make sure it gets fixed.  If you want to fix it
without first ripping it out, go right ahead.
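
For reference, the allocation-free approach you sketched - seek to the
file position implied by the physical address and write the page there -
could look something like this (write_page_at() and the tiny PAGE_SIZE
are invented for illustration, not proposed names):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 16  /* tiny page size, just for the sketch */

/* Write one guest page straight into the vmcore at the file offset
 * derived from its physical address, instead of first queueing a
 * MemoryMapping entry in memory. */
static int write_page_at(FILE *vmcore, long section_off,
                         long phys_addr, const char *page)
{
    if (fseek(vmcore, section_off + phys_addr, SEEK_SET) != 0) {
        return -1;
    }
    if (fwrite(page, 1, PAGE_SIZE, vmcore) != PAGE_SIZE) {
        return -1;
    }
    return 0;
}
```

Since the RAM section's size is known up front, pages found while
walking the tables can be written in any order without buffering.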

1.2 is a separate issue.  Do you agree fixing it there isn't practical?
If yes, do you want to keep it anyway?  If yes, we need a suitable
documentation patch.  Both master and stable.


