Re: [Qemu-devel] Re: KQEMU code organization


From: Fabrice Bellard
Subject: Re: [Qemu-devel] Re: KQEMU code organization
Date: Mon, 02 Jun 2008 11:02:09 +0200
User-agent: Thunderbird 2.0.0.5 (X11/20070727)

Anthony Liguori wrote:
Fabrice Bellard wrote:
Anthony Liguori wrote:
[...]
FWIW, the l1_phys_map table is a current hurdle in getting performance. When we use proper accessors to access the virtio_ring, we end up taking a significant performance hit (around 20% on iperf). I have some simple patches that implement a page_desc cache that caches the RAM regions in a linear array. That helps get most of it back.

I'd really like to remove the l1_phys_map entirely and replace it with a sorted list of regions. I think this would be an overall performance improvement since it's much more cache-friendly. One thing keeping this from happening is the fact that the data structure is passed up to the kernel for kqemu. Eliminating that dependency would be a very good thing!

If the l1_phys_map is a performance bottleneck, it means that the internals of QEMU are not properly used. In QEMU/kqemu, it is not accessed to do I/O: a cache is used through tlb_table[]. I don't see why KVM cannot use a similar system.

This is for device emulation. KVM doesn't use l1_phys_map() for things like shadow page table accesses.

In the device emulation, we're currently using stl_phys() and friends. This goes through a full lookup in l1_phys_map.

Looking at other devices, some use phys_ram_base + PA and stl_raw(), which is broken but faster. A few places call cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). This is okay, but it still requires at least one l1_phys_map lookup per operation in the device (packet receive, I/O notification, etc.). I don't think that's going to help much because in our fast paths, we're only doing 2 or 3 stl_phys() operations.

At least on x86, there are very few regions of RAM. That makes it very easy to cache. A TLB-style cache seems wrong to me because there are so few RAM regions. I don't see a better way to do this with the existing APIs.

I see your point. In fact, st/ldx_phys() were never optimized.

A first solution would be to use a cache similar to the TLBs. It has the advantage of being quite generic and fast. Another solution would be to compute a few intervals which are tested before the generic case. These intervals would correspond to the main RAM area and would be updated each time a new device region is registered.

Does your remark imply that KVM switches back to the QEMU process for each I/O? If so, the l1_phys_map access time should be negligible compared to the SVM-VMX/kernel/user context switch!

Fabrice.
