Re: [Qemu-devel] Re: KQEMU code organization


From: Fabrice Bellard
Subject: Re: [Qemu-devel] Re: KQEMU code organization
Date: Mon, 02 Jun 2008 11:02:09 +0200
User-agent: Thunderbird 2.0.0.5 (X11/20070727)

Anthony Liguori wrote:
Fabrice Bellard wrote:
Anthony Liguori wrote:
[...]
FWIW, the l1_phys_map table is a current hurdle in getting performance. When we use proper accessors to access the virtio_ring, we end up taking a significant performance hit (around 20% on iperf). I have some simple patches that implement a page_desc cache that caches the RAM regions in a linear array. That helps get most of it back.

I'd really like to remove the l1_phys_map entirely and replace it with a sorted list of regions. I think this would be an overall performance improvement since it's much more cache-friendly. One thing keeping this from happening is the fact that the data structure is passed up to the kernel for kqemu. Eliminating that dependency would be a very good thing!

If the l1_phys_map is a performance bottleneck, it means that the internals of QEMU are not properly used. In QEMU/kqemu, it is not accessed to do I/O: a cache is used through tlb_table[]. I don't see why KVM cannot use a similar system.

This is for device emulation. KVM doesn't use l1_phys_map() for things like shadow page table accesses.

In the device emulation, we're currently using stl_phys() and friends. This goes through a full lookup in l1_phys_map.

Looking at other devices, some use phys_ram_base + PA and stl_raw(), which is broken but faster. A few places call cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw(). This is okay, but it still requires at least one l1_phys_map lookup per operation in the device (packet receive, I/O notification, etc.). I don't think that's going to help much because in our fast paths, we're only doing 2 or 3 stl_phys() operations.

At least on x86, there are very few regions of RAM. That makes it very easy to cache. A TLB-style cache seems wrong to me because there are so few RAM regions. I don't see a better way to do this with the existing APIs.

I see your point. In fact, st/ldx_phys() were never optimized.

A first solution would be to use a cache similar to the TLBs. It has the advantage of being quite generic and fast. Another solution would be to compute a few intervals which are tested before the generic case. These intervals would correspond to the main RAM area and would be updated each time a new device region is registered.

Does your remark imply that KVM switches back to the QEMU process for each I/O? If so, the l1_phys_map access time should be negligible compared to the SVM-VMX/kernel/user context switch!

Fabrice.
