This is for device emulation. KVM doesn't use l1_phys_map for
things like shadow page table accesses.
In the device emulation, we're currently using stl_phys() and
friends. This goes through a full lookup in l1_phys_map.
Looking at other devices, some use phys_ram_base + PA and stl_raw(),
which is broken but faster. A few places call
cpu_get_physical_page_desc(), then use phys_ram_base and stl_raw().
This is okay but it still requires at least one l1_phys_map lookup
per operation in the device (packet receive, io notification, etc.).
I don't think that's going to help much because in our fast paths,
we're only doing 2 or 3 stl_phys() operations.
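To make the lookup counting above concrete, here is a rough, self-contained
model rather than QEMU code. lookup_ram_offset(), model_stl_phys(), guest_ram
and the two update_desc_*() helpers are all hypothetical stand-ins: the first
plays the role of the l1_phys_map lookup, and guest_ram plays the role of
phys_ram_base for a single RAM region. The sketch stores in host byte order
and ignores descriptors that straddle a page boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_ram[4096];  /* hypothetical: one RAM region */
static unsigned long lookups;    /* counts simulated map lookups */

/* Hypothetical stand-in for the l1_phys_map lookup.
 * Returns the offset into guest_ram, or -1 if paddr is not RAM. */
static long lookup_ram_offset(uint64_t paddr)
{
    lookups++;
    return paddr < sizeof(guest_ram) ? (long)paddr : -1;
}

/* Models stl_phys(): every store pays for a full lookup. */
static void model_stl_phys(uint64_t paddr, uint32_t val)
{
    long off = lookup_ram_offset(paddr);
    if (off >= 0 && (size_t)off + sizeof(val) <= sizeof(guest_ram)) {
        memcpy(guest_ram + off, &val, sizeof(val));
    }
}

/* Fast path as it is today: three stores, three lookups. */
static void update_desc_per_store(uint64_t desc,
                                  uint32_t a, uint32_t b, uint32_t c)
{
    model_stl_phys(desc, a);
    model_stl_phys(desc + 4, b);
    model_stl_phys(desc + 8, c);
}

/* Look up once per device operation, then store raw.
 * Returns 0 on success, -1 if the descriptor is not fully in RAM. */
static int update_desc_one_lookup(uint64_t desc,
                                  uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t words[3] = { a, b, c };
    long off = lookup_ram_offset(desc);
    if (off < 0 || (size_t)off + sizeof(words) > sizeof(guest_ram)) {
        return -1;
    }
    memcpy(guest_ram + off, words, sizeof(words));
    return 0;
}
```

In this model the second variant saves two lookups out of three for a
three-word update, but it can never get below one lookup per operation.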
At least on x86, there are very few regions of RAM. That makes it
very easy to cache. A TLB style cache seems wrong to me because
there are so few RAM regions. I don't see a better way to do this
with the existing APIs.
I see your point. In fact, st/ldx_phys() were never optimized.
A first solution would be to use a cache similar to the TLBs. It has
the advantage of being quite generic and fast. Another solution would
be to compute a few intervals which are tested before the generic case.
These intervals would correspond to the main RAM area and would be
updated each time a new device region is registered.
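The interval idea could look roughly like the sketch below. None of these
names exist in QEMU: RamInterval, ram_intervals_reset(), ram_intervals_add(),
fast_stl_phys() and slow_stl_phys() are made up for illustration, and
slow_stl_phys() only counts calls where the real generic path would walk
l1_phys_map. The sketch also stores in host byte order, which the real
stl_phys() would have to handle properly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_RAM_INTERVALS 4  /* assumption: only a handful of RAM areas */

typedef struct {
    uint64_t start;  /* first guest physical address covered */
    uint64_t size;   /* length of the interval in bytes */
    uint8_t *host;   /* host memory backing 'start' */
} RamInterval;

static RamInterval ram_intervals[MAX_RAM_INTERVALS];
static int nb_ram_intervals;
static unsigned long slow_path_hits;

/* Hypothetical stand-in for the generic path (full lookup). */
static void slow_stl_phys(uint64_t addr, uint32_t val)
{
    (void)addr;
    (void)val;
    slow_path_hits++;
}

/* Would be called each time a device region is registered, so that
 * the intervals can be recomputed from the new memory map. */
static void ram_intervals_reset(void)
{
    nb_ram_intervals = 0;
}

static void ram_intervals_add(uint64_t start, uint64_t size, uint8_t *host)
{
    assert(nb_ram_intervals < MAX_RAM_INTERVALS);
    ram_intervals[nb_ram_intervals++] = (RamInterval){ start, size, host };
}

/* Test the few intervals first; fall back to the generic case. */
static void fast_stl_phys(uint64_t addr, uint32_t val)
{
    for (int i = 0; i < nb_ram_intervals; i++) {
        const RamInterval *r = &ram_intervals[i];
        /* Unsigned subtraction wraps if addr < start, failing the test. */
        uint64_t off = addr - r->start;
        if (off < r->size && r->size - off >= sizeof(val)) {
            memcpy(r->host + off, &val, sizeof(val));
            return;
        }
    }
    slow_stl_phys(addr, val);
}
```

With only a few intervals, the linear scan is a couple of compares per
access, and anything that is not plain RAM (MMIO, ROM, an access straddling
the end of an interval) still takes the generic path.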