qemu-devel

Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]


From: Blue Swirl
Subject: Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
Date: Sat, 13 Oct 2007 14:58:30 +0300

On 10/13/07, J. Mayer <address@hidden> wrote:
> On Sat, 2007-10-13 at 11:57 +0200, J. Mayer wrote:
> > On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> > > On 10/13/07, J. Mayer <address@hidden> wrote:
> > > > -------- Forwarded Message --------
> > > > > From: Jocelyn Mayer <address@hidden>
> > > > > Reply-To: address@hidden, address@hidden
> > > > > To: address@hidden
> > > > > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > > > > Date: Fri, 12 Oct 2007 20:24:44 +0200
> > > > >
> > > > > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > > > > On 10/12/07, J. Mayer <address@hidden> wrote:
> > > > > > > > Here's a small patch that allows an optimisation for code fetch, at
> > > > > > > > least for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > > > > > The main idea is that a translated block must never span a page
> > > > > > > > boundary. As the tb_find_slow routine already gets the physical
> > > > > > > > address of the page of code to be translated, the code translator
> > > > > > > > could then fetch the code using raw host memory accesses instead of
> > > > > > > > doing it through the softmmu routines.
> > > > > > > > This patch could also be adapted to CISC CPU targets, with care for
> > > > > > > > the last instruction of a page. For now, I have implemented it for
> > > > > > > > alpha, arm, mips, PowerPC and SH4.
> > > > > > > > I don't actually know if the optimisation would bring a noticeable
> > > > > > > > speed gain or if it will be absolutely marginal.
> > > > > > >
> > > > > > > Please comment.
> > > > > >
> > > > > > This will not work correctly for execution of MMIO registers, but
> > > > > > maybe that won't work on real hardware either. Who cares.
> > > > >
> > > > > > I wonder if this is important or not... But maybe, when retrieving the
> > > > > > physical address, we could check if it is inside ROM/RAM or an I/O area
> > > > > > and, in the latter case, not give the phys_addr information to the
> > > > > > translator. In that case, it would go on using the ldxx_code. I guess
> > > > > > if we want to do that, a set of helpers would be appreciated to avoid
> > > > > > adding code like:
> > > > > > if (phys_pc == 0)
> > > > > >   opc = ldul_code(virt_pc);
> > > > > > else
> > > > > >   opc = ldul_raw(phys_pc);
> > > > > > everywhere... I could also add another check so this set of macros
> > > > > > would automatically use ldxx_code if we reach a page boundary, which
> > > > > > would then make it easy to use this optimisation for CISC/VLE
> > > > > > architectures too.
> > > > > >
> > > > > > I'm not sure of the proper solution to allow executing code from mmio
> > > > > > devices. But adding specific accessors to handle the CISC/VLE case
> > > > > > remains to be done.
> > > >
> > > > [...]
> > > >
> > > > > I updated my patch along these lines and it's now able to run x86
> > > > > and PowerPC targets.
> > > > > PowerPC is the easy case, x86 is maybe the worst... Well, I'm not really
> > > > > sure of what I've done for Sparc, but other targets should be safe.
> > >
> > > It broke Sparc, delay slot handling makes things complicated. The
> > > updated patch passes my tests.
> >
> > OK. I will take a look at how you solved this issue.
> >
> > > For extra performance, I bypassed the ldl_code_p. On Sparc,
> > > instructions can't be split between two pages. Isn't translation
> > > always confined to the same page for all targets like Sparc?
> >
> > Yes, for RISC targets running in 32-bit mode, we always stop translation
> > when we reach the end of a code page. The problem comes with CISC
> > architectures, like x86 or m68k, or RISC architectures running mixed
> > 16/32-bit code, like ARM in thumb mode or PowerPC in VLE mode. In all
> > those cases, there can be instructions spanning 2 pages, so we need the
> > ldx_code_p functions.
> > My idea of always using the ldx_code_p function is that we may have the
> > opportunity to make it more clever and make the slow case handle code
> > execution in mmio areas, when that becomes possible.
>
> Here's an updated patch. I added a definition TARGET_HAS_VLE_INSNS which
> is defined in the cris, i386, m68k and ppcemb cases. Arm already has
> explicit support for 32-bit thumb instructions spanning 2 pages, so it
> should not need this define. When this define is not set, the
> ldxxx_code_p function just does ldxxx_raw(phys_pc) in the softmmu case
> and ldxxx_raw(pc) in the user-mode only case. This is optimal for pure
> RISC architectures and does not need the #ifdef CONFIG_USER_ONLY you
> added for Sparc in your patch version. I also added a provision for a
> TARGET_MMIO_CODE define which may be used later when this is really
> supported by Qemu.
> I also took your fixes for Sparc phys_pc computation, but reverted your
> change to use ldl_raw as it should not be needed anymore.
> I tested PowerPC in user-mode only and softmmu mode and i386 in
> softmmu successfully using this new version of the patch.

OK for Sparc.
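
For readers following the thread, the fast/slow fetch selection discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the sketch_* names, the 4 KiB page size, the identity-mapped toy guest_ram array and the use of a NULL host pointer (instead of phys_pc == 0) to mean "no direct mapping" are all made up for the example. It also folds the page-boundary check and the "no direct mapping" check under the single TARGET_HAS_VLE_INSNS define, whereas the mail keeps the MMIO case under a separate TARGET_MMIO_CODE define.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB target pages; real targets define their own size. */
#define SKETCH_PAGE_SIZE 4096u

/* Assumption: defined here so the checked path below is compiled in.
 * According to the mail, pure RISC targets would leave it undefined
 * and always take the raw access. */
#define TARGET_HAS_VLE_INSNS 1

/* Toy guest memory, identity-mapped, standing in for guest RAM. */
static uint8_t guest_ram[2 * SKETCH_PAGE_SIZE];

/* Counts how often the slow path is taken, for illustration only. */
static int slow_path_calls;

/* Hypothetical stand-in for the softmmu accessor (ldl_code in the mail). */
static uint32_t sketch_ldl_code(uint32_t virt_pc)
{
    uint32_t insn;
    slow_path_calls++;
    memcpy(&insn, &guest_ram[virt_pc], sizeof(insn));
    return insn;
}

/* Hypothetical stand-in for a raw host load (ldl_raw in the mail). */
static uint32_t sketch_ldl_raw(const uint8_t *host_addr)
{
    uint32_t insn;
    memcpy(&insn, host_addr, sizeof(insn));
    return insn;
}

/* Fetch a 32-bit opcode at virt_pc. host_page points at the host copy of
 * the guest page being translated, or is NULL when no direct mapping is
 * available (for example, code located in an I/O area). */
static uint32_t sketch_ldl_code_p(const uint8_t *host_page, uint32_t virt_pc)
{
    uint32_t offset = virt_pc & (SKETCH_PAGE_SIZE - 1);
#ifdef TARGET_HAS_VLE_INSNS
    /* Slow path: no direct mapping, or the opcode straddles the page end. */
    if (host_page == NULL || offset > SKETCH_PAGE_SIZE - sizeof(uint32_t)) {
        return sketch_ldl_code(virt_pc);
    }
#endif
    /* Fast path: the whole opcode lies inside the already-mapped page. */
    return sketch_ldl_raw(host_page + offset);
}
```

With the define removed, the helper collapses to the plain raw load, which matches the "optimal for pure RISC architectures" case described in the quoted mail; in that configuration the caller would have to guarantee a valid host_page.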



