From: Aurelien Jarno
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Thu, 30 Jul 2015 17:50:03 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On 2015-07-30 10:55, Aurelien Jarno wrote:
> On 2015-07-30 10:16, Dennis Luehring wrote:
> > Am 30.07.2015 um 09:52 schrieb Aurelien Jarno:
> > >On 2015-07-30 05:52, Dennis Luehring wrote:
> > >> Am 29.07.2015 um 17:01 schrieb Aurelien Jarno:
> > > >> >The point is that emulation has a cost, and it's quite difficult
> > > >> >to lower it and thus improve the emulation speed.
> > >>
> > > >> so it's just not strange for you to see a 1/100...200 of the native x64
> > > >> speed under qemu/SPARC64
> > > >> i hoped that someone will jump up and shout "it's impossible - it needs to
> > > >> be a bug" ...sadly not
> > >
> > >Overall the ratio is more around 10, but in some specific cases where
> > >the TB cache is inefficient and TB can't be linked or with an
> > >inefficient MMU, a ratio of 100 is possible.
> > 
> > 
> > sysbench (0.4.12) --num-threads=1 --test=cpu --cpu-max-prime=2000 run
> >    Host x64    :   1.3580s
> >    Qemu SPARC64: 184.2532s
> > 
> > sysbench shows a ratio of nearly 200
> 
> Note that when you say SPARC64 here, it's actually only the kernel; you
> are using a 32-bit userland. And that makes a difference. Here are my
> test results:
> 
> host (x86-64)                    0.8976s
> sparc32 guest (sparc64 kernel)  99.6116s
> sparc64 guest (sparc64 kernel)   4.4908s
> 
> So it looks like the 32-bit code is not QEMU friendly. I haven't looked
> at it yet, but I guess it might be due to dynamic jumps, so that TB
> can't be chained.

This is the corresponding C code from sysbench, which is run 10000
times.

| int cpu_execute_request(sb_request_t *r, int thread_id)
| { 
|   unsigned long long c;
|   unsigned long long l,t;
|   unsigned long long n=0;
|   log_msg_t           msg;
|   log_msg_oper_t      op_msg;
|   
|   (void)r; /* unused */
|   
|   /* Prepare log message */
|   msg.type = LOG_MSG_TYPE_OPER;
|   msg.data = &op_msg;
|   
|   /* So far we're using very simple test prime number tests in 64bit */
|   LOG_EVENT_START(msg, thread_id);
|   
|   for(c=3; c < max_prime; c++)
|   { 
|     t = sqrt(c); 
|     for(l = 2; l <= t; l++)
|       if (c % l == 0)
|         break;
|     if (l > t )
|       n++;
|   }
|   
|   LOG_EVENT_STOP(msg, thread_id);
|   
|   return 0;
| }

This is a very simple test, which is probably not a good representation
of CPU performance, even more so when emulated by QEMU. In addition to
that, given that it mostly uses 64-bit integers, it's kind of expected
that the 32-bit version is slower.

Anyway, I have extracted this code into a C file (see attached file) that
can more easily be compiled to 32 or 64 bit using -m32 or -m64. I observe
the same behavior as with sysbench, even with qemu-user (which is not
surprising, as the above code doesn't really put pressure on the MMU).

Running it, I get the following times:
x86-64 host       0.877s
sparc guest -m32  1m39s
sparc guest -m64   3.5s
opensparc T1 -m32 1m59s
opensparc T1 -m64 1m12s

So overall QEMU is faster than not-so-old real hardware. That said,
looking at it quickly, it seems that some of the FP instructions are
actually trapped and emulated by the kernel on the opensparc T1.

Now, coming back to the QEMU problem: the issue is that the 64-bit code
uses the udivx instruction to compute the modulo, while the 32-bit
code calls the __umoddi3 GCC helper. That helper uses a lot of integer
operations that depend on the CPU flags, so most of the time is spent
computing them in helper_compute_psr.
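
To look at the two code paths in isolation, something like the following
minimal sketch can be used. The file and function names are made up for
illustration; they are not from sysbench or GCC. The second function spells
out the divide, multiply, subtract sequence that a remainder amounts to when
the hardware only provides a 64-bit divide.

```c
/* mod64.c: minimal sketch for inspecting how a 64-bit modulo is compiled.
 * The function names are hypothetical, chosen only for this example. */

/* Same operation as "c % l" in the sysbench loop. */
unsigned long long umod64(unsigned long long c, unsigned long long l)
{
    return c % l;
}

/* The remainder written out as divide, multiply, subtract. */
unsigned long long umod64_divmulsub(unsigned long long c, unsigned long long l)
{
    unsigned long long q = c / l;   /* 64-bit unsigned divide */
    return c - q * l;               /* remainder = c - (c / l) * l */
}
```

Compiling this on a sparc toolchain with "gcc -O2 -m32 -S mod64.c" and then
with "gcc -O2 -m64 -S mod64.c" and comparing the assembly should show the
difference described above: a call to __umoddi3 in the 32-bit output and
a udivx in the 64-bit one. The exact output depends on the GCC version and
options.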

So, as I said in my previous emails, QEMU is not cycle accurate, and you
can expect some specific code to be emulated very quickly (x4 ratio in
the 64-bit case) and other specific code to be emulated very slowly
(x110 ratio in the 32-bit case). It appears that the sysbench code is
actually quite specific, which explains the difference.
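
For reference, the ratios are simply guest time divided by host time. A
small sketch of the arithmetic, using the sysbench timings quoted earlier in
this mail (the function names are hypothetical):

```c
#include <stdio.h>

/* Slowdown factor of the guest relative to the host (both in seconds). */
double slowdown(double guest_seconds, double host_seconds)
{
    return guest_seconds / host_seconds;
}

/* Print the ratios for the sysbench timings quoted earlier in this mail. */
void print_sysbench_ratios(void)
{
    const double host = 0.8976;   /* x86-64 host */

    printf("sparc32 userland: x%.0f\n", slowdown(99.6116, host));
    printf("sparc64 userland: x%.0f\n", slowdown(4.4908, host));
}
```

With the sysbench numbers this gives roughly 111 and 5; with the standalone
file (99 s and 3.5 s against 0.877 s) it gives roughly 113 and 4.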

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net

Attachment: prime.c
Description: Text Data

