RPC overhead
From: Neal H. Walfield
Subject: RPC overhead
Date: Mon, 07 Jul 2008 17:00:58 +0200
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/21.4 (i486-pc-linux-gnu) MULE/5.0 (SAKAKI)
I ran an application benchmark on Viengoos. Specifically, the
application is derived from the GCbench program. You can find it
here:
http://cvs.savannah.gnu.org/viewvc/hurd-l4/benchmarks/GCbench.c?root=hurd&view=log
The benchmark takes 239.4 seconds to complete. During this time, it
aggressively uses Viengoos' services. (Viengoos is implemented as a
user-level server running on top of Pistachio.) I disabled all other
threads so that the only two threads that were running were the
application's main thread and Viengoos' service thread. Thus, when
the application makes a call, it should never block.
In Viengoos, I used l4_system_clock to read out the time on the
receipt of a message. Just before Viengoos sends a reply, it again
reads the time and records the difference. This is recorded in a
per-method variable. The number of calls per method is also recorded.
In the application, I instrumented the RPC stubs to do the same: just
before l4_call is invoked, I call l4_system_clock. On return, I again
call l4_system_clock and save the difference in a per-method variable.
The number of calls per method is again recorded.
Below are the four most used system calls:
                     Time (ms)     % Time               us per call
                   User   Kernel    U   K   # Calls   User  Kernel  delta
object discard   18,054   15,171   7%  6%   686,960   26.2    22.0    4.2
object alloc        730      567   0%  0%    91,123    8.0     6.2    1.8
cap copy            868      515   0%  0%    90,464    9.5     5.6    3.9
folio alloc          30       27   0%  0%       712   43.1    37.8    5.3
I'd expect the amount of time measured from user space minus the time
measured in Viengoos to correspond to the RPC overhead. On this
machine (a 1.2 GHz AMD K7 Duron with a 64 KB L2 cache)
ping-pong reports the following costs associated with Inter-AS IPC:
IPC ( 0 MRs): 627.01 cycles, 0.52us, 0.00 instrs
IPC ( 4 MRs): 660.87 cycles, 0.55us, 0.00 instrs
IPC ( 8 MRs): 670.11 cycles, 0.56us, 0.00 instrs
IPC (12 MRs): 678.08 cycles, 0.56us, 0.00 instrs
IPC (16 MRs): 675.67 cycles, 0.56us, 0.00 instrs
IPC (20 MRs): 683.11 cycles, 0.57us, 0.00 instrs
IPC (24 MRs): 691.04 cycles, 0.57us, 0.00 instrs
IPC (28 MRs): 697.73 cycles, 0.58us, 0.00 instrs
IPC (32 MRs): 697.39 cycles, 0.58us, 0.00 instrs
IPC (36 MRs): 701.98 cycles, 0.58us, 0.00 instrs
IPC (40 MRs): 714.57 cycles, 0.59us, 0.00 instrs
IPC (44 MRs): 718.00 cycles, 0.60us, 0.00 instrs
IPC (48 MRs): 720.20 cycles, 0.60us, 0.00 instrs
IPC (52 MRs): 729.10 cycles, 0.60us, 0.00 instrs
IPC (56 MRs): 736.47 cycles, 0.61us, 0.00 instrs
IPC (60 MRs): 733.48 cycles, 0.61us, 0.00 instrs
Each invocation includes approximately 12 words of payload and each
reply contains 2 words. This suggests a round-trip RPC overhead of
about 1,350 cycles, or roughly 1.1 us.
The 4.2 us delta represents approximately 5,000 cycles. Subtracting
the 1,350-cycle RPC overhead leaves about 3,650 cycles unaccounted
for. This seems to be a bit more than one can simply attribute to
secondary cache effects; however, perhaps ping-pong really measures
the very hot case and I'm running with very cold caches. I hope
someone else can suggest how to figure out to what end these cycles
are being put, has a theory, or can confirm that these cycle counts
are not, in fact, too high.
Thanks,
Neal