I don't think lmbench is particularly intensive, but it is sensitive to memory latency.
We'll measure kernel build time with a minimal config and post the results later.
Here are some quick numbers for parallel kernel compile time.
The guest has 1 vCPU, just for convenience.
time make -j 2 all
-----------------------------------------------------------------------------
Base: real 1m13.950s (user 1m2.742s, sys 0m10.446s)
Kemari: real 1m22.720s (user 1m5.882s, sys 0m10.882s)
time make -j 4 all
-----------------------------------------------------------------------------
Base: real 1m11.234s (user 1m2.582s, sys 0m8.643s)
Kemari: real 1m26.964s (user 1m6.530s, sys 0m12.194s)
The Kemari results include all of its overhead, i.e. dirty page tracking and
synchronization upon disk I/O operations.
Under Kemari, the compile time with -j 4 was worse than with -j 2,
but I'm not sure whether this is due to dirty page tracking or to the sync interval.
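To put the gap in relative terms, here is a small Python sketch that converts the `time` output above into seconds and computes Kemari's wall-clock slowdown against Base. The helper names (`to_seconds`, `overhead`) are only for illustration; the input strings are the "real" times copied from the results.

```python
import re

def to_seconds(t: str) -> float:
    """Convert a `time` field such as '1m13.950s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognized time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def overhead(base: str, kemari: str) -> float:
    """Relative slowdown of Kemari versus Base, in percent."""
    b, k = to_seconds(base), to_seconds(kemari)
    return (k - b) / b * 100.0

# Wall-clock ("real") times from the runs above: (Base, Kemari).
results = {
    "-j 2": ("1m13.950s", "1m22.720s"),
    "-j 4": ("1m11.234s", "1m26.964s"),
}

for jobs, (base, kemari) in results.items():
    print(f"make {jobs}: {overhead(base, kemari):.1f}% slower under Kemari")
```

By this arithmetic the relative overhead roughly doubles from -j 2 (about 11.9%) to -j 4 (about 22.1%), which is the gap the question above is about.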