Re: [Qemu-devel] dataplane performance on s390

From:	Karl Rister
Subject:	Re: [Qemu-devel] dataplane performance on s390
Date:	Tue, 10 Jun 2014 13:56:44 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 06/09/2014 08:40 PM, Fam Zheng wrote:

On Mon, 06/09 15:43, Karl Rister wrote:

Hi All

I was asked by our development team to do a performance sniff test of the
latest dataplane code on s390 and compare it against qemu.git.  Here is a
brief description of the configuration, the testing done, and then the
results.

Configuration:

Host: 26 CPU LPAR, 64GB, 8 zFCP adapters
Guest: 4 VCPU, 1GB, 128 virtio block devices

Each virtio block device maps to a dm-multipath device in the host with 8
paths.  Multipath is configured with the service-time policy.  All block
devices are configured to use the deadline IO scheduler.

Test:

FIO is used to run 4 scenarios: sequential read, sequential write, random
read, and random write.  Sequential scenarios use a 128KB request size and
random scenarios use an 8KB request size.  Each scenario is run with an
increasing number of jobs, from 1 to 128 (powers of 2).  Each job is bound
to an individual file on an ext3 file system on a virtio device and uses
O_DIRECT, libaio, and iodepth=1.  Each test is run three times for 2 minutes
each; the first iteration (a warmup) is thrown out and the next two
iterations are averaged together.
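
For readers who want to reproduce a similar test matrix, here is a minimal
sketch that generates a fio job file per scenario and job count.  It is not
the job file used for these measurements.  The mount points and file names
(/mnt/vdN/fio.dat) are hypothetical placeholders, and the sketch assumes the
test files already exist, one per virtio device.

```python
# Sketch of a fio job-file generator for the test matrix described above.
# The scenario names, mount points, and file names are illustrative
# assumptions and do not come from the original test setup.

SCENARIOS = {
    "seq-read": ("read", "128k"),
    "seq-write": ("write", "128k"),
    "rand-read": ("randread", "8k"),
    "rand-write": ("randwrite", "8k"),
}

# Job counts from 1 to 128 in powers of 2.
JOB_COUNTS = [2 ** i for i in range(8)]


def build_job_file(scenario, jobs, runtime_s=120):
    """Return fio job-file text with one section per job, each on its own file."""
    rw, bs = SCENARIOS[scenario]
    lines = [
        "[global]",
        f"rw={rw}",
        f"bs={bs}",
        "direct=1",          # O_DIRECT
        "ioengine=libaio",
        "iodepth=1",
        f"runtime={runtime_s}",
        "time_based",
        "",
    ]
    for n in range(jobs):
        lines.append(f"[job{n}]")
        # Hypothetical path: one pre-created file per virtio device.
        lines.append(f"filename=/mnt/vd{n}/fio.dat")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_job_file("rand-read", 2))
```

Each generated file would be run three times, with the first run treated as
the warmup.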

Results:

Baseline: qemu.git 93f94f9018229f146ed6bbe9e5ff72d67e4bd7ab

Dataplane: bdrv_set_aio_context 0ab50cde71aa27f39b8a3ea4766ff82671adb2a4


Hi Karl,

Thanks for the results.

The throughput differences look minimal, where is the bandwidth saturated in
these tests?  And why use iodepth=1, not more?


Hi Fam

Based on previously collected data, the configuration is hitting saturation at the following points:


Sequential Read: 128 jobs
Sequential Write: 32 jobs
Random Read: 64 jobs
Random Write: saturation not reached

The iodepth=1 configuration is a somewhat arbitrary choice that is only limited by machine run time; I could certainly run higher loads, and at times I do.


Thanks.

Karl


Thanks,
Fam


Sequential Read:

Overall a slight throughput regression with a noticeable reduction in CPU
efficiency.

1 Job: Throughput regressed -1.4%, CPU improved -0.83%
2 Job: Throughput regressed -2.5%, CPU regressed +2.81%
4 Job: Throughput regressed -2.2%, CPU regressed +12.22%
8 Job: Throughput regressed -0.7%, CPU regressed +9.77%
16 Job: Throughput regressed -3.4%, CPU regressed +7.04%
32 Job: Throughput regressed -1.8%, CPU regressed +12.03%
64 Job: Throughput regressed -0.1%, CPU regressed +10.60%
128 Job: Throughput increased +0.3%, CPU regressed +10.70%

Sequential Write:

Mostly regressed throughput, although it gets better as job count increases
and even has some gains at higher job counts.  CPU efficiency is regressed.

1 Job: Throughput regressed -1.9%, CPU regressed +0.90%
2 Job: Throughput regressed -2.0%, CPU regressed +1.07%
4 Job: Throughput regressed -2.4%, CPU regressed +8.68%
8 Job: Throughput regressed -2.0%, CPU regressed +4.23%
16 Job: Throughput regressed -5.0%, CPU regressed +10.53%
32 Job: Throughput improved +7.6%, CPU regressed +7.37%
64 Job: Throughput regressed -0.6%, CPU regressed +7.29%
128 Job: Throughput improved +8.3%, CPU regressed +6.68%

Random Read:

Again, mostly throughput regressions except for the largest job counts.  CPU
efficiency is regressed at all data points.

1 Job: Throughput regressed -3.0%, CPU regressed +0.14%
2 Job: Throughput regressed -3.6%, CPU regressed +6.86%
4 Job: Throughput regressed -5.1%, CPU regressed +11.11%
8 Job: Throughput regressed -8.6%, CPU regressed +12.32%
16 Job: Throughput regressed -5.7%, CPU regressed +12.99%
32 Job: Throughput regressed -7.4%, CPU regressed +7.62%
64 Job: Throughput improved +10.0%, CPU regressed +10.83%
128 Job: Throughput improved +10.7%, CPU regressed +10.85%

Random Write:

Throughput regressed at every data point except 128 jobs, and CPU efficiency
regressed at every data point except 1 job.

1 Job: Throughput regressed -2.3%, CPU improved -1.50%
2 Job: Throughput regressed -2.2%, CPU regressed +0.16%
4 Job: Throughput regressed -1.0%, CPU regressed +8.36%
8 Job: Throughput regressed -8.6%, CPU regressed +12.47%
16 Job: Throughput regressed -3.1%, CPU regressed +12.40%
32 Job: Throughput regressed -0.2%, CPU regressed +11.59%
64 Job: Throughput regressed -1.9%, CPU regressed +12.65%
128 Job: Throughput improved +5.6%, CPU regressed +11.68%


* CPU consumption is an efficiency calculation of usage per MB of
throughput.
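
As a rough illustration of how the percentages above could be derived, the
sketch below drops the warmup iteration, averages the remaining runs, computes
CPU usage per MB of throughput, and reports the change relative to the
baseline.  The function names and the exact formula are assumptions, not the
actual tooling used for these results.  A positive CPU delta means more CPU
per MB, which is a regression.

```python
# Sketch of the result post-processing; names and formula are assumptions.

def average_measured(iterations):
    """Drop the first (warmup) iteration and average the remaining ones."""
    measured = iterations[1:]
    if not measured:
        raise ValueError("need at least one iteration after the warmup")
    return sum(measured) / len(measured)


def cpu_per_mb(cpu_usage, throughput_mb):
    """CPU efficiency expressed as CPU usage per MB of throughput."""
    return cpu_usage / throughput_mb


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not measured data.
    base_tput = average_measured([480.0, 500.0, 500.0])
    new_tput = average_measured([470.0, 490.0, 496.0])
    print(f"throughput change: {percent_change(base_tput, new_tput):+.1f}%")
```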

--
Karl Rister <address@hidden>
IBM Linux/KVM Development Optimization