octave-maintainers

integer arithmetics


From: Jaroslav Hajek
Subject: integer arithmetics
Date: Mon, 22 Sep 2008 08:55:41 +0200

hi all,

this is the first part of my effort to extend and improve the
performance of integer arithmetic in Octave. This patch does not yet
implement int64 operations (though it prepares the ground). The only
files modified by this patch are oct-inttypes.h and oct-inttypes.cc.
Apart from the octave_int<T> class, the design is mostly rewritten.
Because the new code is more specialized and uses C++ integer
arithmetic where possible, integer arithmetic in Octave ends up
significantly faster.
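
For readers unfamiliar with the problem: Octave's fixed-width integer
types saturate on overflow instead of wrapping around. The sketch below
illustrates one way native C++ integer arithmetic can provide that for
8-, 16- and 32-bit types, by computing in a wider type and clamping.
The name sat_add is a hypothetical helper made up for this example; it
is not taken from the patch and may differ from what the patch does.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (an assumption for illustration, not the patch's code):
// saturating addition for integer types narrower than 64 bits.
template <typename T>
T sat_add (T x, T y)
{
  static_assert (std::is_integral<T>::value, "T must be an integer type");
  static_assert (sizeof (T) < sizeof (std::int64_t), "needs a wider type");

  const std::int64_t lo = std::numeric_limits<T>::min ();
  const std::int64_t hi = std::numeric_limits<T>::max ();

  // The exact sum of two 8-, 16- or 32-bit values always fits in 64 bits,
  // so this addition cannot overflow.
  const std::int64_t s
    = static_cast<std::int64_t> (x) + static_cast<std::int64_t> (y);

  // Clamp to the range of T instead of wrapping around.
  if (s < lo)
    return static_cast<T> (lo);
  if (s > hi)
    return static_cast<T> (hi);
  return static_cast<T> (s);
}
```

With this sketch, sat_add<std::uint8_t> (200, 100) gives 255 and
sat_add<std::int8_t> (-100, -100) gives -128. The widening trick has no
64-bit counterpart in standard C++, so int64 would need a different
approach.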

Attached is what I got using "benchmark_intmath" from the "benchmarks"
Octave-Forge package, averaged over 5 runs on a non-loaded node of our
cluster: a machine with two dual-core AMD Opterons at 2.5 GHz and
8 GB RAM, with Octave compiled with Intel C++ 10.1 at -O2.

Btw, speed-up is measured as (old-time/new-time - 1)*100%.
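
As a small worked example of that formula, here is a sketch in C++.
The function name speedup_percent is made up for this illustration and
is not part of the benchmark package.

```cpp
#include <cassert>

// Speed-up in percent as defined in this message: (old/new - 1) * 100.
// speedup_percent is a hypothetical name used only for this example.
double speedup_percent (double old_time, double new_time)
{
  assert (new_time > 0.0);  // a zero or negative timing makes no sense here
  return (old_time / new_time - 1.0) * 100.0;
}
```

Under this definition, 0% means no change and 100% means the new code
runs in half the time. A figure of about 417.8% (old 0.627428, new
0.121169) therefore means the new code is roughly 5.18 times as fast.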

benchmark results: (averages over 5 runs)

array size (n) = 2e+07
ratio of intmath for generating integers (ratio) = 0.51

time to convert real vector to uint8:
 3.1.51+: 1.395368, new patch: 0.944810, speed-up: 47.7%
time to add two uint8 vectors:
 3.1.51+: 0.627428, new patch: 0.121169, speed-up: 417.8%
time to subtract two uint8 vectors:
 3.1.51+: 0.717337, new patch: 0.209019, speed-up: 243.2%
time to multiply two uint8 vectors:
 3.1.51+: 0.613577, new patch: 0.148653, speed-up: 312.8%
time to divide two uint8 vectors:
 3.1.51+: 1.326600, new patch: 0.527401, speed-up: 151.5%
time to convert real vector to int8:
 3.1.51+: 1.401196, new patch: 0.827683, speed-up: 69.3%
time to add two int8 vectors:
 3.1.51+: 0.593598, new patch: 0.128647, speed-up: 361.4%
time to subtract two int8 vectors:
 3.1.51+: 0.602559, new patch: 0.135151, speed-up: 345.8%
time to multiply two int8 vectors:
 3.1.51+: 0.711905, new patch: 0.244027, speed-up: 191.7%
time to divide two int8 vectors:
 3.1.51+: 1.406900, new patch: 0.609731, speed-up: 130.7%
time to convert real vector to uint16:
 3.1.51+: 1.442472, new patch: 0.839658, speed-up: 71.8%
time to add two uint16 vectors:
 3.1.51+: 0.651813, new patch: 0.142316, speed-up: 358.0%
time to subtract two uint16 vectors:
 3.1.51+: 0.753230, new patch: 0.230212, speed-up: 227.2%
time to multiply two uint16 vectors:
 3.1.51+: 0.603686, new patch: 0.148981, speed-up: 305.2%
time to divide two uint16 vectors:
 3.1.51+: 1.418092, new patch: 0.534673, speed-up: 165.2%
time to convert real vector to int16:
 3.1.51+: 1.383319, new patch: 0.859507, speed-up: 60.9%
time to add two int16 vectors:
 3.1.51+: 0.628046, new patch: 0.148982, speed-up: 321.6%
time to subtract two int16 vectors:
 3.1.51+: 0.615734, new patch: 0.154347, speed-up: 298.9%
time to multiply two int16 vectors:
 3.1.51+: 0.719454, new patch: 0.235359, speed-up: 205.7%
time to divide two int16 vectors:
 3.1.51+: 1.425676, new patch: 0.644366, speed-up: 121.3%
time to convert real vector to uint32:
 3.1.51+: 1.393243, new patch: 0.864667, speed-up: 61.1%
time to add two uint32 vectors:
 3.1.51+: 0.703566, new patch: 0.218528, speed-up: 222.0%
time to subtract two uint32 vectors:
 3.1.51+: 0.821517, new patch: 0.258432, speed-up: 217.9%
time to multiply two uint32 vectors:
 3.1.51+: 0.669506, new patch: 0.227623, speed-up: 194.1%
time to divide two uint32 vectors:
 3.1.51+: 1.518662, new patch: 0.544977, speed-up: 178.7%
time to convert real vector to int32:
 3.1.51+: 1.536289, new patch: 0.901578, speed-up: 70.4%
time to add two int32 vectors:
 3.1.51+: 0.556803, new patch: 0.221680, speed-up: 151.2%
time to subtract two int32 vectors:
 3.1.51+: 0.610358, new patch: 0.220673, speed-up: 176.6%
time to multiply two int32 vectors:
 3.1.51+: 0.690234, new patch: 0.285188, speed-up: 142.0%
time to divide two int32 vectors:
 3.1.51+: 1.371559, new patch: 0.663573, speed-up: 106.7%

This patch should not, IMHO, be applied before 3.2.x is forked, so
I'm sharing it mainly for comments.

Also, I would be interested to hear whether anyone sees performance
improvements on other systems.

I'd also be interested if anyone could point out a machine where
configure detects that the fast integer operations cannot be used
(i.e. HAVE_FAST_INT_OPS is left undefined).

patch download here:
http://artax.karlin.mff.cuni.cz/~hajej2am/ulozna/int_patch1.diff


regards

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

