[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Ziggurat code
From: |
Allin Cottrell |
Subject: |
Ziggurat code |
Date: |
Sun, 9 Oct 2016 19:52:15 -0400 (EDT) |
User-agent: |
Alpine 2.20.3 (LFD 70 2015-03-15) |
I'm a developer of the GPL'd econometrics program gretl, and a
co-worker and I were recently benchmarking our generation of random
normals. We saw that Octave was noticeably faster at this, and I
investigated. It transpired that the speed difference was down to
David Bateman's very nice implementation of the Marsaglia-Tsang
Ziggurat method.
We had our own implementation but David's was faster, without any
sacrifice of quality (good coverage of the real line, passes the
l'Ecuyer "Crush" test), so we have, gratefully and with attribution,
"stolen" David's version for gretl.
Maybe I can give a little something back. There appears to a slight
mix-up in the use of *X86_32 macros in the source file in question,
namely randmtzig.cc. Near the top of this file there appears the
following preprocessor stanza:
#if ! defined (USE_X86_32)
# if defined (i386) || defined (HAVE_X86_32)
# define USE_X86_32 1
# else
# define USE_X86_32 0
# endif
#endif
Here the symbol HAVE_X86_32 is coming from "elsewhere" (I presume,
config.h or the compiler command-line), and the symbol USE_X86_32 is
set conditional on that plus the compiler symbol i386. One would
suppose, therefore, that the code below should branch conditional on
USE_X86_32, but that's not the case: USE_X86_32 is not referenced at
all; everything below depends on HAVE_X86_32.
It seems that either (a) the code below should branch on USE_X86_32
rather than HAVE_X86_32, or (b) if HAVE_X86_32 is reckoned to carry
all the relevant information, the aforementioned preprocessor code
should be removed, for clarity.
Related, there's a comment which suggests some uncertainty, "Check
whether -DUSE_X86_32=0 is faster on 64-bit x86 architectures." I can
say something about that. I've tested the two variants of the
Ziggurat code in randmtzig.cc on two 64-bit systems. In each case
the integer RNG is version 1.4.1 of SFMT, using SSE2; the only
difference is in the Ziggurat code.
The experiment involved generating 200 million random normals,
repeated 5 times. In the tables below the columns headed "A" and "B"
hold timings in seconds for the default 64-bit code-path and the
x86 32-bit optimized code-path respectively. The columns to the
right of A and B are elapsed-time indices relative to 100 = best on
platform.
Fedora 23 64-bit, core i7-2600 @3.40GHz, gcc 5.3.1
A rel B rel
3.3994 100.00 3.9320 115.67
3.4058 100.19 3.8608 113.57
3.4045 100.15 3.9076 114.95
3.5408 100.16 3.9141 115.14
3.4064 100.21 3.8545 113.39
Arch 64-bit, core i7-920 @2.67GHz, gcc 6.2.1
A rel B rel
3.9583 100.00 4.3021 108.69
3.9867 100.72 4.3653 110.28
3.9781 100.50 4.3308 109.41
4.0371 101.99 4.3953 111.04
3.9899 100.80 4.4652 112.81
There's not a huge difference between the Ziggurat variants, but
nonetheless it's clear, on this sample, that you don't want to use
the 32-bit-optimized code on a 64-bit system. On the more recent
i7-2600 that incurs a time penalty of about 15 percent; on the older
i7-920 the penalty is around 10 percent.
--
Allin Cottrell
Department of Economics
Wake Forest University
- Ziggurat code,
Allin Cottrell <=