octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Ziggurat code


From: Allin Cottrell
Subject: Ziggurat code
Date: Sun, 9 Oct 2016 19:52:15 -0400 (EDT)
User-agent: Alpine 2.20.3 (LFD 70 2015-03-15)

I'm a developer of the GPL'd econometrics program gretl, and a co-worker and I were recently benchmarking our generation of random normals. We saw that Octave was noticeably faster at this, and I investigated. It transpired that the speed difference was down to David Bateman's very nice implementation of the Marsaglia-Tsang Ziggurat method.

We had our own implementation but David's was faster, without any sacrifice of quality (good coverage of the real line, passes the
l'Ecuyer "Crush" test), so we have, gratefully and with attribution,
"stolen" David's version for gretl.

Maybe I can give a little something back. There appears to a slight mix-up in the use of *X86_32 macros in the source file in question, namely randmtzig.cc. Near the top of this file there appears the following preprocessor stanza:

#if ! defined (USE_X86_32)
#  if defined (i386) || defined (HAVE_X86_32)
#    define USE_X86_32 1
#  else
#    define USE_X86_32 0
#  endif
#endif

Here the symbol HAVE_X86_32 is coming from "elsewhere" (I presume, config.h or the compiler command-line), and the symbol USE_X86_32 is set conditional on that plus the compiler symbol i386. One would suppose, therefore, that the code below should branch conditional on USE_X86_32, but that's not the case: USE_X86_32 is not referenced at all; everything below depends on HAVE_X86_32.

It seems that either (a) the code below should branch on USE_X86_32 rather than HAVE_X86_32, or (b) if HAVE_X86_32 is reckoned to carry all the relevant information, the aforementioned preprocessor code should be removed, for clarity.

Related, there's a comment which suggests some uncertainty, "Check whether -DUSE_X86_32=0 is faster on 64-bit x86 architectures." I can say something about that. I've tested the two variants of the Ziggurat code in randmtzig.cc on two 64-bit systems. In each case the integer RNG is version 1.4.1 of SFMT, using SSE2; the only difference is in the Ziggurat code.

The experiment involved generating 200 million random normals, repeated 5 times. In the tables below the columns headed "A" and "B" hold timings in seconds for the default 64-bit code-path and the x86 32-bit optimized code-path respectively. The columns to the right of A and B are elapsed-time indices relative to 100 = best on platform.

Fedora 23 64-bit, core i7-2600 @3.40GHz, gcc 5.3.1

  A      rel     B      rel
3.3994 100.00  3.9320 115.67
3.4058 100.19  3.8608 113.57
3.4045 100.15  3.9076 114.95
3.5408 100.16  3.9141 115.14
3.4064 100.21  3.8545 113.39

Arch 64-bit, core i7-920 @2.67GHz, gcc 6.2.1

  A      rel     B      rel
3.9583 100.00  4.3021 108.69
3.9867 100.72  4.3653 110.28
3.9781 100.50  4.3308 109.41
4.0371 101.99  4.3953 111.04
3.9899 100.80  4.4652 112.81

There's not a huge difference between the Ziggurat variants, but nonetheless it's clear, on this sample, that you don't want to use the 32-bit-optimized code on a 64-bit system. On the more recent
i7-2600 that incurs a time penalty of about 15 percent; on the older
i7-920 the penalty is around 10 percent.

--
Allin Cottrell
Department of Economics
Wake Forest University



reply via email to

[Prev in Thread] Current Thread [Next in Thread]