Ziggurat code

octave-maintainers



From:	Allin Cottrell
Subject:	Ziggurat code
Date:	Sun, 9 Oct 2016 19:52:15 -0400 (EDT)
User-agent:	Alpine 2.20.3 (LFD 70 2015-03-15)

I'm a developer of the GPL'd econometrics program gretl, and a co-worker and I were recently benchmarking our generation of random normals. We saw that Octave was noticeably faster at this, and I investigated. It transpired that the speed difference was down to David Bateman's very nice implementation of the Marsaglia-Tsang Ziggurat method.

We had our own implementation but David's was faster, without any sacrifice of quality (good coverage of the real line, passes the L'Ecuyer "Crush" test), so we have, gratefully and with attribution, "stolen" David's version for gretl.

Maybe I can give a little something back. There appears to be a slight mix-up in the use of *X86_32 macros in the source file in question, namely randmtzig.cc. Near the top of this file there appears the following preprocessor stanza:


#if ! defined (USE_X86_32)
#  if defined (i386) || defined (HAVE_X86_32)
#    define USE_X86_32 1
#  else
#    define USE_X86_32 0
#  endif
#endif

Here the symbol HAVE_X86_32 is coming from "elsewhere" (I presume, config.h or the compiler command-line), and the symbol USE_X86_32 is set conditional on that plus the compiler symbol i386. One would suppose, therefore, that the code below should branch conditional on USE_X86_32, but that's not the case: USE_X86_32 is not referenced at all; everything below depends on HAVE_X86_32.

It seems that either (a) the code below should branch on USE_X86_32 rather than HAVE_X86_32, or (b) if HAVE_X86_32 is reckoned to carry all the relevant information, the aforementioned preprocessor code should be removed, for clarity.
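To make option (a) concrete, here is a minimal sketch, not the actual contents of randmtzig.cc. The stanza is the one quoted above; the function zigg_code_path and the strings it returns are hypothetical, introduced only to show a branch that consults USE_X86_32 alone.

```c
/* Stanza as quoted from randmtzig.cc: HAVE_X86_32 is assumed to come
   from config.h or the compiler command line. */
#if ! defined (USE_X86_32)
#  if defined (i386) || defined (HAVE_X86_32)
#    define USE_X86_32 1
#  else
#    define USE_X86_32 0
#  endif
#endif

/* Hypothetical helper (not in randmtzig.cc): everything after the
   stanza tests USE_X86_32 only, never HAVE_X86_32 directly. */
static const char *
zigg_code_path (void)
{
#if USE_X86_32
  return "x86-32 optimized";
#else
  return "default 64-bit";
#endif
}
```

With this arrangement, compiling with -DUSE_X86_32=0 would select the default path even where HAVE_X86_32 is defined, since the stanza only sets USE_X86_32 when it is not already defined.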

Related, there's a comment which suggests some uncertainty, "Check whether -DUSE_X86_32=0 is faster on 64-bit x86 architectures." I can say something about that. I've tested the two variants of the Ziggurat code in randmtzig.cc on two 64-bit systems. In each case the integer RNG is version 1.4.1 of SFMT, using SSE2; the only difference is in the Ziggurat code.

The experiment involved generating 200 million random normals, repeated 5 times. In the tables below the columns headed "A" and "B" hold timings in seconds for the default 64-bit code-path and the x86 32-bit optimized code-path respectively. The columns to the right of A and B are elapsed-time indices relative to 100 = best on platform.
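For anyone wanting to reproduce the "rel" columns, the index just described can be sketched as follows. The helper names rel_index and best_time are hypothetical, and the sketch assumes "best on platform" means the smallest timing across both code paths on that machine.

```c
#include <stddef.h>

/* Hypothetical helper: elapsed-time index, where 100 = best time. */
static double
rel_index (double seconds, double best_seconds)
{
  return 100.0 * seconds / best_seconds;
}

/* Hypothetical helper: smallest timing over both code paths (A and B)
   for one platform; n is the number of repetitions (n >= 1). */
static double
best_time (const double *a, const double *b, size_t n)
{
  double best = a[0];
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] < best) best = a[i];
      if (b[i] < best) best = b[i];
    }
  return best;
}
```

For example, with a best time of 3.9583 seconds, a timing of 4.3021 seconds gives an index of roughly 108.69.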


Fedora 23 64-bit, core i7-2600 @3.40GHz, gcc 5.3.1

  A      rel     B      rel
3.3994 100.00  3.9320 115.67
3.4058 100.19  3.8608 113.57
3.4045 100.15  3.9076 114.95
3.5408 100.16  3.9141 115.14
3.4064 100.21  3.8545 113.39

Arch 64-bit, core i7-920 @2.67GHz, gcc 6.2.1

  A      rel     B      rel
3.9583 100.00  4.3021 108.69
3.9867 100.72  4.3653 110.28
3.9781 100.50  4.3308 109.41
4.0371 101.99  4.3953 111.04
3.9899 100.80  4.4652 112.81

There's not a huge difference between the Ziggurat variants, but nonetheless it's clear, on this sample, that you don't want to use the 32-bit-optimized code on a 64-bit system. On the more recent i7-2600 that incurs a time penalty of about 15 percent; on the older i7-920 the penalty is around 10 percent.

--
Allin Cottrell
Department of Economics
Wake Forest University
