Re: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell
From: Bruno Haible
Subject: Re: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell
Date: Sun, 21 Aug 2022 19:17:08 +0200
Paul Eggert asked:
> > Could you investigate further why mingw 64-bit fails?
Some words on the "why". You seem to have expectations regarding the
distribution quality of the resulting file names, but these expectations
are not warranted.
Donald E. Knuth wrote in TAOCP vol 2 "Seminumerical algorithms" that
an arbitrary sequence of number-manipulating operations usually makes a
poor pseudo-random number generator, and that in order
to get good quality pseudo-random numbers, one needs to use *specific*
pseudo-random number generators and then *prove* mathematical properties
of them. (The same is true, btw, for crypto algorithms.)
Then he started discussing linear congruential generators [1] as the
simplest example for which something could be proved.
The current source code of tempname.c generates the pseudo-random numbers
— assuming no HAS_CLOCK_ENTROPY and no ASLR — using a mix of three
operations:
(A) A linear congruential generator [2] with m = 2^64,
a = 2862933555777941757, c = 3037000493.
(B) A floor operation: v ← floor(v / 62^6)
(C) A xor operation: v ^= prev_v
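For concreteness, operation (A) on its own can be sketched as follows. This is only an illustration using the constants quoted above; the function name lcg_step is hypothetical and is not taken from tempname.c. The reduction modulo 2^64 comes for free from unsigned 64-bit wraparound.

```c
#include <stdint.h>

/* One step of operation (A): v <- a*v + c (mod 2^64).
   uint64_t arithmetic wraps around, which gives the "mod 2^64".
   The name lcg_step is hypothetical, chosen for this sketch.  */
static uint64_t
lcg_step (uint64_t v)
{
  return 2862933555777941757ULL * v + 3037000493ULL;
}
```

Calling lcg_step repeatedly on its own output, with nothing else touching v in between, is what "a linear congruential generator" means in the rest of this mail.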
There are three different questions:
(Q1) What is the expected quality inside a single gen_tempname call?
(Q2) What is the expected quality of multiple gen_tempname calls in a
single process?
(Q3) What is the expected quality when considering different processes
that call gen_tempname?
Answers:
(Q1) For just 6 'X'es, there is essentially a single (A) operation.
Therefore the quality will be good.
If someone uses more than 10 'X'es, for example, 50 'X'es, there
will be 5 (A) and 5 (B), interleaved: (A), (B), (A), (B), ...
This is *not* a linear congruential generator, therefore the
expected quality is BAD.
In order to fix this case, what I would do is to get back to
a linear congruential generator: (A), (A), ..., (A), (B).
In other words, feed into (A) exactly the output from the
previous (A). This means, do the statements
XXXXXX[i] = letters[v % 62];
v /= 62;
not on v itself, but on a copy of v.
But wait, there is also the desire to have predictability!
This means not consuming all the bits that the random_bits() call
produces, but only part of them.
What I would do here is to reduce BASE_62_DIGITS from 10 to 6,
so that in each round of the loop, 6 base-62 digits are consumed
and more than 4 base-62 digits are left in v, for predictability.
In the usual calls with 6 'X'es the loop will still end after a
single round.
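The (Q1) suggestion could look roughly like the sketch below. It is not the tempname.c code: the names lcg_step, fill_template and DIGITS_PER_ROUND are hypothetical, the bias handling of the real code is left out, and random_bits() is replaced by a bare LCG step. The point is only that the digits are taken from a copy w, so each (A) is fed exactly the output of the previous (A). As a sanity check on the digit count: 62^6 is about 5.7e10 and 2^64 is about 1.8e19, so the quotient (about 3.2e8) lies between 62^4 and 62^5.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BASE_62_DIGITS reduced from 10 to 6.  */
enum { DIGITS_PER_ROUND = 6 };

static const char letters[] =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Operation (A), with the constants quoted in this mail.  */
static uint64_t
lcg_step (uint64_t v)
{
  return 2862933555777941757ULL * v + 3037000493ULL;
}

/* Fill buf[0..n-1] with base-62 letters and return the final LCG state.
   The division by 62 is applied to the copy w, never to v itself,
   so the sequence of v values is a pure LCG: (A), (A), ..., (A).  */
static uint64_t
fill_template (char *buf, size_t n, uint64_t v)
{
  size_t i = 0;
  while (i < n)
    {
      v = lcg_step (v);
      uint64_t w = v;
      for (int d = 0; d < DIGITS_PER_ROUND && i < n; d++)
        {
          buf[i++] = letters[w % 62];
          w /= 62;
        }
    }
  return v;
}
```

With 6 'X'es the outer loop runs once; with 12 it runs twice, and the second round starts from the untouched output of the first.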
(Q2) First of all, the multiple gen_tempname calls can occur in
different threads. Since no locking is involved, it is undefined
behaviour to access the 'prev_v' variable from different threads.
On machines with an IA-64 CPU, the 'prev_v' variable's value may
not be propagated from one thread to the other. [3][4][5]
The fix is simple, though: Mark 'prev_v' as 'volatile'.
Then, what the code does, is a mix of (A), (B), (C). Again, this
is *not* a linear congruential generator, therefore the expected
quality is BAD.
To get good quality, I would suggest using a linear congruential
generator across *all* gen_tempname calls of the entire thread.
This means:
- Move the (B) invocations out, as explained above in (Q1).
- Remove the (C) code that you added last week.
- Store the v value in a per-thread variable, using '__thread'
on glibc systems and a TLS variable (#include "glthread/tls.h")
on the other platforms.
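A minimal sketch of the per-thread state, assuming a C11 compiler: it uses the standard _Thread_local keyword rather than '__thread' or the gnulib TLS module, and the names tempname_state and next_random are hypothetical. Each thread gets its own zero-initialized copy of the state, so no locking is needed and no value is shared between threads.

```c
#include <stdint.h>

/* Per-thread LCG state.  _Thread_local is the C11 spelling; this is an
   assumption of the sketch, not what tempname.c does.  The name
   tempname_state is hypothetical.  */
static _Thread_local uint64_t tempname_state;

/* Advance this thread's LCG by one step (A) and return the new value.
   Nothing else modifies tempname_state, so within one thread the
   values form a plain linear congruential sequence.  */
static uint64_t
next_random (void)
{
  tempname_state = 2862933555777941757ULL * tempname_state + 3037000493ULL;
  return tempname_state;
}
```

In this sketch every thread starts from state 0, so two threads would produce the same sequence; seeding the state differently per thread or per process is a separate matter, which is what (Q3) is about.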
(Q3) Here, to force differences between different processes, I would
suggest using a fine-grained clock value. In terms of platforms,
#if defined CLOCK_MONOTONIC && HAVE_CLOCK_GETTIME
is way too restrictive.
How about
- using CLOCK_REALTIME when CLOCK_MONOTONIC is not available,
- using gettimeofday() as fallback, especially on native Windows.
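The fallback chain could be sketched like this. It assumes a POSIX-like environment where <sys/time.h> and gettimeofday() exist (on native Windows that would come from gnulib or mingw), and it tests only the CLOCK_* macros, whereas real code would also consult configure results such as HAVE_CLOCK_GETTIME. The function name clock_entropy is hypothetical.

```c
#include <stdint.h>
#include <time.h>
#include <sys/time.h>

/* Return a fine-grained clock value to mix into the seed, or 0 if no
   clock could be read.  Tries CLOCK_MONOTONIC, then CLOCK_REALTIME,
   then gettimeofday().  The name clock_entropy is hypothetical.  */
static uint64_t
clock_entropy (void)
{
#if defined CLOCK_MONOTONIC || defined CLOCK_REALTIME
  struct timespec ts;
# if defined CLOCK_MONOTONIC
  if (clock_gettime (CLOCK_MONOTONIC, &ts) == 0)
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
# endif
# if defined CLOCK_REALTIME
  if (clock_gettime (CLOCK_REALTIME, &ts) == 0)
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
# endif
#endif
  /* Last resort: microsecond resolution.  */
  struct timeval tv;
  if (gettimeofday (&tv, NULL) == 0)
    return (uint64_t) tv.tv_sec * 1000000u + (uint64_t) tv.tv_usec;
  return 0;
}
```

The units differ between the branches (nanoseconds versus microseconds), which does not matter here since the value is only used to make processes differ, not to measure time.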
If one does (Q3), then the suggestions for (Q2) (other than the 'volatile')
may not be needed.
Bruno
[1] https://en.wikipedia.org/wiki/Linear_congruential_generator
[2] https://en.wikipedia.org/wiki/Linear_congruential_generator#c_%E2%89%A0_0
[3] https://es.cs.uni-kl.de/publications/datarsg/Geor16.pdf
[4] https://db.in.tum.de/teaching/ws1718/dataprocessing/chapter3.pdf?lang=de (page 18)
[5] https://os.inf.tu-dresden.de/Studium/DOS/SS2014/04-Memory-Consistency.pdf