Re: Randomly initializing a N-element vector
From: Dwight W. Read
Subject: Re: Randomly initializing a N-element vector
Date: Sun, 10 Oct 1999 15:40:51 +0100
REPLACE MY PREVIOUS EMAIL BY THIS ONE. THE PREVIOUS EMAIL USED
"n" IN TWO DIFFERENT WAYS AND THE SUGGESTED SAMPLING SCHEME IS
VALID BUT IMPRACTICAL. D. Read
Dave Koelle suggested that the sampling be done by selecting the
first coordinate, x1, of the vector from a uniform distribution over the
interval [0,1], then the second coordinate, x2, from a
uniform distribution over the interval [0, 1-x1], . . . , the (n-1)th
coordinate, x_{n-1}, from a uniform distribution over the interval [0,
1-x1-x2-...-x_{n-2}], and finally setting the nth coordinate xn = 1 -
(x1+x2+...+x_{n-1}).
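For concreteness, here is a minimal Python sketch of the scheme as I read the description above. The function name sequential_sample and the use of the standard random module are my own choices for illustration, not part of the original suggestion.

```python
import random

def sequential_sample(n, rng=random):
    """Draw one n-vector by the sequential scheme described above."""
    xs = []
    remaining = 1.0                      # 1 - (sum of coordinates chosen so far)
    for _ in range(n - 1):
        x = rng.uniform(0.0, remaining)  # uniform on [0, remaining]
        xs.append(x)
        remaining -= x
    xs.append(remaining)                 # last coordinate forces the sum to 1
    return xs

if __name__ == "__main__":
    print(sequential_sample(4))
```

Every vector produced this way satisfies the constraint x1 + ... + xn = 1; the question addressed below is whether such vectors are uniformly distributed.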
This sampling scheme won't work for the problem as it was
posed.
Here's why.
(1) Let T be the space defined by the constraint x1 + x2 + ... + xn
= 1 and let S be the space defined by the constraint x1 + x2 + ... + x_{n-1}
<= 1. The space T is in 1-1 correspondence with the space
S under the mapping m : S --> T defined by m((x1, x2, ...,
x_{n-1})) = (x1, x2, ..., x_{n-1}, 1-(x1+...+x_{n-1})). Hence we can examine
the proposed sampling scheme by examining whether or not it samples the
space S uniformly.
(2) To see that the proposed sampling scheme does not sample S
uniformly, suppose we set n = 3 and restrict the values of xi to the set
{0, 1/2, 1}. Then S consists of the 6 pairs (0,0), (0, 1/2), (0,
1), (1/2, 0), (1/2, 1/2) and (1,0). Our sampling scheme must select
each point in S with probability 1/6 if it samples
S uniformly. However, the proposed sampling scheme has the following
non-uniform probability distribution:
Pr[(0,0)] = Pr[(0,1/2)] = Pr[(0,1)] = 1/9
Pr[(1/2,0)] = Pr[(1/2,1/2)] = 1/6
Pr[(1,0)] = 1/3.
More generally, if the xi are from the m-element set {0, 1/(m-1),
2/(m-1), ... , 1} then under the proposed sampling scheme, the probability
of selecting a point, p, in S is given by: Pr[p] = (1/m)(1/(m-k)), where
p = (k/(m-1), _ ), that is, where the first coordinate of p is k/(m-1).
Clearly this does not define a uniform
distribution over S. The space S has m(m+1)/2 points, so any
proposed sampling scheme must sample each point in S with probability
2/(m(m+1)) if the sampling scheme is to sample the space T
uniformly.
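The probabilities above can be checked by exact enumeration. The following is only a sketch for the n = 3 case, assuming x1 is drawn first from all m grid values and x2 from the grid values that keep x1 + x2 <= 1; the function name scheme_distribution and the integer indexing of grid points are mine.

```python
from fractions import Fraction

def scheme_distribution(m):
    """Exact Pr[(x1, x2)] under the sequential scheme, n = 3, m-point grid.

    A key (i, j) stands for the point (i/(m-1), j/(m-1)).
    """
    dist = {}
    for i in range(m):           # x1 = i/(m-1), chosen with probability 1/m
        choices = m - i          # x2 may be 0, 1/(m-1), ..., 1 - x1
        for j in range(choices):
            dist[(i, j)] = Fraction(1, m) * Fraction(1, choices)
    return dist

if __name__ == "__main__":
    dist = scheme_distribution(3)
    for (i, j), p in sorted(dist.items()):
        print(f"Pr[({Fraction(i, 2)}, {Fraction(j, 2)})] = {p}")
    print("points in S:", len(dist))
    print("uniform probability per point:", Fraction(1, len(dist)))
```

Calling scheme_distribution with a larger m lets one compare each point's probability against the uniform value 2/(m(m+1)).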
The query about bias in the last coordinate towards small numbers
must first take into account that the last coordinate will tend to be small
even in a uniform distribution over S. In the example, if the 6 points were
equally probable, then the probability of the 2nd coordinate being 0 is
9/18, the probability of the 2nd coordinate being 1/2 is 6/18 and the
probability that the 2nd coordinate is 1 is 3/18.
For the proposed sampling scheme, the probability that the 2nd
coordinate is 0 is 11/18, the probability that the 2nd coordinate is
1/2 is 5/18, and the probability that the 2nd coordinate is 1 is
2/18, so there is a bias towards small numbers in the last coordinate
under the proposed sampling scheme over what would occur with a uniform
distribution over S.
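The two sets of marginal probabilities for the 2nd coordinate can be reproduced with exact arithmetic. This is a sketch for n = 3 only; the function name second_coordinate_marginals is hypothetical, and grid points are indexed by integers j standing for x2 = j/(m-1).

```python
from collections import defaultdict
from fractions import Fraction

def second_coordinate_marginals(m):
    """Return (uniform, scheme) marginals of x2 for n = 3 on an m-point grid."""
    n_points = m * (m + 1) // 2            # number of points in S
    uniform = defaultdict(Fraction)        # Fraction() is 0
    scheme = defaultdict(Fraction)
    for i in range(m):                     # x1 = i/(m-1)
        for j in range(m - i):             # x2 = j/(m-1), with x1 + x2 <= 1
            uniform[j] += Fraction(1, n_points)
            scheme[j] += Fraction(1, m) * Fraction(1, m - i)
    return dict(uniform), dict(scheme)

if __name__ == "__main__":
    uniform, scheme = second_coordinate_marginals(3)
    for j in sorted(uniform):
        print(f"x2 = {Fraction(j, 2)}: uniform {uniform[j]}, scheme {scheme[j]}")
```

Note that Fraction prints values in lowest terms, so 9/18 appears as 1/2 and 2/18 as 1/9.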
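More generally, with the xi taken from the m-element set discussed above,
Pr[( _ , 0)] = 2/(m+1) --> 0 for a uniform distribution over S, whereas
Pr[( _ , 0)] = (1/m)[1/m + 1/(m-1) + ... + 1] for the proposed sampling
scheme. The term in the [ ] is the harmonic sum, which grows like ln(m),
so under the proposed scheme Pr[( _ , 0)] is on the order of ln(m)/m.
This also tends to 0 as m increases, but it exceeds the uniform value by
a factor of roughly ln(m)/2, so the bias towards a small last coordinate
grows with m.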
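A quick numerical check of the two expressions for Pr[( _ , 0)] may help here. This is a sketch only; the function names are mine, and the bracketed sum is computed exactly as the harmonic number 1 + 1/2 + ... + 1/m.

```python
from fractions import Fraction

def pr_last_zero_uniform(m):
    """Pr[( _ , 0)] when the m(m+1)/2 points of S are equally likely."""
    return Fraction(2, m + 1)

def pr_last_zero_scheme(m):
    """Pr[( _ , 0)] under the sequential scheme: (1/m)[1/m + ... + 1]."""
    harmonic = sum(Fraction(1, k) for k in range(1, m + 1))
    return harmonic / m

if __name__ == "__main__":
    for m in (3, 10, 100, 1000):
        u = pr_last_zero_uniform(m)
        s = pr_last_zero_scheme(m)
        print(m, float(u), float(s), float(s / u))
```

The last printed column is the ratio of the two probabilities, which shows how the excess weight on a zero last coordinate changes as the grid is refined.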
Randomly rearranging the coordinates won't solve the non-uniformity
of the sampling scheme.
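For the small n = 3 example this can be checked directly. The sketch below (the function name shuffled_distribution is hypothetical) applies a uniformly random permutation to the three coordinates produced by the sequential scheme and computes the resulting distribution over T exactly. It illustrates the claim for this case only and is not a general proof.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def shuffled_distribution(m):
    """Distribution over T (n = 3, m-point grid) after randomly permuting
    the coordinates produced by the sequential scheme.

    A key (i, j, k) stands for the point (i/(m-1), j/(m-1), k/(m-1)).
    """
    out = defaultdict(Fraction)
    for i in range(m):
        for j in range(m - i):
            p = Fraction(1, m) * Fraction(1, m - i)   # scheme probability
            point = (i, j, (m - 1) - i - j)           # indices sum to m-1
            perms = list(permutations(point))         # 6 orderings, repeats kept
            for q in perms:
                out[q] += p / len(perms)
    return dict(out)

if __name__ == "__main__":
    for point, p in sorted(shuffled_distribution(3).items()):
        print(point, p)
```

With m = 3 the permuted scheme is symmetric in the coordinates, but the points of T still do not all receive the same probability.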
Dwight Read
Dwight W. Read, Professor
Department of Anthropology and
Department of Statistics
University of California, Los Angeles
Visiting Professor
Department of Anthropology
University of Kent at Canterbury, Kent, UK
Email: address@hidden
or address@hidden
Office Phone: (310) 825-3988
FAX: (310) 556-0703