[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Randomly initializing a N-element vector
From: |
Dwight W. Read |
Subject: |
Re: Randomly initializing a N-element vector |
Date: |
Sun, 10 Oct 1999 15:18:00 +0100 |
Dave Koelle suggested that the sampling be done by selecting
the first coordinate, x1, of the vector from a uniform distribution over
the interval [0,1], then the second coordinate, x2, of the vector from a
uniform distribution over the interval [0, 1-x1], . . . , the n-1
coordinate from a uniform distribution over the interval [0,
1-x1-x2-...-xn-2], and finally set the nth coordinate xn = 1 -
(x1+x2...+xn-1).
This sampling scheme won't work for the problem as it was
posed.
Here's why.
(1) Let T be the space defined by the constraint, x1 + x2 + ... + xn
= 1 and let S be the space defined by the constraint x1 + x2 + ... + xn-1
<= 1. The space T is in 1-1 correspondance with the space
S under the mapping m : S -- > T defined by m((x1, x2, ...,
xn-1)) = (x1, x2, ..., xn-1, 1-(x1+...+xn-1)). Hence we can examine
the proposed sampling scheme by examing whether or not it samples the
space, S, uniformly.
(2) To see that proposed sampling scheme does not sample S
uniformly, suppose we set n = 3 and restrict the values of xi to the set
{0, 1/2, 1}. Then S consists of the 6 pairs (0,0), (0, 1/2), (0,
1), (1/2, 0), (1/2, 1/2) and (1,0). Our sampling scheme must select
each point in S with probablity 1/6 if the sampling scheme samples
S uniformly. However the propose sampling scheme has the following,
non-uniform probability distribution:
Pr[(0,0)] = Pr[(0,1/2)] = Pr[(0,1)] = 1/9
Pr[(1/2,0)= Pr[(1/2,1/2)] = 1/6
Pr[(1,0)] = 1/3.
More generally, if the xi are from the n-element set {0, 1/(n-1),
2/(n-1), ... , 1} then under the proposed sampling scheme, the
probabiliyt of selecting a point, p, in S is given by: Pr[p] =
(1/n)(1/(n-m) ), where p = ( _ , m/(n-1)). Cleary this does not
define a uniform distribution over S. The space S has n(n+1)/2
points, so any proposed sampling scheme must sample each point in S with
probability 2/(n(n+1)) if the samping scheme is to sample the space
T uniformly.
The query about bias in the last coordinate towards small numbers
must first take into account that the last coordinate will be small in a
uniform distribution over S. In the example, if the 6 points were
equally probable, then the probability of the 2nd coodinate being 0 is
9/18, the probability of the 2nd coordinate being 1/2 is 6/18 and the
probabiliy that the 2nd coordiate is 1 is 3/18.
For the proposed sampling scheme, the probability that the 2nd
coordinate is 0 is 11/18, the probablity that the 2nd coordinate is
1/2 is 5/18, and the probability that the 2nd coordinate is 1 is
2/18, so there is a bias towards small numbers in the last coordinate
under the proposed sampling scheme over what would occur with a uniform
distribution over S.
More generally, with S having n points as discussed above, Pr[( _ ,
0)] = 2/(n+1) --> 0 for a uniform distribution over S, whereas
Pr[( _ , 0)] = 1/n[1/n + 1/(n-1) +...+1] for the proposed sampling
scheme and since the term in the [ ] scales with n, Pr[( _, 0)] will be
on the order of 1 as n increases for the proposed sampling
scheme.
Randomly rearranging the coordinates won't solve the non-uniformity
of the sampling scheme.
Let me sketch one way to do the sampling. We only need to
sample the space S, uniformly. The space S has n(n+1)/2
points. Basically, we need a 1-1 mapping from S to the
integers in the interval [1, 1(n+1)/2]; then we can sample this interval
uniformly and use the inverse of the mapping to determine the point in S
that corresponds to the point that has been sampled. Then we can
construct the corresponing point in T once the point in S has been
selected.
One way to get the mapping is to devise a rule for tracing through S
(this needs to be worked out and it shouldn't be difficult to set up a
rule in the form of a "for ..." loop; my proposed sampling
method doesn't depend on how the rule is specified). With this rule
in mind, now randomly sample an integer from the interval [1, n(n+1)/2]
and use the rule to identify the point in S that corresponds to the
selected interger. (At worst, the inverse of the mapping will have
to be computed; it might be able to come up with a rule whose inverse can
be calculated without tracing through the mapping.)
Dwight Read
Dwight W. Read, Professor
Department of Anthorpology and
Department of Statistics
University of California, Los Angeles
Visiting Professor
Department of Anthropology
University of Kent at Canterbury, Kent, UK
Email: address@hidden
or address@hidden
Office Phone: (310) 825-3988
FAX: (310) 556-0703