swarm-modeling
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Randomly initializing a N-element vector


From: Dwight W. Read
Subject: Re: Randomly initializing a N-element vector
Date: Sun, 10 Oct 1999 15:18:00 +0100

Dave Koelle suggested that the sampling be done by selecting the first coordinate, x1, of the vector from a uniform distribution over the interval [0,1], then the second coordinate, x2, of the vector from a uniform distribution over the interval [0, 1-x1], . . . , the n-1 coordinate from a uniform distribution over the interval [0, 1-x1-x2-...-xn-2], and finally set the nth coordinate xn = 1 - (x1+x2...+xn-1).

This sampling scheme won't work for the problem as it was posed.

Here's why.

(1) Let T be the space defined by the constraint, x1 + x2 + ... + xn = 1 and let S be the space defined by the constraint x1 + x2 + ... + xn-1 <= 1.  The space T  is in 1-1 correspondance with the space S under the mapping  m : S -- > T defined by m((x1, x2, ..., xn-1)) = (x1, x2, ..., xn-1, 1-(x1+...+xn-1)).  Hence we can examine the proposed sampling scheme by examing whether or not it samples the space, S, uniformly.

(2) To see that proposed sampling scheme does not sample S uniformly, suppose we set n = 3 and restrict the values of xi to the set {0, 1/2, 1}.  Then S consists of the 6 pairs (0,0), (0, 1/2), (0, 1), (1/2, 0), (1/2, 1/2) and (1,0).  Our sampling scheme must select each point in S with probablity 1/6  if the sampling scheme samples S uniformly.  However the propose sampling scheme has the following, non-uniform probability distribution:
  Pr[(0,0)] = Pr[(0,1/2)] = Pr[(0,1)] = 1/9
  Pr[(1/2,0)=  Pr[(1/2,1/2)] = 1/6
  Pr[(1,0)] = 1/3.

More generally, if the xi are from the n-element set {0, 1/(n-1), 2/(n-1), ... , 1} then under the proposed sampling scheme, the probabiliyt of selecting a point, p, in S is given by: Pr[p] = (1/n)(1/(n-m) ), where p = ( _ , m/(n-1)).  Cleary this does not define a uniform distribution over S.  The space S has n(n+1)/2 points, so any proposed sampling scheme must sample each point in S with probability 2/(n(n+1))  if the samping scheme is to sample the space T uniformly.

The query about bias in the last coordinate towards small numbers must first take into account that the last coordinate will be small in a uniform distribution over S.  In the example, if the 6 points were equally probable, then the probability of the 2nd coodinate being 0 is 9/18, the probability of the 2nd coordinate being 1/2 is 6/18 and the probabiliy that the 2nd coordiate is 1 is 3/18.

For the proposed sampling scheme, the probability that the 2nd coordinate is 0 is 11/18, the probablity  that the 2nd coordinate is 1/2  is 5/18, and the probability that the 2nd coordinate is 1 is 2/18, so there is a bias towards small numbers in the last coordinate under the proposed sampling scheme over what would occur with a uniform distribution over S.

More generally, with S having n points as discussed above, Pr[( _ , 0)] = 2/(n+1)  --> 0 for a uniform distribution over S, whereas Pr[( _ , 0)] = 1/n[1/n + 1/(n-1)  +...+1] for the proposed sampling scheme and since the term in the [ ] scales with n, Pr[( _, 0)] will be on the order of 1 as n increases for the proposed sampling scheme.

Randomly rearranging the coordinates won't solve the non-uniformity of the sampling scheme.

Let me sketch one way to do the sampling.  We only need to sample the space S, uniformly.  The space S has n(n+1)/2 points.  Basically, we need a  1-1 mapping from S to the integers in the interval [1, 1(n+1)/2]; then we can sample this interval uniformly and use the inverse of the mapping to determine the point in S that corresponds to the point that has been sampled.  Then we can construct the corresponing point in T once the point in S has been selected. 

One way to get the mapping is to devise a rule for tracing through S (this needs to be worked out and it shouldn't be difficult to set up a rule in the form of a "for ..." loop; my proposed sampling method doesn't depend on how the rule is specified).  With this rule in mind, now randomly sample an integer from the interval [1, n(n+1)/2] and use the rule to identify the point in S that corresponds to the selected interger.  (At worst, the inverse of the mapping will have to be computed; it might be able to come up with a rule whose inverse can be calculated without tracing through the mapping.)

Dwight Read


Dwight W. Read, Professor
Department of Anthorpology and
Department of Statistics
University of California, Los Angeles
Visiting Professor
Department of Anthropology
University of Kent at Canterbury, Kent, UK
Email: address@hidden
or   address@hidden
Office Phone: (310) 825-3988
FAX: (310) 556-0703

reply via email to

[Prev in Thread] Current Thread [Next in Thread]