Re: Determining if samples are normal
From: Mike Miller
Subject: Re: Determining if samples are normal
Date: Mon, 26 Sep 2005 15:54:13 -0500 (CDT)
On Mon, 26 Sep 2005, Przemek Klosowski wrote:
> I said that I tried it for 10000 as well---I didn't give it as an
> example because it almost blew my computer out of the water---both
> octave and the 'less' pager used almost a gigabyte of memory because the
> function as written printed out intermediate results.
Przemek is allowing me to send his note (above) to the list. He is quite
right and I apologize for responding only to his code for 100. If the
same result is obtained with 10000 as with 100, the test seems rather
lame.
Now that I've had a chance to look at it again, the problem is that the
original presentation of the test is not correct. What is needed are
critical values for the correlation coefficient based on truly normal data
at various sample sizes. These have been published. For example, see
Johnson and Wichern (1992) Applied Multivariate Statistical Analysis,
p. 158, Table 4.2.
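As an illustration (not part of the original message), the statistic being tested can be sketched as a small function. The name qq_corr is made up here, and erfinv is used for the normal quantiles on the assumption that no statistics package is installed. The plotting positions (i-.5)/n are the ones used in the code in this message.

```octave
1;  % script-file marker so the function below can be defined inline

% Correlation between the sorted sample and standard normal quantiles
% taken at plotting positions (i - 0.5)/n.  qq_corr is a hypothetical name.
function r = qq_corr(x)
  x  = sort(x(:));
  n  = numel(x);
  p  = ((1:n)' - 0.5) / n;
  q  = sqrt(2) * erfinv(2*p - 1);   % standard normal quantiles
  qc = q - mean(q);                 % center both vectors
  xc = x - mean(x);
  r  = (qc' * xc) / sqrt((qc' * qc) * (xc' * xc));
end

% One triangular sample of 100 (sum of two uniforms), compared with the
% .05 critical point for n = 100 quoted in this message.
r = qq_corr(rand(100, 2) * [1; 1]);
reject = r < .9873
```

Normality is rejected when r falls below the critical point for the sample size in use.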
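This is for samples of 100 from the triangular distribution (each
observation is the sum of two independent uniforms):
% normal quantiles at (i-.5)/n; newer Octave calls normal_inv norminv
normals=normal_inv(([1:100]'-.5)/100,0,1);
r=zeros(10000,1);
% 10000 replicates: correlate the sorted sample with the normal quantiles
for i=1:10000, r(i)=corrcoef([normals,sort(rand(100,2)*[1;1])])(2,1); end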
mean(r < .9895)
ans = 0.13340
mean(r < .9873)
ans = 0.049400
mean(r < .9822)
ans = 0.0033000
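The values .9895, .9873 and .9822 are the critical points for the .10, .05
and .01 significance levels at N = 100; normality is rejected when r falls
below the critical point. The test has essentially no power here: only
4.9% of the triangular samples were rejected at the 5% level, which is
about what truly normal data would give. But with samples of 300, we
start to see differences: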
normals=normal_inv(([1:300]'-.5)/300,0,1);
for i=1:10000, r(i)=corrcoef([normals,sort(rand(300,2)*[1;1])])(2,1); end
mean(r < .9960)
ans = 0.68390
mean(r < .9953)
ans = 0.48890
mean(r < .9935)
ans = 0.12140
So 49% are statistically significant at the .05 level.
Obviously, with 10,000 observations we would have much greater power than
49% (and 49% isn't all that impressive, but a normal curve and a triangle
aren't all that different!). Unfortunately, I don't know what the proper
critical points are for the distribution of the correlation with N of
10,000, so I'm not going to do the test.
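One possible way around the missing table entry, offered only as a sketch and not something done in this message, is to estimate the critical points by simulating truly normal samples. The function name sim_crit and the replicate count are assumptions. The plotting positions (i-.5)/n follow the code above and may not match the convention behind the published table, so simulated values need not reproduce it exactly.

```octave
1;  % script-file marker so the function below can be defined inline

% Estimate lower-tail critical points of the Q-Q correlation under
% normality by Monte Carlo.  sim_crit is a hypothetical helper.
function crit = sim_crit(n, reps, alpha)
  p  = ((1:n)' - 0.5) / n;
  q  = sqrt(2) * erfinv(2*p - 1);         % standard normal quantiles
  qc = q - mean(q);
  r  = zeros(reps, 1);
  for k = 1:reps
    x  = sort(randn(n, 1));               % one truly normal sample
    xc = x - mean(x);
    r(k) = (qc' * xc) / sqrt((qc' * qc) * (xc' * xc));
  end
  r = sort(r);                            % ascending order
  crit = r(max(1, floor(alpha * reps)));  % empirical alpha-quantiles
  crit = crit(:)';                        % return as a row vector
end

% Check against a size where published values exist before trusting
% the result at larger N.
crit = sim_crit(100, 2000, [.10 .05 .01])
```

The same call with n = 10000 would give estimates for the large-sample case. With only a few thousand replicates the .01 point in particular will be noisy.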
I don't understand the Doug Stewart approach to this. I'm sure my way
(implied in the above code) is much quicker and easier.
Mike
--
Michael B. Miller, Ph.D.
Assistant Professor
Division of Epidemiology and Community Health
and Institute of Human Genetics
University of Minnesota
http://taxa.epi.umn.edu/~mbmiller/