Re: Determining if samples are normal
From: Henry F. Mollet
Subject: Re: Determining if samples are normal
Date: Tue, 27 Sep 2005 10:43:17 -0700
User-agent: Microsoft-Entourage/11.1.0.040913
Thanks. The actual data (e.g. randn(100,1)) needs to be passed to the test.
Somehow I got confused and thought I had to supply the cdf for the test.
Going back to the original question and quoting from a manual: A histogram
of the data (corresponding to the pdf) is a rather poor method of
determining whether data look normally distributed. More powerful is a plot
of the values of the variable against the corresponding percentage points
(quantiles) of a standard normal variable (Gnanadesikan 1977). If the data
are normally distributed, the plot should be close to a straight line.
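A minimal sketch of the plot described above, in Python with only the standard library (the thread itself uses Octave). It pairs each sorted value with a standard normal quantile at the plotting position (i - 0.5)/n, which is one common convention and an assumption here, and summarizes how straight the plot is by the correlation of the plotted points. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

def normal_probability_plot(data):
    """Return (standard normal quantiles, sorted data) for a normal probability plot.

    Assumes the plotting position (i - 0.5) / n; other conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, sigma 1
    qs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return qs, xs

def straightness(qs, xs):
    """Pearson correlation of the plotted points (1.0 = perfectly straight line)."""
    mq, mx = mean(qs), mean(xs)
    sqx = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sqx / (sqq * sxx) ** 0.5

rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
skewed_sample = [rng.expovariate(1.0) for _ in range(100)]
print(straightness(*normal_probability_plot(normal_sample)))
print(straightness(*normal_probability_plot(skewed_sample)))
```

For a sample drawn from a normal distribution the correlation should be close to 1, while a skewed sample such as the exponential one should give a lower value. Plotting qs against xs gives the picture itself.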
Even more powerful, I guess, are all the tests mentioned, including
anderson_darling_test.
Henry
on 9/26/05 4:48 PM, Paul Kienzle at address@hidden wrote:
> Anderson-Darling returns small q if a sample is unlikely to have been
> drawn from the given distribution.
>
> octave> anderson_darling_test (randn(100,1),'normal')
> ans = 1
> octave> anderson_darling_test (rand(100,1),'normal')
> ans = 0.010000
>
>
> So this is consistent with randn returning normally distributed
> numbers, while rand almost certainly does not. Note that the test does
> not tell you that the numbers come from a normal process. They may, for
> example, be correlated in the sequence, or they may be too regular.
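To illustrate that caveat: a test on the values alone cannot see their order. The sketch below (Python, standard library; names hypothetical) sorts a normal sample, which leaves the set of values, and therefore the Anderson-Darling statistic, unchanged, but makes consecutive values almost perfectly correlated. A lag-1 autocorrelation check is one simple way to catch that; it is an addition for illustration, not something proposed in the thread.

```python
import random
from statistics import mean

def lag1_autocorrelation(seq):
    """Sample autocorrelation between consecutive values (lag 1)."""
    m = mean(seq)
    num = sum((a - m) * (b - m) for a, b in zip(seq, seq[1:]))
    den = sum((a - m) ** 2 for a in seq)
    return num / den

rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ordered = sorted(sample)  # same values, hence the same A^2, but far from independent

print(f"original order: {lag1_autocorrelation(sample):+.3f}")
print(f"sorted order:   {lag1_autocorrelation(ordered):+.3f}")
```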
>
>
> Running the Anderson-Darling test on 10000 samples of size 100,
> I get the following results:
>
> octave:24> c = anderson_darling_test(randn(100,10000),'normal');
> octave:25> tabulate(100*c,100*[unique(c),1]);
> bin Fa Fr% Fc
> 1 83 0.83% 83
> 2.5 163 1.63% 246
> 5 273 2.73% 519
> 10 506 5.06% 1025
> 100 8975 89.75% 10000
>
> The Fc column is the cumulative frequency, so divide by 100 to get a
> percentage. About 0.8% of the samples return q <= 0.01, 2.5% return
> q <= 0.025, 5.2% return q <= 0.05, and 10.3% return q <= 0.1. Some
> samples drawn from a normal distribution will not look very normal, and
> no data-driven test is going to be able to identify them as such.
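A scaled-down version of the experiment above, sketched in Python with only the standard library: draw normal samples of size 100, compute the modified Anderson-Darling statistic Z, and count how often it exceeds the 5% critical value 0.752. The helper names, the n - 1 denominator for the standard deviation and the 2000 repetitions are assumptions chosen for a short run; the rejection rate should come out near 5%, in line with the 519 of 10000 in the Fc column.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def ad_z(data):
    """Modified Anderson-Darling statistic Z (mean and sd estimated from the data)."""
    xs = sorted(data)
    n = len(xs)
    xbar = math.fsum(xs) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in xs) / (n - 1))
    p = [PHI((x - xbar) / s) for x in xs]  # p[i-1] is p_(i)
    a2 = -n - math.fsum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                        for i in range(1, n + 1)) / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

def rejection_rate(draw, n, reps, crit, seed=0):
    """Fraction of `reps` samples of size n, drawn with draw(rng), whose Z exceeds crit."""
    rng = random.Random(seed)
    rejected = sum(ad_z([draw(rng) for _ in range(n)]) > crit for _ in range(reps))
    return rejected / reps

def standard_normal(rng):
    return rng.gauss(0.0, 1.0)  # plays the role of randn

# Genuinely normal samples, tested at the 5% critical value 0.752 (95% confidence).
null_rate = rejection_rate(standard_normal, n=100, reps=2000, crit=0.752)
print(f"fraction of normal samples rejected at the 5% level: {null_rate:.3f}")
```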
>
> Trying with a uniform distribution, again of sample size 100:
>
> octave> c = anderson_darling_test(rand(100,10000),'normal');
> octave> tabulate(100*c,100*[unique(c),1]);
> bin Fa Fr% Fc
> 1 7907 79.07% 7907
> 2.5 1097 10.97% 9004
> 5 507 5.07% 9511
> 10 321 3.21% 9832
> 100 168 1.68% 10000
>
> About 90% of samples of size 100 from a uniform distribution (Fc = 9004
> in the 2.5 row) are rejected as not normal with 97.5% confidence by the
> Anderson-Darling test.
>
> Triangular distributions look much too normal. At a sample size of 100
> we will not be able to distinguish them with the Anderson-Darling test:
>
> octave:26> c = anderson_darling_test(rand(100,10000)+rand(100,10000),'normal');
> octave:27> tabulate(100*c,100*[unique(c),1]);
> bin Fa Fr% Fc
> 1 137 1.37% 137
> 2.5 202 2.02% 339
> 5 407 4.07% 746
> 10 757 7.57% 1503
> 100 8497 84.97% 10000
>
> Using n=400, 60% of the triangular samples are rejected at the 0.1 level,
> but of course 10% of the normal samples are as well.
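The uniform and triangular runs can be sketched the same way (Python, standard library; helper names hypothetical, 300 repetitions per case to keep it quick, standard deviation with the n - 1 denominator assumed). rand + rand is modeled as the sum of two uniform draws. Based on the tables above one would expect most uniform samples of size 100 to be rejected at the 2.5% level, while triangular samples are rejected at the 10% level far more often at n = 400 than at n = 100.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

def ad_z(data):
    """Modified Anderson-Darling statistic Z (mean and sd estimated from the data)."""
    xs = sorted(data)
    n = len(xs)
    xbar = math.fsum(xs) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in xs) / (n - 1))
    p = [PHI((x - xbar) / s) for x in xs]  # p[i-1] is p_(i)
    a2 = -n - math.fsum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                        for i in range(1, n + 1)) / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

def rejection_rate(draw, n, reps, crit, seed=0):
    """Fraction of `reps` samples of size n, drawn with draw(rng), whose Z exceeds crit."""
    rng = random.Random(seed)
    rejected = sum(ad_z([draw(rng) for _ in range(n)]) > crit for _ in range(reps))
    return rejected / reps

def uniform(rng):
    return rng.random()  # like rand

def triangular(rng):
    return rng.random() + rng.random()  # like rand + rand

u100 = rejection_rate(uniform, n=100, reps=300, crit=0.873)     # 97.5% confidence
t100 = rejection_rate(triangular, n=100, reps=300, crit=0.631)  # 90% confidence
t400 = rejection_rate(triangular, n=400, reps=300, crit=0.631)  # 90% confidence
print(f"uniform,    n=100, 2.5% level: {u100:.2f}")
print(f"triangular, n=100, 10% level:  {t100:.2f}")
print(f"triangular, n=400, 10% level:  {t400:.2f}")
```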
>
> - Paul
>
> On Sep 26, 2005, at 5:37 PM, Henry F. Mollet wrote:
>
>> Am I doing this correctly? anderson_darling_test applied to the normal
>> cdf and to the logistic cdf gives the same result (0.01), yet the graphs
>> show that they are different.
>> Henry
>>
>> octave:21> x=linspace (-4,4,100);
>> octave:22> mu=0.0; sigma = 1.0;
>> octave:23> cnormalx=0.5+0.5.*erf((x-mu)./sigma./sqrt(2));
>> octave:24> plot (x,cnormalx,"x")
>> octave:25> anderson_darling_test(cnormalx,'normal')
>> ans = 0.010000
>> octave:26> a=0; lambda=2;
>> octave:27> clogisticx=1.0./(1.0+exp(-lambda.*(x-a)));
>> octave:28> hold on
>> octave:30> plot (x,clogisticx,"@33")
>> octave:31> anderson_darling_test(clogisticx,'normal')
>> ans = 0.010000
>>
>>
>>
>> on 9/26/05 12:55 AM, Paul Kienzle at address@hidden wrote:
>>
>>> The Anderson-Darling test is claimed to be pretty good:
>>>
>>> http://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm
>>> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
>>>
>>> Here's the relevant section from the R manual:
>>>
>>> http://www.maths.lth.se/help/R/.R/library/nortest/html/ad.test.html
>>>
>>>> The Anderson-Darling test is an EDF omnibus test for the composite
>>>> hypothesis of normality. The test statistic is
>>>>
>>>> $A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \left[ \ln p_{(i)} + \ln\left(1 - p_{(n-i+1)}\right) \right]$,
>>>>
>>>> where $p_{(i)} = \Phi\left( (x_{(i)} - \bar{x}) / s \right)$. Here, $\Phi$ is the
>>>> cumulative distribution function of the standard normal distribution,
>>>> and $\bar{x}$ and $s$ are the mean and standard deviation of the data
>>>> values. The p-value is computed from the modified statistic
>>>>
>>>> $Z = A^2 \left(1.0 + 0.75/n + 2.25/n^{2}\right)$
>>>>
>>>> according to Table 4.9 in Stephens (1986).
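A direct transcription of the formula above into Python (standard library only), useful for checking the arithmetic by hand. It is a sketch, not the nortest or octave-forge code; the function name is hypothetical, and the n - 1 denominator for s is an assumption about the convention.

```python
import math
from statistics import NormalDist

def anderson_darling_normal(data):
    """Return (A2, Z) for the composite hypothesis of normality.

    The mean and standard deviation are estimated from the data; s uses the
    n - 1 denominator (an assumed convention). Extreme outliers (many standard
    deviations out) would push p to exactly 0 or 1 and make log() fail.
    """
    xs = sorted(data)
    n = len(xs)
    xbar = math.fsum(xs) / n
    s = math.sqrt(math.fsum((x - xbar) ** 2 for x in xs) / (n - 1))
    phi = NormalDist().cdf                 # standard normal cdf
    p = [phi((x - xbar) / s) for x in xs]  # p[i-1] is p_(i)
    total = math.fsum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                      for i in range(1, n + 1))  # p[n-i] is p_(n-i+1)
    a2 = -n - total / n
    z = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # modified statistic
    return a2, z
```

Both statistics are unchanged by shifting or rescaling the data, since the data are standardized before $\Phi$ is applied.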
>>>
>>> Here are the critical values I found elsewhere on the net.
>>>
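>>> Confidence   Critical value
>>> 90%          0.631
>>> 95%          0.752
>>> 97.5%        0.873
>>> 99%          1.035
>>>
>>> These apply to the modified statistic Z = A^2 (1 + 0.75/n + 2.25/n^2).
>>> For example, if Z > 0.752 you can say that your data set is not
>>> normally distributed with 95% confidence.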
>>>
>>> This is implemented in octave-forge as 1-p =
>>> anderson_darling_test(x,'normal'). That is, if anderson_darling_test
>>> returns a value of 0.05 then you can say your data set is not normally
>>> distributed with 95% confidence.
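The returned value can be mimicked by comparing the modified statistic Z with the table of critical values and reporting the smallest tabulated significance level at which normality is rejected, or 1 when none applies. This is a hypothetical Python sketch consistent with the outputs seen in this thread (0.01, 0.025, 0.05, 0.1 and 1); it is not taken from the octave-forge source.

```python
# Critical values of the modified statistic Z, as listed above:
# (significance level, critical value), strictest first.
CRITICAL_VALUES = [(0.01, 1.035), (0.025, 0.873), (0.05, 0.752), (0.10, 0.631)]

def q_from_z(z):
    """Smallest tabulated significance level at which normality is rejected.

    Returns 1.0 when Z does not exceed even the 10% critical value.
    """
    for alpha, critical in CRITICAL_VALUES:
        if z > critical:
            return alpha
    return 1.0

for z in (0.5, 0.7, 0.8, 0.9, 1.2):
    print(z, q_from_z(z))
```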
>>>
>>> - Paul
>>>
>>> On Sep 25, 2005, at 1:59 PM, Søren Hauberg wrote:
>>>
>>>> Hi,
>>>> Does anybody know how I can test whether or not some samples are
>>>> normally distributed? I tried graphical methods, such as looking at
>>>> histograms and qqplots, but I don't trust my own judgement enough to
>>>> rely on them.
>>>>
>>>> /Søren
>>>
>>>
>>>
>>>
>>> -------------------------------------------------------------
>>> Octave is freely available under the terms of the GNU GPL.
>>>
>>> Octave's home on the web: http://www.octave.org
>>> How to fund new projects: http://www.octave.org/funding.html
>>> Subscription information: http://www.octave.org/archive.html
>>> -------------------------------------------------------------
>>>
>>
>>
>