Re: Determining if samples are normal

help-octave

From:	Henry F. Mollet
Subject:	Re: Determining if samples are normal
Date:	Tue, 27 Sep 2005 10:43:17 -0700
User-agent:	Microsoft-Entourage/11.1.0.040913

Thanks. The actual data (e.g. randn(100,1)) need to be passed to the test.
Somehow I got confused and thought I had to supply the cdf instead.

Going back to the original question and quoting from a manual: A histogram
of the data (corresponding to the pdf) is a rather poor method of
determining whether data look normally distributed. More powerful is a plot
of the ordered values of the variable against the corresponding quantiles of
a standard normal variable (Gnanadesikan 1977). If the data are normally
distributed, one should get a straight line.
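
A minimal sketch of such a plot, using only core Octave functions. The
sample x, the plotting positions (i - 0.5)/n and the correlation summary r
are my own assumptions for illustration, not something taken from the manual.

```octave
% Normal probability (Q-Q) plot with core Octave functions only.
x  = randn (100, 1);                 % illustrative sample; substitute your own data
n  = numel (x);
xs = sort (x(:));                    % ordered sample values
p  = ((1:n)' - 0.5) / n;             % plotting positions (assumed convention)
z  = sqrt (2) * erfinv (2*p - 1);    % standard normal quantiles at p
plot (z, xs, "o");
xlabel ("standard normal quantile");
ylabel ("ordered sample value");
% Correlation between the two axes; values near 1 mean a nearly straight line.
zc = z - mean (z);
xc = xs - mean (xs);
r  = sum (zc .* xc) / sqrt (sum (zc.^2) * sum (xc.^2));
printf ("probability-plot correlation: %.4f\n", r);
```

Replacing randn with rand in the first line should bend the ends of the
plot away from the straight line.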

Then, I guess, even more powerful are all the tests mentioned, including
anderson_darling_test.
Henry


on 9/26/05 4:48 PM, Paul Kienzle at address@hidden wrote:

> Anderson-Darling returns small q if a sample is unlikely to have been
> drawn from the given distribution.
> 
> octave> anderson_darling_test (randn(100,1),'normal')
> ans = 1
> octave> anderson_darling_test (rand(100,1),'normal')
> ans = 0.010000
> 
> 
> So this is consistent with randn returning normally distributed
> numbers, whereas rand almost certainly does not.  Note that the test does
> not tell you that the numbers come from a normal process.  They may for
> example be correlated in the sequence, or they may be too regular.
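
A small sketch of the point about correlation in the sequence: sorting a
sample leaves the set of values unchanged, so any test that looks only at
the values sees the same data, yet the sorted sequence is strongly
correlated. The helper lag1 is a hypothetical name introduced here for
illustration.

```octave
% Same values, different ordering: a distribution test cannot tell them apart.
x = randn (100, 1);                  % illustrative sample
y = sort (x);                        % identical values, strongly ordered sequence
% lag-1 autocorrelation of a column vector (hypothetical helper)
lag1 = @(v) sum ((v(1:end-1) - mean (v)) .* (v(2:end) - mean (v))) ...
            / sum ((v - mean (v)).^2);
printf ("lag-1 autocorrelation, original order: %6.2f\n", lag1 (x));
printf ("lag-1 autocorrelation, sorted:         %6.2f\n", lag1 (y));
```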
> 
> 
> Checking the anderson-darling test 10000 times for a sample size of 100
> I get the following results:
> 
> octave:24>  c = anderson_darling_test(randn(100,10000),'normal');
> octave:25>  tabulate(100*c,100*[unique(c),1]);
>       bin     Fa       Fr%        Fc
>         1     83      0.83%       83
>       2.5    163      1.63%      246
>         5    273      2.73%      519
>        10    506      5.06%     1025
>       100   8975     89.75%    10000
> 
> The Fc column is the cumulative frequency, so divide by 100 to get percent.
> About 0.8% of the examples return q <= 0.01, 2.5% return q <= 0.025, 5.2%
> return q <= 0.05, and 10.3% return q <= 0.1.  Some samples drawn from a
> normal distribution will not look very normal, and no data-driven test
> is going to be able to identify them as normal.
> 
> Trying with a uniform distribution, again of sample size 100:
> 
> octave>  c = anderson_darling_test(rand(100,10000),'normal');
> octave>  tabulate(100*c,100*[unique(c),1]);
>       bin     Fa       Fr%        Fc
>         1   7907     79.07%     7907
>       2.5   1097     10.97%     9004
>         5    507      5.07%     9511
>        10    321      3.21%     9832
>       100    168      1.68%    10000
> 
> Most samples of size 100 from a uniform distribution (about 90% in this
> run) will be rejected as not normal with 97.5% confidence by the
> Anderson-Darling test.
> 
> Triangular distributions look much too normal.  We will not be able to
> distinguish them with the Anderson-Darling test:
> 
> octave:26>  c = anderson_darling_test(rand(100,10000)+rand(100,10000),'normal');
> octave:27>  tabulate(100*c,100*[unique(c),1]);
>       bin     Fa       Fr%        Fc
>         1    137      1.37%      137
>       2.5    202      2.02%      339
>         5    407      4.07%      746
>        10    757      7.57%     1503
>       100   8497     84.97%    10000
> 
> Using n=400, 60% of the triangular samples are rejected at a .1 level,
> but of course 10% of the normal samples are as well.
> 
> - Paul
> 
> On Sep 26, 2005, at 5:37 PM, Henry F. Mollet wrote:
> 
>> Am I doing this correctly? anderson_darling_test gives the same result
>> (0.01) for the normal cdf and the logistic cdf, although the graphs show
>> that they are different.
>> Henry
>> 
>> octave:21> x=linspace (-4,4,100);
>> octave:22> mu=0.0; sigma = 1.0;
>> octave:23> cnormalx=0.5+0.5.*erf((x-mu)./sigma./sqrt(2));
>> octave:24> plot (x,cnormalx,"x")
>> octave:25> anderson_darling_test(cnormalx,'normal')
>> ans = 0.010000
>> octave:26> a=0; lambda=2;
>> octave:27> clogisticx=1.0./(1.0+exp(-lambda.*(x-a)));
>> octave:28> hold on
>> octave:30> plot (x,clogisticx,"@33")
>> octave:31> anderson_darling_test(clogisticx,'normal')
>> ans = 0.010000
>> 
>> 
>> 
>> on 9/26/05 12:55 AM, Paul Kienzle at address@hidden wrote:
>> 
>>> The Anderson-Darling test is claimed to be pretty good:
>>> 
>>> http://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm
>>> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
>>> 
>>> Here's the relevant section from the R manual:
>>> 
>>> http://www.maths.lth.se/help/R/.R/library/nortest/html/ad.test.html
>>> 
>>>> The Anderson-Darling test is an EDF omnibus test for the composite
>>>> hypothesis of normality. The test statistic is
>>>> 
>>>> A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} [2i-1] [\ln(p_{(i)}) + \ln(1 - p_{(n-i+1)})],
>>>> 
>>>> where p_{(i)} = \Phi([x_{(i)} - \overline{x}]/s). Here, \Phi is the
>>>> cumulative distribution function of the standard normal distribution,
>>>> and \overline{x} and s are the mean and standard deviation of the data
>>>> values. The p-value is computed from the modified statistic
>>>> 
>>>> Z = A^2 (1.0 + 0.75/n + 2.25/n^{2})
>>>> 
>>>> according to Table 4.9 in Stephens (1986).
>>> 
>>> Here are the critical values I found elsewhere on the net.
>>> 
>>> 90% 0.631
>>> 95% 0.752
>>> 97.5% 0.873
>>> 99% 1.035
>>> 
>>> For example, if A^2 > 0.752 you can say that your data set is not
>>> normally distributed with 95% confidence.
>>> 
>>> This is implemented in octave-forge as 1-p =
>>> anderson_darling_test(x,'normal').  That is, if anderson_darling_test
>>> returns a value of 0.05 then you can say your data set is not normally
>>> distributed with 95% confidence.
>>> 
>>> - Paul
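
For reference, the statistic quoted above can be written out directly with
core Octave functions. This is only a sketch from the quoted formula, not
the octave-forge implementation. Comparing the modified statistic Z (rather
than A^2 itself) against the quoted critical values is my assumption, and
the sample x is illustrative.

```octave
% Anderson-Darling statistic for normality, written out from the quoted formula.
x  = randn (100, 1);                   % illustrative sample; substitute your own data
n  = numel (x);
xs = sort (x(:));                      % order statistics x_(1) <= ... <= x_(n)
zs = (xs - mean (xs)) / std (xs);      % standardize with sample mean and s
p  = 0.5 * erfc (-zs / sqrt (2));      % Phi(z), lower tail
q  = 0.5 * erfc ( zs / sqrt (2));      % 1 - Phi(z), upper tail
i  = (1:n)';
% q(end:-1:1) puts 1 - p_(n-i+1) at position i
A2 = -n - sum ((2*i - 1) .* (log (p) + log (q(end:-1:1)))) / n;
Z  = A2 * (1 + 0.75/n + 2.25/n^2);     % modified statistic
crit = [0.631 0.752 0.873 1.035];      % quoted values for 90, 95, 97.5, 99%
printf ("A^2 = %.3f, Z = %.3f\n", A2, Z);
printf ("Z exceeds the 95%% value: %d\n", Z > crit(2));
```

The upper tail is computed with erfc rather than as 1 - p so that the
logarithm stays finite for large sample values.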
>>> 
>>> On Sep 25, 2005, at 1:59 PM, Søren Hauberg wrote:
>>> 
>>>> Hi,
>>>> Does anybody know how I can test whether or not some samples are
>>>> normally distributed? I tried graphical methods, such as looking at
>>>> histograms and qqplots, but I don't trust my own judgement enough to
>>>> use graphical methods.
>>>> 
>>>> /Søren
>>> 
>>> 
>>> 
>>> 
>>> -------------------------------------------------------------
>>> Octave is freely available under the terms of the GNU GPL.
>>> 
>>> Octave's home on the web:  http://www.octave.org
>>> How to fund new projects:  http://www.octave.org/funding.html
>>> Subscription information:  http://www.octave.org/archive.html
>>> -------------------------------------------------------------