help-octave

Re: Kolmogorov-Smirnov test


From: Mike Miller
Subject: Re: Kolmogorov-Smirnov test
Date: Thu, 17 Nov 2005 13:10:12 -0600 (CST)

On Thu, 17 Nov 2005, Hamish Allan wrote:

> I've recently been trying to use kolmogorov_smirnov_test_2() to compare two distributions. To get a feel for it, I thought I'd try comparing two normal distributions:
>
> octave:1> a = randn(2000,1);
> octave:2> b = randn(2000,1);
> octave:3> p = kolmogorov_smirnov_test_2(a,b)
> p = 0.90224
> octave:4> b = randn(2000,1);
> octave:5> p = kolmogorov_smirnov_test_2(a,b)
> warning: in /opt/local/share/octave/2.1.71/m/statistics/tests/kolmogorov_smirnov_test_2.m near line 80, column 5:
> warning: cannot compute correct p-values with ties
> p = 0.98453
>
> [These numbers look okay to me, though I don't understand the warning: isn't the K-S test used to compare, e.g., distributions of school grades, A through F, where there would be many ties? But that's not my biggest problem:]

I'm surprised that there is a tie among 4,000 randn() numbers. It's a little counterintuitive, I suppose, like the birthday problem: the RNG in your Octave is not producing as many distinct values as I would have (incorrectly) guessed, but I don't think that's a serious problem for you. There are ways to get more distinct random values from, say, the Mersenne Twister, which might be what you are using, but I'll leave that for another day.
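To put rough numbers on the birthday-problem comparison, here is a minimal Python sketch. It assumes, as a simplification, that the generator can emit only N distinct values, each equally likely; randn's values are not equally likely, and a non-uniform spread over the same N values only makes ties more likely, so treat the result as a rough lower bound. The function name tie_probability and the 24- and 32-bit resolutions tried below are hypothetical illustrations, not a description of Octave's actual RNG.

```python
import math


def tie_probability(n: int, num_values: int) -> float:
    """Birthday problem: chance that n independent draws, each uniform over
    num_values equally likely values, contain at least one repeated value."""
    if n > num_values:
        return 1.0  # pigeonhole: a tie is certain
    # log P(all distinct) = sum_{k=0}^{n-1} log(1 - k / num_values);
    # log1p/expm1 keep precision when num_values is huge.
    log_all_distinct = sum(math.log1p(-k / num_values) for k in range(n))
    return -math.expm1(log_all_distinct)


if __name__ == "__main__":
    # Classic check: 23 people, 365 possible birthdays.
    print(f"23 of 365: {tie_probability(23, 365):.4f}")
    # 4000 draws (two samples of 2000) at hypothetical output resolutions.
    for bits in (24, 32):
        p = tie_probability(4000, 2 ** bits)
        print(f"4000 draws, 2^{bits} values: P(tie) = {p:.4f}")
```

Under these assumptions, 4,000 draws produce a tie with probability of roughly 38% at 24-bit resolution but only about 0.2% at 32 bits, which illustrates why a tie is surprising only if one assumes very fine-grained output.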


> octave:6> b = randn(2000,1);
> octave:7> p = kolmogorov_smirnov_test_2(a,b)
> p = 0.50849
>
> [That's a pretty low p-value! Is the K-S test actually going to be useful if it can be so wrong about normally distributed datasets as large as 2,000 points? Let's try bumping it up to 20,000...]

I think you don't understand what the K-S test is doing. It compares two samples to see whether they come from the same distribution. If the distributions really are the same, the p-value is (approximately) uniformly distributed on [0,1], so any p-value is equally likely. Your distributions are the same, and the value you got, 0.508, is close to 0.5, the expected value of a uniform variable on [0,1]. Seeing a p-value near 0.5 is a good thing, not an indication of a problem.
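The claim that p-values are roughly uniform when both samples come from the same distribution is easy to check by simulation. Below is a minimal Python sketch of a two-sample K-S test: the statistic D is the largest gap between the two empirical CDFs, and the p-value uses the standard asymptotic Kolmogorov series with effective sample size nm/(n+m). This is an assumed textbook formulation, not the code inside Octave's kolmogorov_smirnov_test_2(); the function names, sample size, repetition count and seed are hypothetical choices for illustration.

```python
import bisect
import math
import random
from statistics import mean


def ks_statistic(a, b):
    """Two-sample K-S statistic D = max_x |F_a(x) - F_b(x)|.

    The empirical CDFs are compared at every distinct pooled value, counting
    all tied observations (bisect_right), so ties do not distort D itself."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def kolmogorov_sf(lam, terms=100):
    """Asymptotic P(K > lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the value is ~1
    total = 0.0
    for j in range(1, terms + 1):
        term = math.exp(-2.0 * j * j * lam * lam)
        total += term if j % 2 == 1 else -term
        if term < 1e-12:
            break
    return min(1.0, max(0.0, 2.0 * total))


def ks_pvalue(a, b):
    """Asymptotic two-sample p-value; assumes continuous data (no ties)."""
    n_eff = len(a) * len(b) / (len(a) + len(b))
    return kolmogorov_sf(math.sqrt(n_eff) * ks_statistic(a, b))


def simulate_pvalues(n=200, reps=500, seed=1):
    """Repeat the experiment with both samples really drawn from N(0, 1)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pvals.append(ks_pvalue(a, b))
    return pvals


if __name__ == "__main__":
    p = simulate_pvalues()
    print(f"mean p-value: {mean(p):.3f}")
    for cut in (0.05, 0.25, 0.5, 0.75):
        frac = sum(v < cut for v in p) / len(p)
        print(f"fraction of p < {cut:.2f}: {frac:.3f}")
```

If the p-values are roughly uniform, about 5% fall below 0.05, about half fall below 0.5, and the mean sits near 0.5; the asymptotic formula may be somewhat conservative for finite samples, so small deviations in that direction are expected. Ties are handled correctly when computing D, but the p-value formula still assumes continuous data, which is the situation the Octave warning is about.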

Mike



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------


