Re: Kolmogorov-Smirnov test
From: Mike Miller
Subject: Re: Kolmogorov-Smirnov test
Date: Thu, 17 Nov 2005 13:10:12 -0600 (CST)
On Thu, 17 Nov 2005, Hamish Allan wrote:
> I've recently been trying to use kolmogorov_smirnov_test_2() to compare two
> distributions. In order to get a feel for it, I thought I'd try comparing two
> normal distributions:
>
> octave:1> a = randn(2000,1);
> octave:2> b = randn(2000,1);
> octave:3> p = kolmogorov_smirnov_test_2(a,b)
> p = 0.90224
> octave:4> b = randn(2000,1);
> octave:5> p = kolmogorov_smirnov_test_2(a,b)
> warning: in /opt/local/share/octave/2.1.71/m/statistics/tests/
> kolmogorov_smirnov_test_2.m near line 80, column 5:
> warning: cannot compute correct p-values with ties
> p = 0.98453
>
> [These numbers look okay to me, though the warning I don't understand (is the
> K-S test not used to compare e.g., distributions of school grades, A through
> F, for which there would be many ties?). But that's not my biggest problem:]
I'm surprised that there is a tie among 4000 randn() numbers. It's a
little counterintuitive, I suppose, like the birthday problem. The RNG in
your Octave evidently produces fewer distinct values than I would have
guessed, but I don't think that's a serious problem for you. There are
ways to get more distinct values out of a generator such as the Mersenne
Twister, which might be what you are using, but I'll leave that for
another day.
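To put a rough number on the birthday-problem intuition, here is a minimal
Octave sketch. It uses the standard approximation 1 - exp(-n(n-1)/(2N)) for
the chance of at least one repeat among n draws from N equally likely values.
The two values of N are pure assumptions for illustration (I don't know how
many distinct values your randn() can return), and randn() output is not
equally likely across its possible values, so treat the result as an order
of magnitude only. The name collision_prob is made up for this example.

```octave
% Birthday-problem estimate: chance of at least one repeated value among
% n draws from N equally likely values.  -expm1(-x) equals 1 - exp(-x),
% computed accurately for small x.
collision_prob = @(n, N) -expm1 (-n .* (n - 1) ./ (2 .* N));

n = 4000;                          % 2000 + 2000 pooled values, as in the session
p32 = collision_prob (n, 2^32);    % ASSUMED: generator has 2^32 distinct outputs
p24 = collision_prob (n, 2^24);    % ASSUMED: generator has 2^24 distinct outputs
printf ("P(tie), N = 2^32: %.4f\n", p32);
printf ("P(tie), N = 2^24: %.4f\n", p24);

% Direct check for ties in an actual pooled sample.
a = randn (2000, 1);
b = randn (2000, 1);
pooled = [a; b];
has_ties = numel (unique (pooled)) < numel (pooled);
printf ("ties present: %d\n", has_ties);
```

Under the 2^32 assumption a tie in any single run would be rare (well under
1%), while under the 2^24 assumption it would be fairly common, so how often
the warning shows up gives a hint about how many distinct values the
generator really produces.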
> octave:6> b = randn(2000,1);
> octave:7> p = kolmogorov_smirnov_test_2(a,b)
> p = 0.50849
>
> [That's a pretty low p-value! Is the K-S test actually going to be
> useful if it can be so wrong about the normally-distributed datasets as
> large as 2,000 points? Let's try bumping it up to 20,000...]
I think you've misunderstood what the K-S test is doing. It compares two
samples to see whether they come from the same distribution. If the two
distributions really are the same, the p-value is (approximately)
uniformly distributed on [0,1]. Your distributions are the same, so every
p-value in that interval is equally likely. A uniform variable on [0,1]
has expected value 0.5, so seeing about 0.5 is a good thing, not an
indication of a problem.
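If you want to see the uniformity for yourself, the following Octave sketch
repeats the experiment many times and summarizes the p-values. To keep it
self-contained it does not call kolmogorov_smirnov_test_2(); it computes the
two-sample statistic directly and uses the usual asymptotic Kolmogorov
series for the p-value, which I'm assuming is close enough to what the
library function does for samples this size. The seed, the number of
repetitions, the sample size and the series cutoff are all arbitrary
choices.

```octave
% Simulate the null distribution of the two-sample K-S p-value.
randn ("state", 42);          % fix the seed so the run is repeatable
reps = 500;                   % number of simulated experiments (arbitrary)
n = 500;                      % points per sample (arbitrary)
k = (1:100)';                 % terms kept in the asymptotic series (arbitrary)
pvals = zeros (reps, 1);

for r = 1:reps
  a = randn (n, 1);
  b = randn (n, 1);           % same distribution as a, so the null is true
  z = sort ([a; b]);
  % Empirical CDFs of a and b evaluated at every pooled point.
  Fa = lookup (sort (a), z) / n;
  Fb = lookup (sort (b), z) / n;
  D = max (abs (Fa - Fb));    % K-S statistic
  lambda = sqrt (n * n / (n + n)) * D;
  % Asymptotic tail probability: 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lambda^2)
  p = 2 * sum ((-1) .^ (k - 1) .* exp (-2 * k .^ 2 * lambda ^ 2));
  pvals(r) = min (max (p, 0), 1);   % clamp rounding error into [0,1]
end

mean_p = mean (pvals);
frac_below_005 = mean (pvals < 0.05);
frac_below_050 = mean (pvals < 0.5);
printf ("mean p-value:        %.3f\n", mean_p);
printf ("fraction below 0.05: %.3f\n", frac_below_005);
printf ("fraction below 0.50: %.3f\n", frac_below_050);
```

If the p-values are roughly uniform, the mean should land near 0.5, about
5% of them should fall below 0.05, and about half should fall below 0.5. A
single p-value of 0.5, or even 0.1, from identical distributions is
therefore nothing unusual.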
Mike