From: Dan Suthers
Subject: [igraph] Possible error in documentation of KS.p for R/igraph fit_power_law (or in my understanding)
Date: Mon, 30 Sep 2019 01:38:29 -1000
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:68.0) Gecko/20100101 Thunderbird/68.1.1
Dear igraph community,
Gábor Csárdi recommended that I post this question. It concerns
inconsistency between documentation of fit_power_law and results.
The documentation for fit_power_law {igraph} says (I added
underlining):
The ‘plfit’ implementation also uses the maximum likelihood principle to determine alpha for a given xmin; When xmin is not given in advance, the algorithm will attempt to find its optimal value for which the p-value of a Kolmogorov-Smirnov test between the fitted distribution and the original sample is the largest.
KS.p Numeric scalar, the p-value of the Kolmogorov-Smirnov test. Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution.
> min(degree(gnm))
[1] 69
> mean(degree(gnm))
[1] 100
> max(degree(gnm))
[1] 132
This is easier to explain: xmin shows it is fitting to the far
right end of the distribution, and alpha is well within the
random-like regime. So, I get it that interpreting p without
looking at the other parameters is risky.
However, looking at the other extreme, let's generate a
distribution expected to follow the power law:
> sfp <- sample_fitness_pl(1000, 50000, 2.2)
> fit_power_law(degree(sfp), implementation="plfit")
$continuous
[1] FALSE
$alpha
[1] 2.515488
$xmin
[1] 49
$logLik
[1] -4372.05
$KS.stat
[1] 0.05708254
$KS.p
[1] 0.007706345
> min(degree(sfp))
[1] 28
> max(degree(sfp))
[1] 411
Here, alpha is as expected for a scale-free network, xmin
includes most of the distribution, and both the KS distance and
KS.p are small, yet the documentation says that small p-values mean
we REJECT the hypothesis that the data follow a power law.
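To make the documented reading concrete, here is a minimal sketch in R. The helper `interpret_ks_p` is hypothetical (it is not part of igraph), the 0.05 cutoff is the one quoted in the documentation, and the list just copies the values printed above rather than re-running the fit.

```r
# Hypothetical helper, not an igraph function: applies the reading of KS.p
# quoted from the documentation to a list shaped like fit_power_law() output.
interpret_ks_p <- function(fit, level = 0.05) {
  stopifnot(is.numeric(fit$KS.p), length(fit$KS.p) == 1)
  if (fit$KS.p < level) {
    "reject: data unlikely to be drawn from the fitted power law"
  } else {
    "fail to reject: data are consistent with the fitted power law"
  }
}

# Values copied from the sample_fitness_pl() fit shown above.
sfp_fit <- list(alpha = 2.515488, xmin = 49,
                KS.stat = 0.05708254, KS.p = 0.007706345)

print(interpret_ks_p(sfp_fit))
```

Under this reading the fit above falls in the "reject" branch, even though the degrees were generated to follow a power law, which is the source of my confusion.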
Is the documentation in error? Should it say we reject the
hypothesis that it comes from a random distribution (as is common
in the social sciences' use of p-values)?
Any light you can shed on this is much appreciated. -- Dan
--
Dan Suthers
Professor and Graduate Program Chair
Dept. of Information and Computer Sciences
University of Hawaii at Manoa
1680 East West Road, POST 309, Honolulu, HI 96822
(808) 956-3890 office
Personal: http://www2.hawaii.edu/~suthers/
Lab: http://lilt.ics.hawaii.edu/
Department: http://www.ics.hawaii.edu/