- The igraph documentation suggests that the bfgs function is used to estimate the power law alpha, but I think the C implementation relies on the Broyden-Fletcher-Goldfarb-Shanno optimization function of the lbfgs library instead. Is that correct?
This is the exact implementation of the BFGS optimization that we use in power law fitting:
As far as I know, this is the C port of the limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno method, originally written in FORTRAN. The license notes in the source code might give you more clues.
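Just to illustrate what that boils down to (this is not igraph's actual code, only a rough sketch): for discrete data the estimation amounts to numerically maximising the discrete power-law log-likelihood, and an L-BFGS routine is one way to do that maximisation. In R it could look roughly like the snippet below, where the Hurwitz zeta function is replaced by a truncated sum and the cut-off xmin is fixed by hand purely for the illustration:

    # Rough sketch only (not igraph's code): estimate alpha by numerically
    # maximising the discrete power-law likelihood with an L-BFGS optimiser.
    library(igraph)

    g    <- sample_pa(10000, m = 1)   # toy graph with a heavy-tailed degree sequence
    xmin <- 5                         # hand-picked cut-off, an assumption for the sketch
    x    <- degree(g)
    x    <- x[x >= xmin]

    hurwitz_zeta <- function(alpha, xmin, terms = 1e5) {
      # truncated approximation of zeta(alpha, xmin) = sum over k >= xmin of k^(-alpha)
      sum((xmin:(xmin + terms))^(-alpha))
    }

    minus_loglik <- function(alpha, x, xmin) {
      # negative log-likelihood of the discrete power law with exponent alpha
      length(x) * log(hurwitz_zeta(alpha, xmin)) + alpha * sum(log(x))
    }

    # optim()'s "L-BFGS-B" method stands in here for the lbfgs port used in the C core
    fit <- optim(par = 2.5, fn = minus_loglik, x = x, xmin = xmin,
                 method = "L-BFGS-B", lower = 1.5, upper = 6)
    fit$par   # estimated alpha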
- The fit_power_law function relies on the mle function of the stats4 package. I am curious why this was deprecated, given that both the plfit and the MLE-based implementations are available. Is this simply a memory issue?
I don't know; this is purely in the domain of the R interface of igraph; the C core uses the L-BFGS method and my "plfit" library:
The plfit library is an efficient implementation of the method published by Clauset, Shalizi and Newman:
Clauset A, Shalizi CR and Newman MEJ: Power-law distributions in empirical data. SIAM Review 51, 661-703 (2009).
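On the R side, the two code paths are selected through the implementation argument of fit_power_law(); something along these lines (the exact shape of the "R.mle" return value may differ between igraph versions, so treat that part as an assumption):

    library(igraph)

    g   <- sample_pa(10000, m = 1)
    deg <- degree(g)

    # C core / plfit path: Clauset-Shalizi-Newman method, xmin chosen automatically
    fit_plfit <- fit_power_law(deg, implementation = "plfit")
    fit_plfit$alpha    # estimated exponent
    fit_plfit$xmin     # estimated lower cut-off
    fit_plfit$KS.p     # p-value of the Kolmogorov-Smirnov test

    # Pure R path built on stats4::mle (the one the question refers to); as far as
    # I remember this returns the fitted stats4 "mle" object itself
    fit_mle <- fit_power_law(deg, implementation = "R.mle")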
- How to interpret the p-value of the Kolmogorov-Smirnov test?
See the paper cited above for more details.
- The igraph help file states: "Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution". The C implementation of the KS test in igraph uses the Hurwitz zeta function. Shouldn't this mean that high p-values indicate a good model fit, as suggested by Clauset et al (2009:678)?
Well, tests based on p-values are not really about whether a model is a "good fit" or a "bad fit"; a low p-value _roughly_ says that "it is very unlikely that the data could have been generated from the hypothesized distribution" (in our case, a power-law). A high p-value _roughly_ means that "the data may have come from the hypothesized distribution"; however, there could be alternative distributions that can describe the data just as well.
So, in a nutshell:
low p-value --> null hypothesis (power-law) rejected --> the data is likely not a power-law
high p-value --> null hypothesis (power-law) _not_ rejected --> the data could come from a power-law, or maybe from something else; we don't know, we just could not _exclude_ the power-law
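Or, expressed in code, continuing with the fit_plfit object from the earlier snippet (the 0.05 threshold is just the usual convention, nothing magical):

    if (fit_plfit$KS.p < 0.05) {
      # low p-value: very unlikely that the data came from the fitted power law
      message("power-law hypothesis rejected")
    } else {
      # high p-value: the power law could not be excluded -- but other
      # distributions might describe the data just as well
      message("power-law hypothesis not rejected")
    }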
All the best,
T.