Re: ppsp and python
From: John Darrington
Subject: Re: ppsp and python
Date: Tue, 23 Oct 2012 08:45:05 +0200
User-agent: Mutt/1.5.20 (2009-06-14)
On Mon, Oct 22, 2012 at 10:40:35AM -0500, Daniel Elliott wrote:
> I am not knowledgeable about the "SPSS way" of doing things with
> respect to creating new functions but I figured that, with neural
> nets for example, I could provide the training parameters available
> and you could tell me what format the input and output data should be.
> From there, writing the interface wouldn't be too painful.
>
> My assumption is that PSPP users are more focused on analyzing results
> from a returned model than they are interested in the minutiae of
> implementation detail. From this perspective, I think that the best
> way to use a neural network from PSPP would be k-fold
> cross-validation or bootstrap cross-validation, which are described in
> chapter 6 of Empirical Methods for Artificial Intelligence by Paul
> Cohen. This would shield the user from as many of the issues in model
> selection as possible. It would be good if the users could specify
> stuff like the number of layers, the number of nodes in each layer,
> and the type of activation functions to use, or some subset of these
> items. Sadly, the approach to machine learning algorithms is pretty
> undisciplined.
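
For readers unfamiliar with the k-fold procedure mentioned above, here is a minimal sketch in plain Python. It is not PSPP code and not taken from Cohen's book. The names `k_fold_indices`, `cross_validate`, `fit_mean` and `predict_mean` are hypothetical, and the "model" is a stand-in that predicts the training mean, where a neural network's fit and predict routines would go.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean held-out squared error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on every case that is not in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Score the model only on the cases it has not seen.
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / k

# Stand-in model (an assumption for illustration): predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model
```

The user only sees the averaged held-out error, which is the sense in which the procedure hides model-selection detail.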
There are several goals for PSPP. One is to provide a free replacement for
SPSS, just as LibreOffice does for MS Word and Gnumeric does for Excel. A
significant proportion of our users are students doing undergraduate stats
courses. These users need a) a user interface which resembles that of SPSS,
and b) results which resemble those of SPSS, both in terms of presentation
and values.
Now, SPSS has several NN options. For example, there is an MLP command. If we
were to implement an MLP command, the user interface should therefore resemble
that of SPSS, although the implementation need not. Alternatively, one could
provide a PSPP "extension" which does not claim to be SPSS compatible, so long
as that is clear in the documentation.
A second class of users is professional statisticians who process HUGE
amounts of data - datasets with hundreds of millions of observations. The
routines used in PSPP go to great pains to cope with such datasets. I
mention this because it can sometimes be a non-trivial task to convert an
existing routine to do that, especially if the implementation dynamically
allocates memory to store its data.
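
To illustrate the concern about routines that store all their data, here is a generic sketch of a one-pass computation that keeps only a fixed amount of state however many observations arrive. It uses Welford's running mean and variance as the example. It is not how PSPP is implemented, and the names `RunningMoments` and `moments_of` are hypothetical.

```python
class RunningMoments:
    """Accumulate mean and sample variance in constant memory (Welford)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def moments_of(stream):
    """Consume any iterable once without ever materialising it as a list."""
    acc = RunningMoments()
    for x in stream:
        acc.add(x)
    return acc
```

A routine written this way can read cases from a file or generator one at a time. An algorithm that needs several passes or random access to the data is harder to adapt.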
> Again, I am very far from being a competent statistician, but would
> enjoy the opportunity to provide some tools to PSPP. My abilities are
> primarily in things like logistic regression, mixtures of Gaussians,
> PCA, and neural networks for classification and prediction. I also do
> reinforcement learning but I doubt that is of any use.
PCA is already supported. See the FACTOR command. We also have k-means
clustering. Coincidentally, I am already working on logistic regression
and hope to complete it very shortly. We don't yet have any neural net routines,
nor do we have hierarchical clustering. So we could certainly use some
contributions there.
I suggest you have a look at how some of the existing algorithms are
implemented, and perhaps post some code to show how you think your
contributions could fit.
Thanks for your interest.
Regards
John
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.