Re: regression and glm with big data
From: John Darrington
Subject: Re: regression and glm with big data
Date: Wed, 15 Aug 2007 13:10:42 +0800
User-agent: Mutt/1.5.13 (2006-08-11)
There's a similar problem with percentiles (used by frequencies and
examine).
I suggest we leave these until after the release.
J'
On Tue, Aug 14, 2007 at 10:45:58PM -0400, Jason Stover wrote:
Right now, linreg.c, regression.q and glm.q won't handle large data
sets very well. The problem is that the regression and (currently
fetal) glm procedures store the entire data set in memory, then pass
the data to pspp_linreg (), which finds the least squares estimates.
Storing the entire data set in memory isn't necessary, just easier to
code. PSPP could handle much bigger data sets if, in the
casereader_read loop, it computed two matrix products from the data in
a single pass, then sent that, much smaller, information to
pspp_linreg().
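
As a rough illustration of the single-pass idea, here is a minimal
sketch in C. It assumes the "two matrix products" are X'X and X'y (the
normal equations), which the message does not state explicitly. The
names ls_accum, ls_accum_add, ls_accum_solve and MAX_P are hypothetical
and are not part of PSPP; the add step stands in for whatever would run
inside the casereader_read loop. Solving the normal equations directly
can be less numerically stable than a QR-based fit, so treat this as a
sketch of the memory saving only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_P 8   /* hypothetical upper bound on the number of coefficients */

/* Running sums for least squares: X'X (p by p) and X'y (length p).
   Storage depends only on p, not on the number of cases. */
struct ls_accum
  {
    size_t p;
    double xtx[MAX_P][MAX_P];
    double xty[MAX_P];
  };

static double
dabs (double v)
{
  return v < 0 ? -v : v;
}

static void
ls_accum_init (struct ls_accum *a, size_t p)
{
  assert (p > 0 && p <= MAX_P);
  memset (a, 0, sizeof *a);
  a->p = p;
}

/* Fold one case (predictor row X of length p, response Y) into the sums.
   The case itself can be discarded afterwards. */
static void
ls_accum_add (struct ls_accum *a, const double *x, double y)
{
  for (size_t i = 0; i < a->p; i++)
    {
      a->xty[i] += x[i] * y;
      for (size_t j = 0; j < a->p; j++)
        a->xtx[i][j] += x[i] * x[j];
    }
}

/* Solve (X'X) b = X'y by Gaussian elimination with partial pivoting,
   working on a copy of the sums.  Returns 0 on success, -1 if the
   system looks singular. */
static int
ls_accum_solve (const struct ls_accum *a, double *b)
{
  double m[MAX_P][MAX_P + 1];
  size_t p = a->p;

  for (size_t i = 0; i < p; i++)
    {
      for (size_t j = 0; j < p; j++)
        m[i][j] = a->xtx[i][j];
      m[i][p] = a->xty[i];
    }

  for (size_t c = 0; c < p; c++)
    {
      /* Pick the row with the largest entry in column c as pivot. */
      size_t piv = c;
      for (size_t r = c + 1; r < p; r++)
        if (dabs (m[r][c]) > dabs (m[piv][c]))
          piv = r;
      if (dabs (m[piv][c]) < 1e-12)
        return -1;
      if (piv != c)
        for (size_t k = 0; k <= p; k++)
          {
            double t = m[c][k];
            m[c][k] = m[piv][k];
            m[piv][k] = t;
          }
      /* Eliminate column c from the rows below. */
      for (size_t r = c + 1; r < p; r++)
        {
          double f = m[r][c] / m[c][c];
          for (size_t k = c; k <= p; k++)
            m[r][k] -= f * m[c][k];
        }
    }

  /* Back substitution. */
  for (size_t i = p; i-- > 0;)
    {
      double s = m[i][p];
      for (size_t j = i + 1; j < p; j++)
        s -= m[i][j] * b[j];
      b[i] = s / m[i][i];
    }
  return 0;
}
```

In use, a procedure would call ls_accum_init once, call ls_accum_add
for each case as it is read, and call ls_accum_solve at the end. With p
coefficients the state is on the order of p*p doubles regardless of how
many cases are read.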
But there may be tasks for which pspp_linreg () should accept all the
data as a single matrix, so it should probably be able to do that,
too.
My question is: Should I do this now, or wait until after the release?
It will probably change a lot of code in linreg.c, and could introduce
several bugs. The benefit would be to make any procedure that needs
regression able to run with very large data sets.
-Jason
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.