pspp-dev

Re: data access


From: John Darrington
Subject: Re: data access
Date: Sun, 2 Jul 2006 08:23:36 +0800
User-agent: Mutt/1.5.4i

I think that part of the problem is that the casefiles are designed to
be (a) fast and (b) potentially very large.  One of the costs of these
design criteria is that they're not very flexible.

As Ben has explained to me before, accessing a casefile in random
order is much less efficient than doing so in sequential order.  I'm
not sure that a 10^8 x 300 gsl_matrix would be very efficient.
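
To put a rough number on that, assuming gsl_matrix's elements are
8-byte doubles: 10^8 x 300 x 8 bytes = 2.4 x 10^11 bytes, or about
240 GB, before any other overhead.  The helper below is only an
illustrative sketch; its name is made up and is not part of PSPP or
GSL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the number of bytes needed to hold a dense
   N_CASES x N_VARS matrix of doubles.  Uses 64-bit arithmetic so
   that the product does not wrap around on 32-bit hosts.
   (Hypothetical helper, for illustration only.) */
uint64_t
dense_matrix_bytes (uint64_t n_cases, uint64_t n_vars)
{
  /* Refuse sizes whose byte count would overflow 64 bits. */
  assert (n_vars == 0
          || n_cases <= UINT64_MAX / sizeof (double) / n_vars);
  return n_cases * n_vars * (uint64_t) sizeof (double);
}
```

On a platform where sizeof (double) is 8,
dense_matrix_bytes (100000000, 300) returns 240000000000.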

I'm not a statistician, but I cannot envisage any situation where a
matrix operation (e.g. pre-multiplication, inversion, etc.) would need
to be performed on a casefile as a whole; it wouldn't make sense in
the general case, because of non-numeric data.

Having said that, I'm working on abstracting the interface for the
casefiles right now.  It might be possible to devise a casefile type
that is more convenient for math routines, but probably not one that
would be quite as flexible as gsl_matrix.
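
As a rough idea of what "more convenient for math routines" could
mean while still reading sequentially: a routine that only needs the
sums of cross-products (X'X) can accumulate them in a single pass,
one case at a time, without ever holding all the cases in memory.
This is only a sketch under assumptions.  struct row_source,
next_row and accumulate_xtx are hypothetical names, not the casefile
interface, and the in-memory array merely stands in for a casefile
holding numeric variables only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential source of numeric cases.  The row-major
   array stands in for a casefile; it is not PSPP's interface. */
struct row_source
  {
    const double *data;   /* n_cases x n_vars values, row-major. */
    size_t n_cases;       /* Number of cases available. */
    size_t n_vars;        /* Number of numeric variables per case. */
    size_t pos;           /* Index of the next case to read. */
  };

/* Copies the next case into ROW and returns 1,
   or returns 0 if no cases remain. */
static int
next_row (struct row_source *src, double *row)
{
  size_t j;

  if (src->pos >= src->n_cases)
    return 0;
  for (j = 0; j < src->n_vars; j++)
    row[j] = src->data[src->pos * src->n_vars + j];
  src->pos++;
  return 1;
}

/* Adds the outer product of every remaining case to XTX, an
   n_vars x n_vars row-major array that the caller has zeroed.
   ROW is scratch space for n_vars doubles.  Reads strictly in
   sequential order and returns the number of cases read. */
size_t
accumulate_xtx (struct row_source *src, double *row, double *xtx)
{
  size_t n = 0, i, j;

  assert (src != NULL && row != NULL && xtx != NULL);
  while (next_row (src, row))
    {
      for (i = 0; i < src->n_vars; i++)
        for (j = 0; j < src->n_vars; j++)
          xtx[i * src->n_vars + j] += row[i] * row[j];
      n++;
    }
  return n;
}
```

The working storage here is n_vars x n_vars doubles regardless of
the number of cases, so 300 variables need 300 x 300 x 8 = 720000
bytes however long the casefile is.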

Can you give me an example of a particular problem you've encountered?
I'll see if I can come up with any suggestions.

J'


On Sat, Jul 01, 2006 at 02:01:45PM -0400, Jason Stover wrote:
     I'm wrestling with reading data via casefiles again. We've all said
     it would be nice to make reading the data easier, and Ben has
     complained about every procedure's need to pass the entire data
     set. 
     
     I thought of what might be a simple approach: Each time a procedure
     reads the data via casefiles, it stores them in a gsl_matrix, along
     with some other information about variable names, etc. Then the next
     time a procedure needs the data, it uses that gsl_matrix, if it's
     available and contains the necessary information. If not, it reads the
     data via casefiles.
     
     Filling up and using a gsl_matrix is easy. I don't know how easy it
     would be to store the meta-data the procedures would need.
     
     Pardon me if this is an old idea. But the difficulty of using
     casefiles prevents other people from contributing mathematical code,
     whereas gsl_matrices are easy to handle.
     
-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.



