Re: casefile random access
From: John Darrington
Subject: Re: casefile random access
Date: Sun, 4 Jun 2006 10:57:08 +0800
User-agent: Mutt/1.5.4i
On Sat, Jun 03, 2006 at 11:33:49AM -0700, Ben Pfaff wrote:
> If you're using a casefile, then it's backed either by an array
> of cases in memory or by a disk file in the casefile format.
> Random access in an array is trivial. A disk file can be read
> sequentially or randomly. We currently do only sequential
> access. Adding random access wouldn't change that: procedures
> would still read the casefile sequentially.
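To illustrate why random access is cheap for both kinds of backing, here is a minimal C sketch. It assumes a hypothetical layout in which every case is a fixed number of `double` values stored back to back with no header or padding; PSPP's real casefile format may differ, and both function names are invented for illustration. With fixed-size cases the position of case CNUM is a single multiplication, so the disk-backed version needs one `fseek()` instead of a sequential scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: every case holds N_VALUES doubles, and cases are
   stored back to back (in memory or on disk) with no header or padding. */

/* In-memory casefile: case CNUM starts at index CNUM * N_VALUES.
   Returns false if CNUM is out of range. */
bool
read_case_from_array (const double *cases, size_t n_cases, size_t n_values,
                      size_t cnum, double *out)
{
  if (cnum >= n_cases)
    return false;
  memcpy (out, cases + cnum * n_values, n_values * sizeof *out);
  return true;
}

/* Disk casefile: the byte offset of case CNUM can be computed directly,
   so one fseek() replaces reading every earlier case.  Returns false on
   a seek error or if the file ends before a whole case is read. */
bool
read_case_from_file (FILE *file, size_t n_values, size_t cnum, double *out)
{
  long offset = (long) (cnum * n_values * sizeof *out);
  if (fseek (file, offset, SEEK_SET) != 0)
    return false;
  return fread (out, sizeof *out, n_values, file) == n_values;
}
```

Sequential procedures would simply keep calling `fread()` without seeking, so adding a path like this does not change how they work.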
[:snip:]
> In short, I think that random access for interactive usage is
> fine.
[:snip:]
OK.
If you're able and willing to write a random access casereader, then
that will certainly make the GUI code simpler. I've got a change
almost ready to commit, which will remove the GUI's hard limit on the
number of variables. The next major step for the GUI is to unlimit
the number of cases. So I'm just about ready to use such a casereader.
My idea of the interface would be something along the lines of the
following, but you may have better ideas.
#include <stdbool.h>

struct ccase;
struct ra_casereader;
/* Reads case CNUM from READER into C.
   Returns true if successful, false otherwise. */
bool
ra_casereader_read (struct ra_casereader *reader, int cnum,
                    struct ccase *c);
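For concreteness, here is one way the proposed prototype could be filled in and used by a data viewer. This is only a sketch: the function name and parameters follow the proposal above, but the bodies of `struct ccase` and `struct ra_casereader` are hypothetical stand-ins (PSPP's real `struct ccase` is richer), and `draw_visible_rows` is an invented helper. It shows the pattern the GUI needs: fetch only the rows currently on screen, so the number of cases is no longer bounded by what the viewer holds in memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ccase: a fixed number of numeric
   values. */
#define N_VALUES 2
struct ccase
  {
    double values[N_VALUES];
  };

/* Hypothetical reader over cases held in memory.  A disk-backed version
   would seek to case CNUM instead of indexing an array. */
struct ra_casereader
  {
    const struct ccase *cases;
    int n_cases;
  };

/* Reads case CNUM from READER into C.  Returns true if successful, false
   if CNUM is out of range. */
bool
ra_casereader_read (struct ra_casereader *reader, int cnum,
                    struct ccase *c)
{
  if (cnum < 0 || cnum >= reader->n_cases)
    return false;
  *c = reader->cases[cnum];
  return true;
}

/* Hypothetical data-viewer helper: prints rows FIRST_ROW through
   FIRST_ROW + N_ROWS - 1 (0-based), stopping early at the end of the
   data.  Row labels are printed 1-based.  Returns the number of rows
   drawn. */
int
draw_visible_rows (struct ra_casereader *reader, int first_row, int n_rows,
                   FILE *out)
{
  int drawn = 0;
  for (int row = first_row; row < first_row + n_rows; row++)
    {
      struct ccase c;
      if (!ra_casereader_read (reader, row, &c))
        break;
      fprintf (out, "%d: %g %g\n", row + 1, c.values[0], c.values[1]);
      drawn++;
    }
  return drawn;
}
```

Returning `bool` lets the caller treat "past the last case" as a normal stopping condition rather than an error, which suits a viewer that scrolls off the end of the data.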
> There's another issue here. All this assumes that your data is
> in a casefile. But you're really talking about a system file,
> which is a different beast. To do what I'm talking about above,
> you'd have to copy the system file's data into a casefile. This
> would double the disk space needed (the original system file plus
> a copy in a disk-based casefile). What you might really want is
> to be able to operate directly on the system file's content. The
> system file interface doesn't support random access, but it
> could, just as the casefile interface could. At least, it could
> easily for non-compressed system files; it would require extra
> time or extra space to support random access in compressed system
> files (because there's no way to know where to seek to).
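Compressed cases vary in size, so the offset of case K depends on every case before it. One way to pay the "extra time or extra space" mentioned above is to scan the file once and remember where each case starts. The sketch below uses an invented format (one length byte followed by that many data bytes), not the real system file compression scheme, and `build_case_index` and `read_indexed_case` are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable-length format, NOT the real system file
   compression: each case is one length byte N followed by N data bytes. */

/* Extra time, paid once: scan the file sequentially and record where each
   case starts.  Extra space: one long per case.  On success stores a
   malloc'd array in *OFFSETS (caller frees it) and returns the number of
   cases; returns -1 on error. */
long
build_case_index (FILE *file, long **offsets)
{
  long n = 0, cap = 16;
  long *offs = malloc (cap * sizeof *offs);
  if (offs == NULL || fseek (file, 0, SEEK_SET) != 0)
    {
      free (offs);
      return -1;
    }
  for (;;)
    {
      long pos = ftell (file);
      int len = getc (file);
      if (len == EOF)
        break;
      if (n == cap)
        {
          long *p = realloc (offs, 2 * cap * sizeof *offs);
          if (p == NULL)
            {
              free (offs);
              return -1;
            }
          offs = p;
          cap *= 2;
        }
      offs[n++] = pos;
      if (fseek (file, len, SEEK_CUR) != 0)   /* Skip this case's data. */
        {
          free (offs);
          return -1;
        }
    }
  *offsets = offs;
  return n;
}

/* With the index, random access is one seek plus one read.  Copies the
   data of case CNUM into BUF (at least 255 bytes) and returns its length,
   or -1 on error. */
int
read_indexed_case (FILE *file, const long *offsets, long n_cases, long cnum,
                   unsigned char *buf)
{
  if (cnum < 0 || cnum >= n_cases
      || fseek (file, offsets[cnum], SEEK_SET) != 0)
    return -1;
  int len = getc (file);
  if (len == EOF || fread (buf, 1, len, file) != (size_t) len)
    return -1;
  return len;
}
```

A sparser index (for example, one entry every hundred cases) would give back most of the space at the cost of a short sequential read after each seek.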
I don't think it's worth messing with the internals of sysfile-reader
just for the GUI's benefit. It's true that at the moment, the GUI
operates only on system files. But this was just a temporary
convenience that was easy(ish) to program. Clearly, what is needed is
for the GUI to have its own casefile.
> I'll conclude by adding a final, forward-looking issue.
> Currently, every procedure reads the active file from somewhere,
> such as a system file or casefile, transforms it, and then writes
> the transformed version to a casefile[*]. That is, it always
> makes a new copy (and if the old version is a casefile, throws
> away the old copy). (In SPSS syntax, this is equivalent to
> always running the CACHE utility.) But, as you've pointed out
> before, this is wasteful; it is usually[+] possible to avoid
> writing a new copy, if you just retain the old transformations
> and re-apply them to the old data, followed by any new
> transformations, on the next procedure. I'm planning to
> implement this relatively soon. But after that, there's no
> obvious place for the data viewer window to get its data from,
> because the data it wants to show is not actually stored
> anywhere; it's just defined in terms of a source file plus a
> bunch of transformations. I'm not sure what we'll want to do
> about that; one option would be to always write out a new copy
> when using the GUI.
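The deferred scheme described above can be pictured with a toy model: the active file is a source plus a chain of transformations applied on the fly, and "writing out a new copy" for the GUI means running that chain once and storing the result. Everything here is hypothetical (`struct active_file`, `struct xform_case` and the functions are invented names), and real PSPP transformations are far more general than a callback on one `double`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a case is a single numeric value. */
struct xform_case
  {
    double x;
  };

/* A transformation modifies one case in place. */
typedef void xform_func (struct xform_case *);

/* The active file is not stored anywhere: it is a source of cases plus
   the transformations still to be applied to them. */
struct active_file
  {
    const struct xform_case *source;  /* Original, untransformed data. */
    size_t n_cases;
    xform_func *xforms[8];            /* Pending transformations, in order. */
    size_t n_xforms;
  };

/* Appends a new transformation to the chain; the source data is never
   rewritten.  Returns false if the fixed-size chain is full. */
bool
active_file_add_xform (struct active_file *af, xform_func *f)
{
  if (af->n_xforms >= sizeof af->xforms / sizeof af->xforms[0])
    return false;
  af->xforms[af->n_xforms++] = f;
  return true;
}

/* Produces case CNUM as the next procedure would see it: copy the source
   case, then re-apply every pending transformation in order. */
struct xform_case
active_file_get (const struct active_file *af, size_t cnum)
{
  struct xform_case c = af->source[cnum];
  for (size_t i = 0; i < af->n_xforms; i++)
    af->xforms[i] (&c);
  return c;
}

/* Writes out a transformed copy (DEST must hold N_CASES cases), e.g. so a
   data viewer has stored data to display. */
void
active_file_materialize (const struct active_file *af,
                         struct xform_case *dest)
{
  for (size_t i = 0; i < af->n_cases; i++)
    dest[i] = active_file_get (af, i);
}
```

The trade-off is visible in the model: `active_file_get` costs a pass through the whole chain on every read, while `active_file_materialize` pays that once but needs space for a full copy.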
A nagging worry about this had begun to grow in the back of my mind.
As you say, the answer is probably to read explicitly into a new
casefile when necessary.
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.