Re: design question
From: Ben Pfaff
Subject: Re: design question
Date: Thu, 23 Jun 2005 21:04:00 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
Jason Stover <address@hidden> writes:
> So I have two choices, which can be described as follows. Assume cf
> is my casefile *. I had decided to use method 1 before emailing the
> list; I'm asking about method 2 mostly out of curiosity.
>
> 1.
>   i = 0;
>   for (r = casefile_get_reader (cf);
>        casereader_read (r, &c); case_destroy (&c))
>     {
>       /*
>        * I copy the data I need from the active file and
>        * pass through them again if necessary.
>        */
>       store_data (&stored_data, i);
>       i++;
>     }
>   number_of_data = i;
>
>   for (j = 0; j < number_of_data; j++)
>     {
>       /* Second data pass here */
>       do_wa (stored_data, j);
>     }
If I understand correctly, stored_data contains a copy of each
case? Then I wouldn't recommend this technique, because a
casefile is designed for storing lots of data and is smart enough
to dump it to disk and re-read it from disk efficiently, if
necessary.
> 2. (I know this is technically illegal, but is something like
> it legal?)
>
>   for (r = casefile_get_reader (cf);
>        casereader_read (r, &c); case_destroy (&c))
>     {
>       /* Do part one. */
>     }
>   for (r = casefile_get_reader (cf);
>        casereader_read (r, &c); case_destroy (&c))
>     {
>       /* Do part two. */
>     }
This is what I would recommend.
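To make the two-pass idea concrete, here is a minimal sketch using a toy
stand-in for a casefile. It is not the PSPP API: toy_casefile, toy_reader,
toy_get_reader, toy_read, and sum_sq_dev are hypothetical names invented for
illustration, and a "case" is reduced to a single double. The point is only
that each pass gets its own reader, starting again from the beginning, so
nothing needs to be copied between passes.

```c
/* Toy stand-in for a casefile.  NOT the PSPP API: every name here
   is hypothetical and a "case" is just one double. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_casefile
  {
    const double *cases;        /* Immutable cases, in order. */
    size_t n_cases;
  };

struct toy_reader
  {
    const struct toy_casefile *cf;
    size_t pos;                 /* Each reader has its own position. */
  };

/* Returns a new reader positioned at the first case of CF. */
struct toy_reader
toy_get_reader (const struct toy_casefile *cf)
{
  struct toy_reader r;

  assert (cf != NULL);
  r.cf = cf;
  r.pos = 0;
  return r;
}

/* Reads the next case into *C.  Returns false at end of file. */
bool
toy_read (struct toy_reader *r, double *c)
{
  if (r->pos >= r->cf->n_cases)
    return false;
  *c = r->cf->cases[r->pos++];
  return true;
}

/* Two passes over the same casefile: the first computes the mean,
   the second the sum of squared deviations from that mean. */
double
sum_sq_dev (const struct toy_casefile *cf)
{
  struct toy_reader r;
  double c, mean, sum = 0.0, ss = 0.0;
  size_t n = 0;

  /* Pass one. */
  for (r = toy_get_reader (cf); toy_read (&r, &c); )
    {
      sum += c;
      n++;
    }
  if (n == 0)
    return 0.0;
  mean = sum / n;

  /* Pass two: a fresh reader starts over from the first case. */
  for (r = toy_get_reader (cf); toy_read (&r, &c); )
    ss += (c - mean) * (c - mean);
  return ss;
}
```

With the real casefile, each reader would also be discarded with
casereader_destroy() once its pass is done; the toy readers here live on the
stack, so there is nothing to free.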
> I was going to use method 1, but I started wondering if anyone should
> ever use method 2, or something like it? The casefile is discarded
> after the cases are read (according to a comment in casefile.c). Does
> 'discarded' mean the data are no longer accessible, ever ever ever?
> Is method 2 an abomination?
I think you're referring to the final two sentences of this
comment:
/* A casefile is a sequentially accessible array of immutable
cases. It may be stored in memory or on disk as workspace
allows. Cases may be appended to the end of the file. Cases
may be read sequentially starting from the beginning of the
file. Once any cases have been read, no more cases may be
appended. The entire file is discarded at once. */
It is misleading. The final sentence means to say, "When the
casefile is destroyed, it is all destroyed at once." I didn't
mean to imply the causation that "Once any cases have been read,
... the entire file is discarded..." but I think that's what
you're reading from it.
Plus, the comment is not completely accurate any longer.
Here, let me expand on it:
/* A casefile represents a sequentially accessible stream of
immutable cases.
If workspace allows, a casefile is maintained in memory. If
workspace overflows, then the casefile is pushed to disk. In
either case the interface presented to callers is kept the
same.
The life cycle of a casefile consists of up to three phases:
1. Writing. The casefile initially contains no cases. In
this phase, any number of cases may be appended to the
end of a casefile. (Cases are never inserted in the
middle or before the beginning of a casefile.)
Use casefile_append() or casefile_append_xfer() to
append a case to a casefile.
2. Reading. The casefile may be read sequentially,
starting from the beginning, by "casereaders". Any
number of casereaders may be created, at any time,
during the reading phase. Each casereader has an
independent position in the casefile.
Casereaders may only move forward. They cannot move
backward to arbitrary records or seek randomly. (In the
future, the concept of a "casemark" will be introduced
to allow a limited form of backward seek, but this has
not yet been implemented.)
Cloning casereaders is possible, but no one has had a
need for it yet, so it is not implemented.
Use casefile_get_reader() to create a casereader for
use in phase 2. This also transitions from phase 1 to
phase 2. Calling casefile_mode_reader() makes the same
transition, without creating a casereader.
Use casereader_read(), casereader_read_xfer(), or
casereader_read_xfer_assert() to read a case from a
casereader. Use casereader_destroy() to discard a
casereader when it is no longer needed.
3. Destruction. This phase is optional. The casefile is
also read with casereaders in this phase, but the
ability to create new casereaders is curtailed.
In this phase, casereaders could still be cloned, and
casemarks could still be used to seek backward (given
that we eventually implement these functionalities).
To transition from phase 1 or 2 to phase 3 and create a
casereader, call casefile_get_destructive_reader().
The same functions apply to the casereader obtained
this way as apply to casereaders obtained in phase 2.
After casefile_get_destructive_reader() is called, no
more casereaders may be created with
casefile_get_reader() or
casefile_get_destructive_reader(). (If cloning of
casereaders or casemarks were implemented, they would
still be possible.)
The purpose of the limitations applied to casereaders
in phase 3 is to allow in-memory casefiles to fully
transfer ownership of cases to the casereaders,
avoiding the need for extra copies of case data. For
relatively static data sets with many variables, I
suspect (without evidence) that this may be a big
performance boost.
When a casefile is no longer needed, it may be destroyed with
casefile_destroy(). This function will also destroy any
remaining casereaders. */
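
The phase rules in that comment can be sketched as a small state machine.
This is a toy model under my own assumptions, not the casefile.c
implementation: toy_casefile, toy_phase, toy_create, toy_append,
toy_get_reader, toy_get_destructive_reader, and toy_read are hypothetical
names, cases are plain doubles in a fixed-size array, and ownership transfer
is not modeled. It only shows which calls are accepted in which phase.

```c
/* Toy model of the casefile life cycle.  NOT the PSPP implementation:
   all names are hypothetical, and only the phase rules are modeled. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_phase { TOY_WRITING = 1, TOY_READING, TOY_DESTRUCTION };

#define TOY_MAX_CASES 16

struct toy_casefile
  {
    enum toy_phase phase;
    double cases[TOY_MAX_CASES];
    size_t n_cases;
  };

struct toy_reader
  {
    struct toy_casefile *cf;
    size_t pos;                 /* Independent position per reader. */
  };

/* A new casefile starts empty, in phase 1. */
void
toy_create (struct toy_casefile *cf)
{
  assert (cf != NULL);
  cf->phase = TOY_WRITING;
  cf->n_cases = 0;
}

/* Appends C to the end of CF.  Accepted only in phase 1. */
bool
toy_append (struct toy_casefile *cf, double c)
{
  if (cf->phase != TOY_WRITING || cf->n_cases >= TOY_MAX_CASES)
    return false;
  cf->cases[cf->n_cases++] = c;
  return true;
}

/* Creates an ordinary reader in *R, moving CF from phase 1 to
   phase 2 if necessary.  Refused once CF is in phase 3. */
bool
toy_get_reader (struct toy_casefile *cf, struct toy_reader *r)
{
  if (cf->phase == TOY_DESTRUCTION)
    return false;
  cf->phase = TOY_READING;
  r->cf = cf;
  r->pos = 0;
  return true;
}

/* Creates the one destructive reader in *R, moving CF from phase 1
   or 2 to phase 3.  Refused if CF is already in phase 3. */
bool
toy_get_destructive_reader (struct toy_casefile *cf, struct toy_reader *r)
{
  if (cf->phase == TOY_DESTRUCTION)
    return false;
  cf->phase = TOY_DESTRUCTION;
  r->cf = cf;
  r->pos = 0;
  return true;
}

/* Reads the next case into *C.  Returns false at end of file. */
bool
toy_read (struct toy_reader *r, double *c)
{
  if (r->pos >= r->cf->n_cases)
    return false;
  *c = r->cf->cases[r->pos++];
  return true;
}
```

In this model, appending fails as soon as the first reader exists, any number
of ordinary readers can be created during phase 2, and after the destructive
reader is handed out both reader-creation calls return false.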
--
Ben Pfaff
email: address@hidden
web: http://benpfaff.org