Re: design question

pspp-dev

From:	Ben Pfaff
Subject:	Re: design question
Date:	Thu, 23 Jun 2005 21:04:00 -0700
User-agent:	Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
Jason Stover <address@hidden> writes:

> So I have two choices, which can be described as follows.  Assume cf
> is my casefile *. I had decided to use method 1 before emailing the
> list; I'm asking about method 2 mostly out of curiosity.
>
> 1.  
>     i = 0; 
>     for(r = casefile_get_reader (cf);
>       casereader_read (r, &c) ; case_destroy (&c)) 
>     {
>       /*
>        * I copy the data I need from the active file and
>        * pass through them again if necessary.
>        */
>       store_data (&stored_data, i);
>       i++;
>     }
>     number_of_data = i;
>
>      for (j = 0; j < number_of_data; j++)
>       {
>         /* Second data pass here */
>         do_wa (stored_data, j);
>       }

If I understand correctly, stored_data contains a copy of each
case?  Then I wouldn't recommend this technique, because a
casefile is designed for storing lots of data and is smart enough
to dump it to disk and re-read it from disk efficiently, if
necessary.
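Here is a toy model of that behavior. It assumes fixed-size cases of two doubles and a "workspace" of just four cases (MEM_LIMIT). The toy_* names are hypothetical and are not the functions in casefile.c. The point is that a reader sees the same interface whether the cases ended up in memory or in a temporary file. Copying them into a private array, as method 1 does, duplicates that work and gives up the disk fallback.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model, not PSPP's casefile.c: each case is a fixed-size record
   of N_VALUES doubles.  Up to MEM_LIMIT cases are kept in memory; the
   next append moves everything to a temporary file on disk. */
#define N_VALUES 2
#define MEM_LIMIT 4

struct toy_case { double v[N_VALUES]; };

struct toy_casefile {
  struct toy_case mem[MEM_LIMIT]; /* used while the file fits in memory */
  size_t n_cases;                 /* total number of cases appended */
  FILE *disk;                     /* non-NULL once spilled to disk */
};

struct toy_reader { struct toy_casefile *cf; size_t pos; };

void toy_create (struct toy_casefile *cf)
{
  cf->n_cases = 0;
  cf->disk = NULL;
}

/* Appends *C, spilling to disk if memory is full.  Returns 1 on success. */
int toy_append (struct toy_casefile *cf, const struct toy_case *c)
{
  if (cf->disk == NULL && cf->n_cases < MEM_LIMIT)
    {
      cf->mem[cf->n_cases++] = *c;
      return 1;
    }
  if (cf->disk == NULL)
    {
      /* Workspace exhausted: copy the in-memory cases to a temp file. */
      FILE *f = tmpfile ();
      if (f == NULL)
        return 0;
      if (fwrite (cf->mem, sizeof *c, cf->n_cases, f) != cf->n_cases)
        {
          fclose (f);
          return 0;
        }
      cf->disk = f;
    }
  /* Readers may have moved the file position, so seek to the end. */
  if (fseek (cf->disk, 0, SEEK_END) != 0
      || fwrite (c, sizeof *c, 1, cf->disk) != 1)
    return 0;
  cf->n_cases++;
  return 1;
}

struct toy_reader toy_get_reader (struct toy_casefile *cf)
{
  struct toy_reader r = { cf, 0 };
  return r;
}

/* Copies the next case into *C and returns 1, or returns 0 at end of
   file or on a read error.  The caller cannot tell where it was stored. */
int toy_read (struct toy_reader *r, struct toy_case *c)
{
  struct toy_casefile *cf = r->cf;
  assert (cf != NULL);
  if (r->pos >= cf->n_cases)
    return 0;
  if (cf->disk == NULL)
    *c = cf->mem[r->pos];
  else if (fseek (cf->disk, (long) (r->pos * sizeof *c), SEEK_SET) != 0
           || fread (c, sizeof *c, 1, cf->disk) != 1)
    return 0;
  r->pos++;
  return 1;
}

void toy_destroy (struct toy_casefile *cf)
{
  if (cf->disk != NULL)
    fclose (cf->disk);
  cf->disk = NULL;
  cf->n_cases = 0;
}
```

With ten cases appended, the fifth append moves the first four to tmpfile() and every later case goes straight to disk. toy_read() still returns all ten in order, exactly as it would from memory.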

> 2. (I know this is technically illegal, but is something like
> it legal?)
>
>   for(r = casefile_get_reader (cf);
>       casereader_read (r, &c) ; case_destroy (&c)) 
>     {
>       /* Do part one. */
>     }
>   for(r = casefile_get_reader (cf);
>       casereader_read (r, &c) ; case_destroy (&c)) 
>     {
>       /* Do part two. */
>     }

This is what I would recommend.
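As a sketch of method 2, consider a two-pass sample variance. Pass one computes the mean, pass two sums the squared deviations from it, and each pass simply asks for a fresh reader. The toy_casefile, toy_get_reader() and toy_read() below are hypothetical in-memory stand-ins for casefile_get_reader() and casereader_read(), with one double per case. They are not PSPP's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a casefile and a casereader; each case is
   a single double for brevity. */
struct toy_casefile { const double *cases; size_t n_cases; };
struct toy_reader { const struct toy_casefile *cf; size_t pos; };

static struct toy_reader toy_get_reader (const struct toy_casefile *cf)
{
  struct toy_reader r = { cf, 0 };   /* every reader starts at case 0 */
  return r;
}

/* Copies the next case into *C and returns 1, or returns 0 at the end. */
static int toy_read (struct toy_reader *r, double *c)
{
  if (r->pos >= r->cf->n_cases)
    return 0;
  *c = r->cf->cases[r->pos++];
  return 1;
}

/* Sample variance in two passes, each with its own reader. */
double two_pass_variance (const struct toy_casefile *cf)
{
  struct toy_reader r;
  double c, mean, sum = 0.0, ss = 0.0;
  size_t n = 0;

  /* Pass one: the mean. */
  for (r = toy_get_reader (cf); toy_read (&r, &c); )
    {
      sum += c;
      n++;
    }
  assert (n >= 2);
  mean = sum / n;

  /* Pass two: squared deviations from the mean. */
  for (r = toy_get_reader (cf); toy_read (&r, &c); )
    ss += (c - mean) * (c - mean);

  return ss / (n - 1);
}
```

Because each reader has its own position, the second loop starts again from the first case. Nothing from the first pass has to be kept except the running totals.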

> I was going to use method 1, but I started wondering if anyone should
> ever use method 2, or something like it? The casefile is discarded
> after the cases are read (according to a comment in casefile.c). Does
> 'discarded' mean the data are no longer accessible, ever ever ever?
> Is method 2 an abomination?

I think you're referring to the final two sentences of this
comment:

    /* A casefile is a sequentially accessible array of immutable
       cases.  It may be stored in memory or on disk as workspace
       allows.  Cases may be appended to the end of the file.  Cases
       may be read sequentially starting from the beginning of the
       file.  Once any cases have been read, no more cases may be
       appended.  The entire file is discarded at once. */

It is misleading.  The final sentence means to say, "When the
casefile is destroyed, it is all destroyed at once."  I didn't
mean to imply the causation that "Once any cases have been read,
... the entire file is discarded..." but I think that's what
you're reading from it.

Plus, the comment is not completely accurate any longer.

Here, let me expand on it:

/* A casefile represents a sequentially accessible stream of
   immutable cases.

   If workspace allows, a casefile is maintained in memory.  If
   workspace overflows, then the casefile is pushed to disk.  In
   either case the interface presented to callers is kept the
   same.

   The life cycle of a casefile consists of up to three phases:

       1. Writing.  The casefile initially contains no cases.  In
          this phase, any number of cases may be appended to the
          end of a casefile.  (Cases are never inserted in the
          middle or before the beginning of a casefile.)

          Use casefile_append() or casefile_append_xfer() to
          append a case to a casefile.

       2. Reading.  The casefile may be read sequentially,
          starting from the beginning, by "casereaders".  Any
          number of casereaders may be created, at any time,
          during the reading phase.  Each casereader has an
          independent position in the casefile.

          Casereaders may only move forward.  They cannot move
          backward to arbitrary records or seek randomly.  (In the
          future, the concept of a "casemark" will be introduced
          to allow a limited form of backward seek, but this has
          not yet been implemented.)

          Cloning casereaders is possible, but no one has had a
          need for it yet, so it is not implemented.

          Use casefile_get_reader() to create a casereader for
          use in phase 2.  This also transitions from phase 1 to
          phase 2.  Calling casefile_mode_reader() makes the same
          transition, without creating a casereader.

          Use casereader_read(), casereader_read_xfer(), or
          casereader_read_xfer_assert() to read a case from a
          casereader.  Use casereader_destroy() to discard a
          casereader when it is no longer needed.

       3. Destruction.  This phase is optional.  The casefile is
          also read with casereaders in this phase, but the
          ability to create new casereaders is curtailed.

          In this phase, casereaders could still be cloned, and
          casemarks could still be used to seek backward
          (assuming we eventually implement them).

          To transition from phase 1 or 2 to phase 3 and create a
          casereader, call casefile_get_destructive_reader().
          The same functions apply to the casereader obtained
          this way as apply to casereaders obtained in phase 2.
          
          After casefile_get_destructive_reader() is called, no
          more casereaders may be created with
          casefile_get_reader() or
          casefile_get_destructive_reader().  (If cloning of
          casereaders or casemarks were implemented, they would
          still be possible.)

          The purpose of the limitations applied to casereaders
          in phase 3 is to allow in-memory casefiles to fully
          transfer ownership of cases to the casereaders,
          avoiding the need for extra copies of case data.  For
          relatively static data sets with many variables, I
          suspect (without evidence) that this may be a big
          performance boost.

   When a casefile is no longer needed, it may be destroyed with
   casefile_destroy().  This function will also destroy any
   remaining casereaders. */
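The phase rules in that comment can be summarized as a small state machine. Below is a hypothetical in-memory model; the toy_* names are invented and are not functions in casefile.c. It only enforces which calls are allowed in which phase. It does not model the ownership transfer that motivates phase 3, nor the destruction of remaining readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three casefile phases; not PSPP's API. */
enum toy_phase { TOY_WRITING, TOY_READING, TOY_DESTRUCTIVE };

#define TOY_MAX_CASES 16

struct toy_casefile {
  enum toy_phase phase;
  int cases[TOY_MAX_CASES];
  size_t n_cases;
};

struct toy_reader { const struct toy_casefile *cf; size_t pos; };

void toy_create (struct toy_casefile *cf)
{
  cf->phase = TOY_WRITING;
  cf->n_cases = 0;
}

/* Phase 1 only: appends fail once the casefile has left phase 1. */
bool toy_append (struct toy_casefile *cf, int value)
{
  if (cf->phase != TOY_WRITING || cf->n_cases >= TOY_MAX_CASES)
    return false;
  cf->cases[cf->n_cases++] = value;
  return true;
}

/* Phase 1 -> 2 (or stays in 2) without creating a reader. */
bool toy_mode_reader (struct toy_casefile *cf)
{
  if (cf->phase == TOY_DESTRUCTIVE)
    return false;
  cf->phase = TOY_READING;
  return true;
}

/* Like toy_mode_reader(), but also creates a reader at case 0. */
bool toy_get_reader (struct toy_casefile *cf, struct toy_reader *r)
{
  if (!toy_mode_reader (cf))
    return false;
  r->cf = cf;
  r->pos = 0;
  return true;
}

/* Phase 1 or 2 -> 3.  Only one destructive reader is ever allowed. */
bool toy_get_destructive_reader (struct toy_casefile *cf,
                                 struct toy_reader *r)
{
  if (cf->phase == TOY_DESTRUCTIVE)
    return false;
  cf->phase = TOY_DESTRUCTIVE;
  r->cf = cf;
  r->pos = 0;
  return true;
}

/* Forward-only read, the same for readers from phase 2 or phase 3. */
bool toy_read (struct toy_reader *r, int *value)
{
  assert (r->cf != NULL);
  if (r->pos >= r->cf->n_cases)
    return false;
  *value = r->cf->cases[r->pos++];
  return true;
}
```

For example, once toy_get_destructive_reader() succeeds, later calls to toy_get_reader() or toy_get_destructive_reader() return false. The destructive reader itself still reads forward through the same toy_read().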
-- 
Ben Pfaff 
email: address@hidden
web: http://benpfaff.org