status report
From: Ben Pfaff
Subject: status report
Date: Thu, 10 May 2007 23:15:26 -0700
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)
For the last week or so I've been cleaning up the simpler-proc branch
for review and eventual merging. I think that this process will
probably take another week or two.

I've also done some performance regression testing on
simpler-proc versus the main branch and discovered some places
where simpler-proc performance was bad, especially in sorting. I
fixed the worst of it, but a little more improvement is needed
before it'll be ready for merge. That is probably only a few
hours' worth of work, though.
Over the last few days I've started writing the PSPP developer's guide
that I mentioned a while back. Here's a tentative outline, which
is bound to change as I continue writing:
Developer's Guide
* Introduction
* Basic concepts
** Values
** Variables
** Dictionaries
** Data sets
** Pools
** Coding conventions
* Syntax parsing
* Data processing
** Reading data
*** Casereaders generally
*** Casereaders from data files
*** Casereaders from the active file
*** Other casereaders
** Writing data
*** Casewriters generally
*** Casewriters to data files
*** Modifying the active file
**** Modifying cases obtained from active file casereaders has no real effect
**** Transformations; procedures that transform
** Transforming data
*** Sorting and merging
*** Filtering
*** Grouping
**** Ordering and interaction of filtering and grouping
*** Multiple passes over data
*** Counting cases and case weights
** Best practices
*** Multiple passes with filters versus single pass with loops
*** Sequential versus random access
*** Managing memory
*** Passing cases around
*** Renaming casereaders
*** Avoiding excessive buffering
*** Propagating errors
*** Avoid static/global data
*** Don't worry about null filters, groups, etc.
*** Be aware of reference counting semantics for cases
* Presenting output
The data processing chapter is the only one fully outlined. I
figure the syntax parsing and output presentation chapters
shouldn't be written until the corresponding bits of PSPP are
more solid. I have plans to work on each of those in turn after
merging simpler-proc; I'll be sure to talk them through here
before going beyond a prototype implementation.
The developer's guide is not yet checked in to simpler-proc, not
even a skeleton. I'll probably do an initial check-in over the
weekend.
Outside of PSPP, I have two big projects that are taking up time:
* Graduation: I turned in the first draft of my PhD
thesis to my advisor yesterday. I'm hoping to schedule
my defense for late June and then graduate by the end
of Stanford's summer quarter. Along with thesis
revisions and preparing my defense I'm also going to
embark on a job search. I'll probably take a job
locally at least until January, when universities start their
faculty searches up again (I'm too late for this year's
round).
* Pintos: My educational operating system used at
Stanford and elsewhere. I'm currently working to
integrate the contribution of a USB mass storage layer,
which allows students to demonstrate their projects on
real machines by running the OS off a USB flash drive.
This is considerably more impressive than running
inside a virtual machine as they currently do, so it
seems worthwhile, but I'm very picky about what I put
into Pintos so I'm having to do a lot of refactoring
work.
--
Ben Pfaff
http://benpfaff.org