From: Jason Stover
Subject: [address@hidden: Re: category.c]
Date: Mon, 20 Mar 2006 10:51:15 -0500
User-agent: Mutt/1.5.10i
(I forgot to reply to the list.)
----- Forwarded message from Jason Stover <address@hidden> -----
Date: Mon, 20 Mar 2006 10:03:27 -0500
From: Jason Stover <address@hidden>
To: John Darrington <address@hidden>
Subject: Re: category.c
In-Reply-To: <address@hidden>
User-Agent: Mutt/1.5.10i
On Mon, Mar 20, 2006 at 09:03:21AM +0800, John Darrington wrote:
> I've been thinking about re-implementing T-TEST, ONEWAY and EXAMINE,
> using category.c and thus retiring the rather ad hoc group.c and
> factor-stats.c files.
>
> Several questions about category.c :
>
>
> 1. cat_value_find uses a linear search. Might it not be better to use
> a hash instead?
Yes. category.c is my first attempt at caching the information
related to categorical variables, and there is probably a lot
of room for improvement.
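A minimal sketch of what a hash-based lookup could look like, assuming numeric category values stored as `double` and missing values filtered out before they reach the table. The distinct values stay in a dense array in order of first appearance (the same thing a linear list holds), and an open-addressing index with linear probing is kept over that array. The names `struct cat_hash`, `cat_hash_insert` and `cat_hash_find` are hypothetical, not part of category.c, and a real replacement would also need to handle string values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: distinct values in order of first appearance, plus an
   open-addressing index over them (linear probing, load factor <= 1/2). */
struct cat_hash
{
  double *vals;      /* vals[k] is the value of category k */
  size_t n;          /* number of distinct values */
  size_t *slots;     /* 0 = empty, otherwise category index + 1 */
  size_t capacity;   /* number of slots, a power of two */
};

/* MurmurHash3's 64-bit finalizer, applied to the bits of D. */
static size_t
hash_double (double d)
{
  uint64_t k;
  if (d == 0.0) d = 0.0;        /* -0.0 == 0.0, so hash them alike */
  memcpy (&k, &d, sizeof k);
  k ^= k >> 33;  k *= UINT64_C (0xff51afd7ed558ccd);
  k ^= k >> 33;  k *= UINT64_C (0xc4ceb9fe1a85ec53);
  k ^= k >> 33;
  return (size_t) k;
}

/* Slot where KEY is indexed, or the empty slot where it would go. */
static size_t
cat_hash_slot (const struct cat_hash *h, double key)
{
  size_t mask = h->capacity - 1;
  size_t i = hash_double (key) & mask;
  while (h->slots[i] != 0 && h->vals[h->slots[i] - 1] != key)
    i = (i + 1) & mask;         /* linear probing */
  return i;
}

/* Use CAPACITY slots and room for CAPACITY / 2 values, keeping the
   values already stored and rebuilding the index.  0 if out of memory. */
static int
cat_hash_resize (struct cat_hash *h, size_t capacity)
{
  size_t *slots = calloc (capacity, sizeof *slots);
  double *vals = realloc (h->vals, capacity / 2 * sizeof *vals);
  size_t k;
  if (vals != NULL)
    h->vals = vals;
  if (slots == NULL || vals == NULL)
    { free (slots); return 0; }
  free (h->slots);
  h->slots = slots;
  h->capacity = capacity;
  for (k = 0; k < h->n; k++)
    h->slots[cat_hash_slot (h, h->vals[k])] = k + 1;
  return 1;
}

/* CAPACITY must be a power of two, at least 2.  0 if out of memory. */
int
cat_hash_init (struct cat_hash *h, size_t capacity)
{
  assert (capacity >= 2 && (capacity & (capacity - 1)) == 0);
  h->vals = NULL; h->slots = NULL; h->n = 0; h->capacity = 0;
  return cat_hash_resize (h, capacity);
}

void
cat_hash_destroy (struct cat_hash *h)
{
  free (h->vals);
  free (h->slots);
}

/* Category index of KEY, or -1 if KEY has not been seen. */
long
cat_hash_find (const struct cat_hash *h, double key)
{
  size_t s = h->slots[cat_hash_slot (h, key)];
  return s != 0 ? (long) (s - 1) : -1;
}

/* Index of KEY, giving it the next index if new; -1 if out of memory. */
long
cat_hash_insert (struct cat_hash *h, double key)
{
  size_t i = cat_hash_slot (h, key);
  if (h->slots[i] != 0)
    return (long) (h->slots[i] - 1);
  if (2 * (h->n + 1) > h->capacity)
    {
      if (!cat_hash_resize (h, 2 * h->capacity))
        return -1;
      i = cat_hash_slot (h, key);
    }
  h->vals[h->n] = key;
  h->slots[i] = ++h->n;
  return (long) (h->n - 1);
}
```

Lookups then take an expected constant number of probes instead of a scan over every stored value. Because categories are numbered by first appearance in the dense array, an index handed out earlier stays valid when the table grows.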
> 2. Do we really need cat-routines.h ? Can it not be merged into
> category.h ?
Separating them was a hack to prevent a build break. My memory is vague
here, and I can no longer find the email discussion about it, but the
problem was something like this: most source files do not need anything
in category.h or cat-routines.h, yet variable.h includes category.h.
When cat-routines.h and category.h were a single file, any file that
included variable.h without also seeing every declaration category.h
depended on failed to compile. I *think* the trouble may have been a
*.h file that referred to struct design_matrix. Whatever the cause, I
split category.h into two files, which may not have been the best
solution, and the reason for keeping them apart may no longer exist.
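If the original break really did come from a header that needed struct design_matrix, the usual C remedy is an incomplete (forward) declaration, which might be enough to let the two headers be merged. The single file below sketches the idea; the struct members and `design_matrix_n_columns` are hypothetical stand-ins, not PSPP's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* In a merged category.h (hypothetical): an incomplete declaration lets
   prototypes mention pointers to the type without including the header
   that defines it, so every file that includes variable.h (and hence
   category.h) still compiles. */
struct design_matrix;

size_t design_matrix_n_columns (const struct design_matrix *);

/* Only in the file that owns the type: code that dereferences the
   pointer or takes sizeof needs the full definition; code that merely
   passes the pointer along does not. */
struct design_matrix
{
  size_t n_rows;   /* hypothetical member */
  size_t n_cols;   /* hypothetical member */
};

size_t
design_matrix_n_columns (const struct design_matrix *dm)
{
  assert (dm != NULL);
  return dm->n_cols;
}
```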
> 3. cat_value_update seems to do nothing for numeric variables. Why is
> this? A numeric variable can be used as a categorical variable
> just as easily as an alpha one.
Good point. Encoding numeric data as categorical is usually a mistake
from a statistical standpoint, but there are circumstances when
treating a numeric variable as categorical makes perfect sense, so
maybe cat_value_update() shouldn't care what type of variable it is
looking at. This is where the question 'should we protect the user?'
comes up. Someone with a numeric variable that has, say, 10^5 distinct
values and inadvertently treats that variable as categorical could
wind up running a procedure with 0 or negative degrees of freedom;
slowing the machine down to a crawl; or, worst of all, finding bugs
we'd rather not know about. But users should probably have the ability
to treat numeric data as categorical if they want to.
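A sketch of a type-agnostic update, with an optional cap on the number of distinct values as one possible answer to "should we protect the user?". All names here (`union cat_value`, `struct cat_vals`, `max_n`, `cat_value_update_sketch`) are hypothetical and do not reflect category.c's actual interface; the value of the cap would be a policy choice for each procedure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a variable is numeric or a short string. */
enum cat_kind { CAT_NUMERIC, CAT_STRING };

union cat_value
{
  double f;       /* used when kind == CAT_NUMERIC */
  char s[9];      /* NUL-terminated, used when kind == CAT_STRING */
};

struct cat_vals
{
  enum cat_kind kind;     /* the variable's type */
  union cat_value *vals;  /* distinct values seen so far */
  size_t n;               /* number of distinct values */
  size_t alloc;           /* allocated length of VALS */
  size_t max_n;           /* refuse new values beyond this; 0 = no limit */
};

static int
same_value (enum cat_kind kind, const union cat_value *a,
            const union cat_value *b)
{
  return kind == CAT_NUMERIC ? a->f == b->f : strcmp (a->s, b->s) == 0;
}

/* Record V whatever the variable's type.  Returns 1 if V is stored (new
   or already seen), 0 if V is new but MAX_N distinct values already
   exist, or -1 if out of memory.  A linear scan keeps the sketch short. */
int
cat_value_update_sketch (struct cat_vals *cv, const union cat_value *v)
{
  size_t i;
  for (i = 0; i < cv->n; i++)
    if (same_value (cv->kind, &cv->vals[i], v))
      return 1;
  if (cv->max_n != 0 && cv->n >= cv->max_n)
    return 0;                     /* the "protect the user" case */
  if (cv->n == cv->alloc)
    {
      size_t new_alloc = cv->alloc ? 2 * cv->alloc : 8;
      union cat_value *p = realloc (cv->vals, new_alloc * sizeof *p);
      if (p == NULL)
        return -1;
      cv->vals = p;
      cv->alloc = new_alloc;
    }
  cv->vals[cv->n++] = *v;
  return 1;
}
```

Returning 0 rather than failing silently lets the calling procedure decide whether to warn, stop, or raise the limit when a numeric variable turns out to have, say, 10^5 distinct values.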
> 4. If I'm reading the code right, cat_stored_values_destroy is leaky.
> It frees obs_vals, but doesn't tidy up obs_vals->vals .
> Also, shouldn't it set v->obs_vals to NULL after freeing?
You're right. That's a problem. I'll fix it soon if no one else fixes
it first.
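A minimal sketch of the fix discussed above. It assumes cut-down, hypothetical versions of the structures (the real struct variable and the type behind obs_vals carry more fields) and assumes the function receives the variable, as the `v->obs_vals` in the question suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down versions of the structures in the question. */
struct cat_vals
{
  double *vals;   /* observed values; heap-allocated */
  size_t n;
};

struct variable
{
  struct cat_vals *obs_vals;  /* NULL when nothing is cached */
};

/* Free the cached categorical data for V, inner array first, then
   clear the pointer so later code sees NULL instead of freed memory. */
void
cat_stored_values_destroy (struct variable *v)
{
  assert (v != NULL);
  if (v->obs_vals != NULL)
    {
      free (v->obs_vals->vals);   /* the array that was leaking */
      free (v->obs_vals);
      v->obs_vals = NULL;         /* no dangling pointer */
    }
}
```

Clearing the pointer also makes a second call harmless, since it then finds NULL and does nothing.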
While we're on the topic, is anyone in favor of using a garbage
collector in PSPP?
-Jason
----- End forwarded message -----
--
Jason Stover
Assistant Professor
Mathematics Department
Georgia Kung Fu & State University
"Georgia's public martial arts university"
On the web at www.gksu.edu