Re: Covariance Matrices
From: Jason Stover
Subject: Re: Covariance Matrices
Date: Thu, 21 Aug 2008 10:28:43 -0400
User-agent: Mutt/1.5.18 (2008-05-17)
On Thu, Aug 21, 2008 at 08:29:32PM +0800, John Darrington wrote:
> On Wed, Aug 20, 2008 at 10:57:33AM -0400, Jason Stover wrote:
>
> I had planned to add a one-pass algorithm, but used a two-pass algorithm
> first because, "usually", two-pass algorithms have lower relative errors
> than one-pass algorithms, according to
>
> "Algorithms for Computing the Sample Variance: Analysis and
> Recommendations", T. F. Chan, G. Golub, R. LeVeque.
> The American Statistician, vol. 37, no. 3, 1983, pp. 242-247.
>
> So I had planned to add a one-pass algorithm, based on the algorithms in
> that paper, but never got around to it.
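A minimal C sketch of the accuracy trade-off described above, comparing three ways to compute a sample covariance: the textbook one-pass formula, the two-pass formula, and a one-pass Welford-style updating algorithm. The function names (cov_textbook, cov_two_pass, cov_updating) are illustrative only and are not PSPP code. With data such as 1e9+4, 1e9+7, 1e9+13, 1e9+16 (true variance 30), the two-pass and updating versions stay accurate, while the textbook formula can lose most or all of its significant digits to cancellation.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook one-pass formula: (sum xy - sum x * sum y / n) / (n - 1).
   Fast, but it subtracts two large, nearly equal numbers when the
   means are large relative to the spread of the data. */
static double
cov_textbook (const double *x, const double *y, size_t n)
{
  double sx = 0.0, sy = 0.0, sxy = 0.0;
  assert (n >= 2);
  for (size_t i = 0; i < n; i++)
    {
      sx += x[i];
      sy += y[i];
      sxy += x[i] * y[i];
    }
  return (sxy - sx * sy / n) / (n - 1);
}

/* Two-pass formula: the first pass computes the means, the second
   sums the products of deviations from those means. */
static double
cov_two_pass (const double *x, const double *y, size_t n)
{
  double mx = 0.0, my = 0.0, c = 0.0;
  assert (n >= 2);
  for (size_t i = 0; i < n; i++)
    {
      mx += x[i];
      my += y[i];
    }
  mx /= n;
  my /= n;
  for (size_t i = 0; i < n; i++)
    c += (x[i] - mx) * (y[i] - my);
  return c / (n - 1);
}

/* One-pass updating (Welford-style) algorithm: the running means and
   the sum of cross-products of deviations are updated case by case. */
static double
cov_updating (const double *x, const double *y, size_t n)
{
  double mx = 0.0, my = 0.0, c = 0.0;
  assert (n >= 2);
  for (size_t i = 0; i < n; i++)
    {
      double k = (double) (i + 1);
      double dx = x[i] - mx;        /* Deviation from the old x mean. */
      mx += dx / k;
      my += (y[i] - my) / k;
      c += dx * (y[i] - my);        /* Deviation from the new y mean. */
    }
  return c / (n - 1);
}
```

The updating version keeps a single pass at the cost of a division per case. It is usually far more accurate than the textbook formula, though in general not quite as accurate as two passes.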
>
>
> My only suggestion is that the code in covariance-matrix.[ch] should
> provide both one- and two-pass algorithms. So maybe add
> covariance_accumulate () and change covariance_pass_two () to
> incorporate your changes, but pass it an argument such as
> double *means so that it uses means computed in an earlier pass.
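One possible shape for the interface suggested above, as a C sketch. Only the names covariance_accumulate and covariance_pass_two come from this discussion; struct covariance, covariance_create, covariance_get, covariance_destroy and every signature here are hypothetical and do not reflect the actual code in covariance-matrix.[ch]. covariance_accumulate updates running means and cross-products in one pass, while covariance_pass_two takes means computed during an earlier pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical accumulator for a p x p covariance matrix. */
struct covariance
{
  size_t p;       /* Number of variables. */
  double n;       /* Number of cases accumulated so far. */
  double *mean;   /* Running means, used only by the one-pass path. */
  double *m;      /* p x p sums of cross-products of deviations. */
};

static struct covariance *
covariance_create (size_t p)
{
  struct covariance *cov = malloc (sizeof *cov);
  if (cov == NULL)
    return NULL;
  cov->p = p;
  cov->n = 0.0;
  cov->mean = calloc (p, sizeof *cov->mean);
  cov->m = calloc (p * p, sizeof *cov->m);
  if (cov->mean == NULL || cov->m == NULL)
    {
      free (cov->mean);
      free (cov->m);
      free (cov);
      return NULL;
    }
  return cov;
}

static void
covariance_destroy (struct covariance *cov)
{
  if (cov == NULL)
    return;
  free (cov->mean);
  free (cov->m);
  free (cov);
}

/* One pass: fold case X into the running means and cross-products.
   With d = X - old mean, the update is M += ((n - 1) / n) * d d^T. */
static void
covariance_accumulate (struct covariance *cov, const double *x)
{
  size_t p = cov->p;
  double n = cov->n + 1.0;
  double w = (n - 1.0) / n;

  for (size_t i = 0; i < p; i++)
    for (size_t j = 0; j < p; j++)
      cov->m[i * p + j] += w * (x[i] - cov->mean[i]) * (x[j] - cov->mean[j]);
  for (size_t i = 0; i < p; i++)
    cov->mean[i] += (x[i] - cov->mean[i]) / n;
  cov->n = n;
}

/* Second pass: MEANS were computed during an earlier pass. */
static void
covariance_pass_two (struct covariance *cov, const double *x,
                     const double *means)
{
  size_t p = cov->p;

  for (size_t i = 0; i < p; i++)
    for (size_t j = 0; j < p; j++)
      cov->m[i * p + j] += (x[i] - means[i]) * (x[j] - means[j]);
  cov->n += 1.0;
}

/* Sample covariance of variables I and J. */
static double
covariance_get (const struct covariance *cov, size_t i, size_t j)
{
  assert (cov->n > 1.0 && i < cov->p && j < cov->p);
  return cov->m[i * cov->p + j] / (cov->n - 1.0);
}
```

In a two-pass procedure the caller would compute the means during the first pass and hand them to covariance_pass_two during the second; a one-pass procedure would call only covariance_accumulate.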
>
>
> My opinion is that we should prefer speed over precision. So, all
> things being equal, I would use the single-pass method. However,
> where there is a compelling reason to make another pass anyway, the
> more accurate method can be used.
I agree. Do you want to change the code in covariance-matrix.c, or should
I do it?
> However, in many PSPP commands, the logic required to determine how
> many passes are necessary is quite nasty; it can depend on exactly
> which options are selected. For some time now, I've been thinking
> of a scheme where each statistic is aware of its own dependencies.
> With such a scheme, it would be possible to specify a set of
> statistics; the minimum number of passes would then be determined
> automatically, and the most accurate method for that number of
> passes would be selected automatically.
>
> This scheme would take a bit of thought and a lot of recoding. But
> giving the routines that calculate these statistics a similar
> interface would be the first step.
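A rough C sketch of the pass-selection part of this scheme, under simplifying assumptions: each statistic lists its methods from most to least accurate, and each method states how many passes it needs, standing in for real dependency tracking. All names here (struct method, struct statistic, min_passes, select_method) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one way of computing a statistic. */
struct method
{
  const char *name;
  int passes;                  /* Passes over the data it requires. */
};

/* Hypothetical: a statistic and its methods, ordered from most to
   least accurate. */
struct statistic
{
  const char *name;
  const struct method *methods;
  size_t n_methods;
};

/* Smallest number of passes in which every requested statistic can be
   computed by at least one of its methods. */
static int
min_passes (const struct statistic *stats, size_t n_stats)
{
  int need = 1;

  for (size_t i = 0; i < n_stats; i++)
    {
      int cheapest;

      assert (stats[i].n_methods > 0);
      cheapest = stats[i].methods[0].passes;
      for (size_t k = 1; k < stats[i].n_methods; k++)
        if (stats[i].methods[k].passes < cheapest)
          cheapest = stats[i].methods[k].passes;
      if (cheapest > need)
        need = cheapest;
    }
  return need;
}

/* Most accurate method of STAT that fits in PASSES, or NULL if none. */
static const struct method *
select_method (const struct statistic *stat, int passes)
{
  for (size_t k = 0; k < stat->n_methods; k++)
    if (stat->methods[k].passes <= passes)
      return &stat->methods[k];
  return NULL;
}
```

For example, if variance offered a two-pass and a one-pass method and every other requested statistic needed only one pass, min_passes would return 1 and select_method would pick the one-pass variance method; adding any statistic that needs two passes would raise the budget to 2 and switch variance to its two-pass method.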
Sounds good.
-Jason