octave-maintainers

Re: NaN-toolbox much faster now


From: Jaroslav Hajek
Subject: Re: NaN-toolbox much faster now
Date: Mon, 16 Mar 2009 13:56:40 +0100

On Mon, Mar 16, 2009 at 1:13 PM, Alois Schlögl <address@hidden> wrote:

>
> Thanks for confirming the test. You asked for "opinions about skipping
> NaN's, and the Octave NA's support". Here are some thoughts on that issue.
>
> Concerning the question whether NaN's and NA should be handled separately.
> - Just because R has NA's is not necessarily a good reason why Octave
> needs it, too. Possible advantages need to be explained.
>
> - In statistical and probabilistic applications skipping both, NaN and
> NA, is a reasonable approach - there is no need to distinguish NaN from NA.
>

In fact, this seems to be what R actually does. It seems that in R the
classification of NA/NaN is exactly the converse of Octave's: in R,
isna(NaN) is true while isnan(NA) is false, whereas in Octave
isnan(NA) is true while isna(NaN) is false. Moreover, when you tell an
R function to "skip NAs" (na.rm = TRUE), it skips both NaNs and NAs.
So the question I'm raising here is: is Octave's support of NAs
actually a good idea, given that it is, in a real sense, incompatible
with R? Of course, "fixing" isnan to work as in R would, on the other
hand, break compatibility with C, Matlab and everything else.
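
To make the two conventions concrete, here is a minimal sketch in
Python (not Octave or R code). It assumes NA is stored as a NaN
carrying a special payload; the payload value below and all helper
names are hypothetical and chosen only for illustration, they are not
the actual bit pattern or API of either system.

```python
import math
import struct

# Hypothetical bit pattern for NA: a quiet NaN with an arbitrary payload.
# This is an assumption for illustration, not Octave's or R's real constant.
_NA_BITS = 0x7FF8000000000000 | 1954


def _bits(x):
    """Return the 64-bit IEEE 754 representation of a float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Build the NA value from its bit pattern.
NA = struct.unpack("<d", struct.pack("<Q", _NA_BITS))[0]


# Octave-style convention: NA is a special NaN, so isnan(NA) is true,
# but an ordinary NaN is not NA.
def octave_isnan(x):
    return math.isnan(x)


def octave_isna(x):
    return _bits(x) == _NA_BITS


# R-style convention (the converse): every NaN counts as NA,
# but NA does not count as NaN.
def r_is_na(x):
    return math.isnan(x)


def r_is_nan(x):
    return math.isnan(x) and _bits(x) != _NA_BITS
```

Under this sketch both conventions agree that NA and NaN are distinct
values; they differ only in which predicate is the broader one.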


> - In case NaN's are used for error handling, the question is how is NA
> improving the error handling? The main advantage would be that less
> NaN's need to be handled, but NA come with additional costs of added
> complexity and possible confusion (causing more programming errors, slow
> down of development speed, as well as performance loss). Therefore, if
> NA's should get special support, the benefits of this concept should be
> made clear.
>
> - The benefits of the NaN-toolbox over the traditional approach are:
> (i) functions are doing more often the right thing,
> (ii) applications are less likely to fail due to NaN-related issues,
> (iii) it's more likely that users unaware of the NaN-issue get it right
> in the first place,
> (iv) no need to think about whether nanmean or mean is the right function;
> (v) of course using always nanmean(), etc. would also do, but it's nicer
> to write only mean(), etc.;
> Basically, the idea is to make the use of these functions easier. The
> use of NA in addition to NaN's is detrimental to this aim. So the
> advantage of using NA's is not clear.
>

These are points that we've discussed previously. I mostly agree with
them, unless the functions are used in a non-statistical sense, and I
can only imagine that for "mean". After all, they're classified as
"statistics", so one could agree that "mean" should be understood to
be the statistical mean.

Performance is another consideration. It seems that the penalty for
removing NaNs ranges up to some 30%, which may be significant for some
uses. So maybe the functions should provide an option to turn off the
NaN check, for the case when the data are guaranteed to be NaN-free.
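
As a rough illustration of such an option, here is a minimal sketch in
Python (not the NaN-toolbox interface). The function name and the
skip_nan parameter are hypothetical; the point is only that the caller
can bypass the filtering pass when the data are known to be clean.

```python
import math


def mean(values, skip_nan=True):
    """Arithmetic mean; by default NaNs are treated as missing and skipped.

    skip_nan is a hypothetical switch: pass False to avoid the extra
    filtering pass when the data are guaranteed to be NaN-free.
    """
    values = list(values)
    if skip_nan:
        # Extra pass over the data; this is where the overhead comes from.
        values = [v for v in values if not math.isnan(v)]
    if not values:
        # Nothing left to average.
        return math.nan
    return sum(values) / len(values)
```

With the check turned off, a NaN in the input simply propagates to the
result, which is the traditional behaviour. Note that a value stored
as a special NaN (such as NA) is also caught by the isnan test, so it
would be skipped together with ordinary NaNs.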

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


