octave-maintainers
Re: NaN-toolbox much faster now


From: Jaroslav Hajek
Subject: Re: NaN-toolbox much faster now
Date: Tue, 17 Mar 2009 10:29:48 +0100

On Tue, Mar 17, 2009 at 9:25 AM, Alois Schlögl <address@hidden> wrote:
>
> Jaroslav Hajek wrote:
>> On Mon, Mar 16, 2009 at 1:13 PM, Alois Schlögl <address@hidden> wrote:
>>
>>> Thanks for confirming the test. You asked for "opinions about skipping
>>> NaN's, and the Octave NA's support". Here are some thoughts on that issue.
>>>
>>> Concerning the question of whether NaN's and NA's should be handled separately:
>>> - Just because R has NA's is not necessarily a good reason why Octave
>>> needs them, too. Possible advantages need to be explained.
>>>
>>> - In statistical and probabilistic applications, skipping both NaN and
>>> NA is a reasonable approach; there is no need to distinguish NaN from NA.
>>>
>>
>> In fact this seems to be what R actually does. It seems that in R, the
>> classification of NA/NaN is exactly the converse of Octave's, i.e.
>> isna(NaN) is true, while isnan(NA) is not, and that when you tell a
>> function to "skip NAs" (na.rm = TRUE), it indeed skips both NaNs and
>> NAs. So the question I'm raising here is: is Octave's support of NAs
>> actually a good idea, given that it is, in a good sense, actually
>> incompatible with R? Of course, "fixing" isnan to work like in R would
>> on the other hand break compatibility with C and Matlab and
>> everything.
>>
>>
>>> - In case NaN's are used for error handling, the question is how NA
>>> improves the error handling. The main advantage would be that fewer
>>> NaN's need to be handled, but NA's come with the additional costs of added
>>> complexity and possible confusion (causing more programming errors,
>>> slower development, and performance loss). Therefore, if
>>> NA's are to get special support, the benefits of this concept should be
>>> made clear.
>>>
>>> - The benefits of the NaN-toolbox over the traditional approach are:
>>> (i) functions more often do the right thing,
>>> (ii) applications are less likely to fail due to NaN-related issues,
>>> (iii) it's more likely that users unaware of the NaN issue get it right
>>> in the first place,
>>> (iv) there is no need to think about whether nanmean or mean is the right function,
>>> (v) of course always using nanmean(), etc. would also do, but it's nicer
>>> to write only mean(), etc.
>>> Basically, the idea is to make the use of these functions easier. The
>>> use of NA in addition to NaN's is detrimental to this aim. So the
>>> advantage of using NA's is not clear.
>>>
>>
>> These are points that we've discussed previously. I mostly agree
>> with them, unless the functions are used in a non-statistical sense -
>> and I can only imagine that for "mean". After all, they're classified
>> as "statistics", so one could agree that "mean" should be understood
>> to be the statistical mean.
>
> That's how I see it, too.
>
>>
>> Performance is another consideration. It seems that the penalty for
>> removing NaNs ranges up to some 30%, which may be significant for some
>> uses. So maybe the functions should provide an option to turn off the
>> checking for NaNs, just for the case when the data are guaranteed to be
>> NaN-free.
>>
>> cheers
>>
>
>
> Ok, in order to address that request, I've added the function
> flag_implicit_skip_nan.m. If you call flag_implicit_skip_nan(0), NaN's
> are no longer skipped, and the traditional behavior is reproduced.
> This affects all functions that are based on sumskipnan.m.
> flag_implicit_skip_nan(1) turns the NaN-skipping behavior back on.
>

I think there's been general agreement among most contributors that
global flags are "considered harmful".
Using them from the command line is fine, but in functions they become a
burden because you need to save and restore their state, like this:

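% save the caller's setting, then force NaN-skipping on
old_flag_implicit_skip_nan = flag_implicit_skip_nan;
flag_implicit_skip_nan (1);

unwind_protect
  ... function body ...
unwind_protect_cleanup
  % restore the caller's setting, even if the body raised an error
  flag_implicit_skip_nan (old_flag_implicit_skip_nan);
end_unwind_protect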

So far the trend in Octave has been to eliminate global flags rather than
introduce new ones. Why not just use options,
like std (x, 0, 2, "skipnans", false)?
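
To make the option-based interface concrete, here is a rough sketch of how
a function might accept such a switch. The name mean_opt is made up for
illustration, and "skipnans" is only the hypothetical option name from the
call above; neither is an existing Octave or NaN-toolbox interface.

```octave
1;  % mark this file as a script so the function can be defined inline

% Hypothetical helper (not part of Octave or the NaN-toolbox):
% mean of all elements of x, with NaN-skipping controlled per call.
function m = mean_opt (x, varargin)
  skipnans = true;                       % assumed default: skip NaNs
  if (mod (numel (varargin), 2) != 0)
    error ("mean_opt: options must come in name/value pairs");
  endif
  for k = 1:2:numel (varargin)
    if (ischar (varargin{k}) && strcmpi (varargin{k}, "skipnans"))
      skipnans = logical (varargin{k+1});
    else
      error ("mean_opt: unrecognized option");
    endif
  endfor
  x = x(:);                              % treat the input as one vector
  if (skipnans)
    x = x(! isnan (x));                  % isnan is also true for NA in Octave
  endif
  m = sum (x) / numel (x);
endfunction

x = [1 2 NaN 4];
a = mean_opt (x);                        % NaN is dropped before averaging
b = mean_opt (x, "skipnans", false);     % traditional behavior: NaN propagates
```

The point of the design is that the choice travels with the call, so
nothing has to be saved and restored around it, and a caller with
NaN-free data can skip the check without affecting anyone else.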

cheers

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


