octave-maintainers

Re: NaN-toolbox much faster now


From: Alois Schlögl
Subject: Re: NaN-toolbox much faster now
Date: Tue, 17 Mar 2009 16:46:16 +0100
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jaroslav Hajek wrote:
> On Tue, Mar 17, 2009 at 9:25 AM, Alois Schlögl <address@hidden> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Jaroslav Hajek wrote:
>>> On Mon, Mar 16, 2009 at 1:13 PM, Alois Schlögl <address@hidden> wrote:
>>>
>>>> Thanks for confirming the test. You asked for "opinions about skipping
>>>> NaN's, and the Octave NA's support". Here are some thoughts on that issue.
>>>>
>>>> Concerning the question whether NaN's and NA should be handled separately.
>>>> - Just because R has NA's is not necessarily a good reason why Octave
>>>> needs it, too. Possible advantages need to be explained.
>>>>
>>>> - In statistical and probabilistic applications, skipping both NaN and
>>>> NA is a reasonable approach - there is no need to distinguish NaN from NA.
>>>>
>>> In fact this seems to be what R actually does. It seems that in R, the
>>> classification of NA/NaN is exactly the converse of Octave's, i.e.
>>> isna(NaN) is true, while isnan(NA) is not, and that when you tell a
>>> function to "skip NAs" (na.rm = TRUE), it indeed skips both NaNs and
>>> NAs. So the question I'm raising here is: is Octave's support of NAs
>>> actually a good idea, given that it is, in a sense, incompatible
>>> with R? Of course, "fixing" isnan to work like in R would
>>> on the other hand break compatibility with C and Matlab and
>>> everything.
>>>
>>>
>>>> - In case NaN's are used for error handling, the question is how NA
>>>> improves the error handling. The main advantage would be that fewer
>>>> NaN's need to be handled, but NA comes with the additional costs of
>>>> complexity and possible confusion (causing more programming errors,
>>>> slower development, and performance loss). Therefore, if
>>>> NA's should get special support, the benefits of this concept should be
>>>> made clear.
>>>>
>>>> - The benefits of the NaN-toolbox over the traditional approach are:
>>>> (i) functions more often do the right thing;
>>>> (ii) applications are less likely to fail due to NaN-related issues;
>>>> (iii) users unaware of the NaN issue are more likely to get it right
>>>> in the first place;
>>>> (iv) there is no need to think about whether nanmean or mean is the right function;
>>>> (v) of course, always using nanmean(), etc. would also work, but it's nicer
>>>> to write only mean(), etc.
>>>> Basically, the idea is to make the use of these functions easier. The
>>>> use of NA in addition to NaN's is detrimental to this aim. So the
>>>> advantage of using NA's is not clear.
>>>>
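Points (i) to (iv) boil down to mean() behaving like nanmean(). A minimal Python sketch of the difference, using hypothetical helper names (this is not NaN-toolbox code): a plain mean lets a single NaN poison the result, while a NaN-skipping mean ignores it and returns NaN only when nothing is left.

```python
import math

def traditional_mean(xs):
    """Plain mean: a single NaN makes the whole result NaN."""
    return sum(xs) / len(xs)

def skipnan_mean(xs):
    """NaN-skipping mean: NaNs are ignored; an all-NaN input yields NaN."""
    vals = [x for x in xs if not math.isnan(x)]
    if not vals:
        return math.nan
    return sum(vals) / len(vals)

data = [1.0, 2.0, math.nan, 3.0]
print(traditional_mean(data))  # nan
print(skipnan_mean(data))      # 2.0
```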
>>> These are points that we've discussed previously. I mostly agree with
>>> them, unless the functions are used in a non-statistical sense -
>>> and I can only imagine that for "mean". After all, they're classified
>>> as "statistics", so one could agree that "mean" should be understood
>>> to be the statistical mean.
>> That's how I see it, too.
>>
>>> Performance is another consideration. It seems that the penalty for
>>> removing NaNs ranges up to some 30%, which may be significant for some
>>> uses. So maybe the functions should provide an option to turn off the
>>> checking for NaNs, just for the case when data are guaranteed to be
>>> NaN-free.
>>>
>>> cheers
>>>
>>
>> Ok, in order to address that request, I've added the function
>> flag_implicit_skip_nan.m. If you call flag_implicit_skip_nan(0), NaN's
>> are not skipped anymore, and the traditional behavior is reproduced.
>> This will affect all functions that are based on sumskipnan.m.
>> flag_implicit_skip_nan(1) will again turn on the NaN-skipping behavior.
>>
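The flag mechanism described above can be sketched in Python as a module-level switch that only the core summation routine consults. This is an illustration under stated assumptions, not the NaN-toolbox source: `_SKIP_NAN`, `set_flag_implicit_skip_nan`, and the Python `sumskipnan` and `mean` are hypothetical stand-ins for the Octave functions.

```python
import math

# Module-level switch, analogous to flag_implicit_skip_nan (hypothetical port).
_SKIP_NAN = True

def set_flag_implicit_skip_nan(new_value):
    """Set the global flag and return its previous value."""
    global _SKIP_NAN
    old = _SKIP_NAN
    _SKIP_NAN = bool(new_value)
    return old

def sumskipnan(xs):
    """Return (sum, count); NaNs are dropped only while the flag is on."""
    if _SKIP_NAN:
        xs = [x for x in xs if not math.isnan(x)]
    return sum(xs), len(xs)

def mean(xs):
    # mean() never reads the flag itself; it inherits the behavior via sumskipnan.
    s, n = sumskipnan(xs)
    return s / n if n else math.nan

data = [1.0, math.nan, 5.0]
print(mean(data))               # 3.0 (NaN skipped)
set_flag_implicit_skip_nan(0)
print(mean(data))               # nan (traditional behavior)
set_flag_implicit_skip_nan(1)   # back to the default
```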
> 
> I think there's been a general agreement among most contributors that
> global flags are "considered harmful".

I guess this comes from the old days (around Octave 2.0 and 2.1) when
there were far too many global flags, used much too often.

However, a general ban on the use of global flags goes too far, IMHO.

OTOH, if global variables within functions are really "harmful", it
would only be consistent to enforce this rule by preventing the
use of global variables within functions. ;-)


> Using them from command line is fine, but in functions they become a
> burden because you need to preserve their status, like this:
> 
> old_flag_implicit_skip_nan = flag_implicit_skip_nan;
> flag_implicit_skip_nan (1);
> 
> unwind_protect
>   % ... function body ...
> unwind_protect_cleanup
>   flag_implicit_skip_nan (old_flag_implicit_skip_nan);
> end_unwind_protect
> 
> So far the trend in Octave was to eliminate global flags rather than
> introduce new ones. Why not just use options?
> Like std(x,0,2, "skipnans", false)
> 
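For comparison, the per-call option proposed above might look like this in Python. The `skipnans` keyword and the `opt` argument (0 for N-1, 1 for N normalization, following Octave's std(x, opt) convention) are assumptions for illustration, not an existing API.

```python
import math

def std(xs, opt=0, skipnans=True):
    """Standard deviation with a per-call NaN option instead of a global flag.

    opt=0 normalizes by N-1 and opt=1 by N, mirroring Octave's std(x, opt).
    """
    if skipnans:
        xs = [x for x in xs if not math.isnan(x)]
    n = len(xs)
    if n == 0 or (opt == 0 and n == 1):
        return math.nan
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    return math.sqrt(ss / (n - 1 if opt == 0 else n))

data = [2.0, 4.0, math.nan, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data, opt=1))                  # 2.0
print(std(data, opt=1, skipnans=False))  # nan
```

The option keeps the behavior local to each call, at the price of every statistics function having to accept and forward the extra argument.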

Actually, there is only a problem if the flag is modified within the
unwind_protect part. If some "specialist" changes the default
behavior within an unwind_protect part, (s)he should also be able to
call flag_implicit_skip_nan(1) in the unwind_protect_cleanup part.

Using flag_implicit_skip_nan, or fun(..., "skipnans", false), is
detrimental to clean programming and should really remain a thing for
"specialists". I neither see a need for, nor endorse, frequent
use of this feature. Perhaps it will become obsolete again.

The present solution is simple and elegant because the flag controls
only the behavior of some core functions like sumskipnan. All other
functions like mean, var, std, meansq, etc. "inherit" this behaviour
without changing the source code. The alternative of adding more input
arguments would bloat the functions with checks on the input arguments.
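A minimal Python sketch of this layering. It assumes, loosely following the NaN-toolbox's sumskipnan, that the core routine returns the sum, the number of non-NaN elements, and the sum of squares; the Python functions themselves are hypothetical.

```python
import math

def sumskipnan(xs):
    """Core routine: return (sum, count, sum of squares) over non-NaN values."""
    vals = [x for x in xs if not math.isnan(x)]
    return sum(vals), len(vals), sum(v * v for v in vals)

# The derived statistics only call sumskipnan, so a change to its NaN
# handling propagates to all of them without touching their code.
def mean(xs):
    s, n, _ = sumskipnan(xs)
    return s / n if n else math.nan

def meansq(xs):
    _, n, ssq = sumskipnan(xs)
    return ssq / n if n else math.nan

def var(xs):
    s, n, ssq = sumskipnan(xs)
    # Sample variance (N-1 normalization) computed from the running sums.
    return (ssq - s * s / n) / (n - 1) if n > 1 else math.nan

def std(xs):
    return math.sqrt(var(xs))

data = [1.0, math.nan, 2.0, 3.0, 4.0]
print(mean(data), meansq(data), var(data))  # 2.5 7.5 1.6666666666666667
```

Deriving var from the sum and the sum of squares is compact but can lose precision when the mean is large relative to the spread; a two-pass formula is more robust.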

Although the present solution is sufficient for the task, it's only the
second-best solution. The best solution would be for the data to carry an
attribute flag indicating whether it contains NaN's or not. (The
attribute should be set when the data is generated.) Then the algorithm
could be selected automatically, and
flag_implicit_skip_nan() and fun(..., "skipnans", false) would become
obsolete. Wouldn't this be a nice idea?
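The attribute idea can be sketched in Python with a hypothetical `Data` container that computes a `has_nan` attribute once at construction, so that `mean` can pick the cheaper algorithm automatically. Octave arrays carry no such attribute; this is purely illustrative.

```python
import math

class Data:
    """Container that records at creation time whether it holds any NaN.

    Hypothetical illustration: Octave arrays carry no such attribute.
    """
    def __init__(self, values):
        self.values = tuple(values)  # immutable, so has_nan cannot go stale
        self.has_nan = any(math.isnan(v) for v in self.values)

def mean(d):
    if not d.has_nan:
        # Fast path: no per-element NaN check needed.
        return sum(d.values) / len(d.values) if d.values else math.nan
    vals = [v for v in d.values if not math.isnan(v)]
    return sum(vals) / len(vals) if vals else math.nan

print(mean(Data([1.0, 2.0, 3.0])))       # 2.0
print(mean(Data([1.0, math.nan, 3.0])))  # 2.0
```

Using a tuple keeps the attribute valid; a mutable array would have to update the flag on every element assignment, which is where the real cost of this idea would lie.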


Cheers,
   Alois


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkm/xcQACgkQzSlbmAlvEIi6BQCfdYg7x3NEwFzoz5VEFTopC/8M
7zcAnAyFh4Vht8bRWwaBGsDzlL9rH2mW
=W1La
-----END PGP SIGNATURE-----

