Re: NaN-toolbox much faster now

octave-maintainers

From:	Jaroslav Hajek
Subject:	Re: NaN-toolbox much faster now
Date:	Tue, 24 Mar 2009 13:15:24 +0100
On Tue, Mar 24, 2009 at 12:03 PM, Alois Schlögl
<address@hidden> wrote:
>
> Jason Riedy wrote:
>> And Alois Schlögl writes:
>>> Or if the NaN-toolbox is in place, you can check this with
>>> flag_nan_occured() afterwards, [...]
>>
>> I'll look into your toolbox more in a month or so.  Higher-level Octave
>> support for detecting exceptional events will take some thought.  It's
>> not as easy as a "flag_nan_occurred()", because the lower-level
>> libraries don't necessarily clear the flag when appropriate (e.g. STEMR,
>> etc. in LAPACK).
>>
>> Without good invalid flag support, I would recommend the default of
>> returning NaN from mean() and other functions whenever there is a
>> NaN in the data.  Otherwise I worry the NaNs will go un-noticed
>> when they are important.  But I may have too skewed a view of
>> mean's use in Octave.
>>
>> In the Octave historical context, returning the number from max() &
>> min() makes sense.  For linear algebra, these often are used for
>> selecting pivot-like entries or scaling, and that's similar to the
>> centering usage for mean().
>>
>> Longer-term, better invalid flag handling would make me inclined to
>> return the number from mean() and friends...
>>
>> I write:
>>> Barring more sophisticated handling of exceptional events and data,
>>> the most reliable choices are to provide either a per-call optional
>>> argument or two different routines.  R takes the former route[1],
>>> and we took the latter for max(a, b) and min(a, b) in the IEEE-754
>>> standard.
>>
>> To which Alois Schlögl responds:
>>> I do not know about R, but last time I looked at IEEE-754r
>>> http://www.validlab.com/754R/drafts/archive/2006-10-04.pdf p.28,
>>> says about minNum(x,y) and maxNum(x,y) [...]
>>
>> Augh.  I can't begin to describe the pain when I checked the
>> standard (I have a copy).  After *months* of arguing about which
>> max&min to include and being out-voted on having only the
>> number-preferring ones, I now find out the NaN-preferring ones
>> disappeared somewhere.
>>
>> I apologize for the mis-information.  And I recommend avoiding
>> close work in standards committees.
>>
>> BTW, C99's fmax prefers numbers as well.
>>
>> And I write:
>>> A global flag that is not locally scoped and can be forgotten is
>>> downright dangerous in this context.  Such a flag will wander into
>>> code where it was not intended and wreak havoc.  Please use another
>>> method.
>>
>> To which Alois Schlögl responds:
>>> (1) In order to avoid wandering of the flag into code, I've added the
>>> following warning:
>>> "warning: flag_implicit_skipnan(0): You are warned!!! You have turned
>>> off skipping NaN in sumskipnan. This is not recommended. Make sure you
>>> really know what you do."
>>> Is this sufficient to address your concern ?
>>
>> Historically, these warnings are ignored.  Everything short of a
>> flat-out abort often is ignored.  (Bitter?  Me?)
>>
>>> (5) It's also ok to leave out sumskipnan() from octave. Those who are
>>> interested have several options to install it:
>>
>> Ah, I thought the conversation was about the default behavior of
>> the statistical functions.
>
>
> The original intention in including the octave-maintainers list was
> to discuss whether sumskipnan() could be included in standard Octave,
> because sumskipnan() might be beneficial not only to the NaN-toolbox but
> also to some traditional functions like nanmean(), nanvar(), nanstd(),
> etc. (as well as for imputation methods, etc.). Including sumskipnan in
> Octave does not prejudge whether the traditional statistical functions
> or the NaN-toolbox is used.
>

The idea of having something like sumskipnan is not bad in principle;
however, sumskipnan can't be included in its present state because it
does not even approximately match Octave's coding standards. You
probably don't want to rework it unless you have a reason to, so
that's a bit circular.
Also, I'm not sure we even want MEX files in core Octave.

I'm also not really sure that using uncentered sum of squares is a
good way to approach std - when the errors are small, and I think
that's quite common, it falls back to Octave's algorithm anyway.
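
To make the concern about uncentered sums of squares concrete, here is a
minimal sketch in standalone C99. It is not sumskipnan's code and not
Octave's algorithm; the function names var_uncentered and var_two_pass
are made up for this example. Both skip NaNs and return the sample
variance, but the first does it in one pass from the raw sum and sum of
squares, while the second centers the data first.

```c
#include <math.h>
#include <stddef.h>

/* One pass: accumulate sum and uncentered sum of squares, skipping NaNs.
   The final subtraction can cancel badly when the mean is large
   compared with the spread of the data. */
double var_uncentered(const double *x, size_t n)
{
    double s = 0.0, ss = 0.0;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(x[i])) {
            s += x[i];
            ss += x[i] * x[i];
            k++;
        }
    }
    if (k < 2)
        return NAN;
    return (ss - s * s / k) / (k - 1);
}

/* Two passes: compute the mean of the non-NaN entries first, then sum
   the squared deviations from it. */
double var_two_pass(const double *x, size_t n)
{
    double s = 0.0, d2 = 0.0;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(x[i])) {
            s += x[i];
            k++;
        }
    }
    if (k < 2)
        return NAN;
    double m = s / k;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(x[i])) {
            double d = x[i] - m;
            d2 += d * d;
        }
    }
    return d2 / (k - 1);
}
```

For data such as 1e9+1, 1e9+2, NaN, 1e9+3 the two-pass version gets the
variance right, while the one-pass version can lose most of its
significant digits in the subtraction.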

>
>>
>> This will take more time for thought.  I'm thinking from an
>> LAPACK/BLAS perspective; if there are enough uses (and I agree
>> there are), it might be worth including NaN-skipping routines in
>> LAPACK and the BLAS.
>
>
> I'd really like to see this, and there is a need for it. Within the
> NaN-toolbox, I'm using covm() to estimate covariance matrices while
> skipping NaNs. LAPACK/BLAS support could really improve the performance
> on this too.
>
>
>>
>> We already take care not to pick NaNs as pivots in factorization;
>> perhaps we should have pushed the NaN check down into IxAMAX.  And
>> NaNs in the scaled vector 2-norm computations pose an interesting
>> performance problem, now that I think of it.
>>
>> One issue I haven't seen addressed is the cross-platform
>> performance of these routines.  I've seen Jaroslav Hajek's timings,
>> but that's only one platform.  Arithmetic with exceptional values
>> can see a >100x slow-down, e.g. on Pentium-4s with earlier
>> generations of their Netburst µarchitecture.  So skipping NaNs can
>> be *faster* if the test can avoid triggering the jump to microcode.
>>
>> There's a good implementation trick to smooth out some of the
>> cross-platform issues.  Compute the sum (or whichever) in segments.
>> Check each segment for a NaN.  If you're skipping NaNs, go back and
>> recompute for that segment.  If not, just return the NaN.
>>
>> At first look, you test for NaNs one-by-one.  Also, you use a
>> dangerous NaN check...  Some compilers *still* optimize x==x.  I'd
>> recommend using C99's isnan().
>
>
> I've replaced (x==x) with (!isnan(x)).
>

Using C99 functions directly is not acceptable for Octave, because we
can't rely on them to be available. The best way to check for NaN
inside Octave's code is probably "xisnan" from "lo-mappers.h".
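
For code outside Octave, where xisnan is not available, one option that
depends on neither x==x nor the C99 isnan() macro is to look at the bit
pattern directly. The following is only a sketch: the name is_nan_bits
is made up, and it assumes that double is IEEE-754 binary64 and that a
64-bit integer type (here uint64_t from <stdint.h>) exists.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if x is a NaN, 0 otherwise.
   Assumes IEEE-754 binary64: a NaN has all exponent bits set and a
   nonzero mantissa, so with the sign bit masked off its bit pattern is
   strictly greater than that of +Inf (0x7ff0000000000000). */
int is_nan_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined way to read the bits */
    return (u & UINT64_C(0x7fffffffffffffff)) > UINT64_C(0x7ff0000000000000);
}
```

Because the test is done on integers, it does not rely on the compiler
preserving floating-point comparison semantics.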


> Concerning your suggestion on segmentation-based checks, I do not know how
> big a problem this really is. I guess I'd not be able to identify
> such a platform even if I had access to one. However, if you know how to
> improve sumskipnan() further (e.g. for specific platforms), let's do it.
>
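
The segment trick Jason describes could look roughly like the sketch
below (standalone C99, not the toolbox's actual code). The function name
sum_segmented, the segment length SEG, and the skip_nan/count parameters
are all assumptions made for the illustration. Each segment is first
summed without any per-element test; only if that partial sum comes out
as NaN is the segment handled specially.

```c
#include <math.h>
#include <stddef.h>

#define SEG 256  /* segment length; an arbitrary choice for this sketch */

/* Sum x[0..n-1] in segments.  If skip_nan is nonzero, NaNs are skipped
   and *count receives the number of elements actually summed.
   Otherwise the first NaN partial sum is returned and *count is n. */
double sum_segmented(const double *x, size_t n, int skip_nan, size_t *count)
{
    double total = 0.0;
    size_t used = 0;

    for (size_t start = 0; start < n; start += SEG) {
        size_t end = (n - start < SEG) ? n : start + SEG;

        /* Fast path: plain sum with no per-element NaN test. */
        double s = 0.0;
        for (size_t i = start; i < end; i++)
            s += x[i];

        if (!isnan(s)) {
            used += end - start;
        } else if (!skip_nan) {
            if (count)
                *count = n;
            return s;               /* just return the NaN */
        } else {
            /* Slow path: go back and redo this segment, skipping NaNs. */
            s = 0.0;
            for (size_t i = start; i < end; i++) {
                if (!isnan(x[i])) {
                    s += x[i];
                    used++;
                }
            }
        }
        total += s;
    }
    if (count)
        *count = used;
    return total;
}
```

Note that a partial sum can also be NaN without any NaN in the input
(+Inf plus -Inf); in that case the slow path reproduces the NaN, which
is the correct sum anyway.
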
>
> Another possible performance improvement is the use
> of extended precision. This would enable the computation of
> var/cov/std/zscore/cor/corrcoef in a single pass without reduced accuracy.
>
>
>>
>> I'll look more at your toolbox in the future.  Obviously, it
>> tickles some of my research interests.  ;)
>>
>> Jason
>
>
> Good, I think there is a real need to address these issues. Research on
> NaNs and their applications has been neglected for far too long.
>
>
> Alois
>



-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz