bug-gnulib

Re: glibc segfault on "special" long double values is _ok_!?


From: Jan-Benedict Glaw
Subject: Re: glibc segfault on "special" long double values is _ok_!?
Date: Fri, 8 Jun 2007 11:09:38 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, 2007-06-08 09:53:24 +0100, James Youngman <address@hidden> wrote:
> On 6/8/07, Nix <address@hidden> wrote:
> > It's somewhat unusual for applications to accept double-format data over
> > the network or from files; but modulo byte-swapping, has anyone *ever*
> > seen an application that checks to be sure that the data it's received
> > is a valid IEEE754 floating-point number? I've never seen any such app,
> > I've never heard of anyone taking precautions under the assumption that
> > a double with a one-bit error (I think it's one bit, I've lost the start
> > of this thread) may cause core dumps if printed, and I've never
> > considered doing any such thing myself. It's generally assumed that
> > printing doubles is safe, no matter their origin.
> 
> The use case I was thinking about as I wrote my earlier email is of
> massively parallel HPC across large compute clusters.  Here is the
> basic approach:
> 
> 1. Buy some 500-port, very high bandwidth, medium latency, network
> switches (Myrinet, Gig-Ethernet, whatever).
> 2. Plug a big pile of machines in.
> 3. Perform gigantic parallel calculations
> 4. Exchange numeric data between nodes during the computation
> 
> So, having spent the $xxM on the gigantic switches, to get better
> aggregate bandwidth, do we prefer to format the data as ASCII before
> we exchange it between nodes?   Not really.   Do we format the data as
> ASCII before we store the end result of the computation?   You bet,
> but that's a different issue.

In this setup, you control the whole cluster: you can ensure that all
nodes use the same hardware and that no node sends data over the
network that wasn't the result of a CPU calculation.

The case in the ticket was different: the data fed in was most
probably _not_ the result of a calculation done by the CPU, but
hand-crafted.

This won't happen in your controlled cluster.

> Can the network infrastructure corrupt bits in the exchanged data?
> Yes.  Not often, but it does happen.  Same for the RAM.  So what do we
> do when we detect a problem?  Print debugging messages, as Nix already

Stop.  Would you continue computing with data once you know it is wrong?

> said (we work in, afaik, unrelated organisations).   Obviously some of
> the diagnostics only get issued when we already know there is a
> problem.   When we're producing diagnostics, we prefer that the bad
> data we're trying to complain about can be logged somehow.

hexdump (&my_long_double, sizeof my_long_double);
kill (getpid (), SIGABRT);

That way, you get a nice core dump and can load it into GDB. With
"clean" floats, just use GDB's "print" to print them (or even call
printf() on them).

That may crash, too, on one of these badly-crafted floats, but then
you still have the core dump to dissect bit by bit.
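
Note that hexdump() above is not a standard library function. As a
minimal sketch of what such a helper could look like, the following
formats the raw bytes of any object as hex without ever interpreting
them as a floating-point value. The names hex_format() and
dump_and_abort() are made up for illustration, and abort() is used in
place of the kill() call because it raises SIGABRT as well:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the bytes of an object into buf as
   space-separated two-digit hex, in memory order.  Output is truncated
   if buf is too small.  Returns the number of characters written,
   not counting the terminating NUL. */
size_t hex_format(const void *obj, size_t len, char *buf, size_t buflen)
{
    const unsigned char *p = obj;
    size_t pos = 0;

    assert(obj != NULL && buf != NULL && buflen > 0);

    /* Each byte needs at most 3 characters, plus room for the NUL. */
    for (size_t i = 0; i < len && pos + 3 < buflen; i++) {
        if (i > 0)
            buf[pos++] = ' ';
        snprintf(buf + pos, buflen - pos, "%02x", p[i]);
        pos += 2;
    }
    buf[pos] = '\0';
    return pos;
}

/* Hypothetical helper: log the raw bytes of a suspect value, then
   raise SIGABRT so that a core dump is left behind for GDB. */
void dump_and_abort(const void *obj, size_t len)
{
    char line[256];

    hex_format(obj, len, line, sizeof line);
    fprintf(stderr, "suspect value, raw bytes: %s\n", line);
    abort();
}
```

A call such as dump_and_abort(&my_long_double, sizeof my_long_double)
would then stand in for the two lines above. Whether a core file is
actually written still depends on the system's core dump settings.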

> Could we just print the raw bytes as hex or something?  Sure, but then
> we'd need to interpret that anyway.  The days of manually poring over
> core dumps that came out of the line printer should be behind us these
> days.

Once you have detected madness somewhere in your data, be sceptical of it.

> > I'd say this behaviour violates the principle of least astonishment, at
> > least. Mind you, avoiding it does seem like it could be expensive: [...]
> 
> Maybe.  For the issue-diagnostic-message use case, performance is not
> such an issue.  But I'm sure there are valid use cases where ultimate
> performance is really vital.  Use-cases vary a lot.

You can fully control your cluster, but in the case discussed here,
the data was injected by a non-controlled source.
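
For data from such a non-controlled source, one option is to inspect
the bit pattern before handing the value to any floating-point code.
The sketch below is illustrative only. It assumes the x86 80-bit
extended format (64-bit significand with an explicit integer bit,
15-bit exponent, sign bit), taken as 10 bytes in little-endian memory
order. This message does not say which exact pattern caused the crash,
so the check simply rejects every encoding in which the integer bit
disagrees with the exponent. The function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical check on an x86 80-bit extended value given as 10 bytes
   in little-endian (memory) order.  Returns 1 if the explicit integer
   bit is consistent with the exponent field, 0 otherwise. */
int x87_extended_is_canonical(const unsigned char bytes[10])
{
    uint64_t significand = 0;
    unsigned exponent;
    int integer_bit;

    /* Bytes 0..7 hold the significand, least significant byte first. */
    for (int i = 7; i >= 0; i--)
        significand = (significand << 8) | bytes[i];

    /* Bytes 8..9 hold the 15-bit exponent; the top bit is the sign. */
    exponent = ((unsigned)(bytes[9] & 0x7f) << 8) | bytes[8];
    integer_bit = (int)(significand >> 63);

    if (exponent == 0)
        /* Zero and denormals have the integer bit clear; a set bit
           here is a "pseudo-denormal". */
        return integer_bit == 0;

    /* Normal numbers, infinities and NaNs have the integer bit set;
       a clear bit is an "unnormal", pseudo-infinity or pseudo-NaN. */
    return integer_bit == 1;
}
```

Working on the bytes rather than on a long double keeps the check
itself from touching the FPU with a possibly invalid operand.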

MfG, JBG

-- 
      Jan-Benedict Glaw      address@hidden              +49-172-7608481
Signature of:                   ...und wenn Du denkst, es geht nicht mehr,
the second  :                          kommt irgendwo ein Lichtlein her.
