[Octave-bug-tracker] [bug #31479] Crash & bugs in eigs


From: Rik
Subject: [Octave-bug-tracker] [bug #31479] Crash & bugs in eigs
Date: Fri, 03 Dec 2010 18:32:59 +0000

Follow-up Comment #4, bug #31479 (project octave):

I spent some time over the Thanksgiving holiday looking into this issue.  I
have 3 possible solutions, but each one has some disadvantages.

The problem is in the Fortran code for dneupd.f.  The C++ code passes in
three arrays which are sized based on the number of requested eigenvalues (k)
+ 1.  The extra 1 accounts for the fact that sometimes a complex conjugate
pair will be found and so precisely one extra eigenvalue will be returned.  

Within the dneupd code the variable in question is nconv, which is the number
of converged Ritz values.  In general, it is equal to k, but not always.  The
call which modifies it is a call to dtrsen.  The documentation for dtrsen
says that the modified value is in the range 0 <= nconv <= p (where p is the
number of Lanczos basis vectors).  For the example in question, p = 3, and
occasionally nconv reaches this value while the arrays hold only k + 1 = 2
entries, so a segmentation violation results.
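
To make the size mismatch concrete, here is a minimal Python sketch of the
bounds involved.  The helper name nconv_overflows is hypothetical and is not
part of Octave or ARPACK; it only encodes the sizes described above (the output
arrays hold k + 1 entries, while nconv may reach p).

```python
def nconv_overflows(k, p, nconv):
    """Return True if nconv values would not fit in arrays sized k + 1.

    Hypothetical helper for illustration only; k, p and nconv follow the
    naming used in this comment.
    """
    if not 0 <= nconv <= p:
        raise ValueError("nconv is expected to lie in 0 <= nconv <= p")
    allocated = k + 1  # entries in dr and di (z has n * (k + 1) entries)
    return nconv > allocated


# The failing case described above: k = 1 (so k + 1 = 2) and p = 3.
print(nconv_overflows(k=1, p=3, nconv=2))  # still fits in k + 1 entries
print(nconv_overflows(k=1, p=3, nconv=3))  # exceeds the k + 1 entries
```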

It appears, from my reading of the dneupd code, that the authors may not have
realized that nconv is both an input to and an output from dtrsen.  By
replacing nconv with numcnv (an existing temporary variable containing the
number of converged Ritz values) the program no longer segfaults.  This is a
one-line fix, but there are several problems with it.  First, it is a fix in a
code base we don't control, and getting someone to look at this issue,
understand it, and make the change may take quite some time.  Second, although
I reviewed the code and it produces the correct eigenvalues, I can't say that
this change wouldn't have other effects on the algorithm.  (I really don't
think it does, but who knows?)

If solution 1, fixing ARPACK, is not the right course then we can handle it
on the Octave side by re-dimensioning everything to be large enough to avoid
memory overwrites.  The three arrays to consider are dr, di, and z.  These are
all double (8 byte) arrays.  dr and di are of size (k + 1) while z is n * (k +
1).  These would now have to be sized based on p, which defaults in Octave code
to 2 * k.  The extra memory requirements are (k - 1) * 8 bytes each for dr and
di, but n * (k - 1) * 8 bytes for z.  When used with true sparse matrices, n
(the number of rows in the matrix) might be very large indeed (a brief look at
the ARPACK web page showed examples where n was 90,000 and 2.2e6).  So, while
this solution is correct code, it suffers from excess memory usage.  To be
fair, if you're running extremely large simulations your machine or cluster
probably has a lot of memory.  Even the largest example I cited (n = 2.2e6)
would need only about 158 MB of additional memory when calculating 10
eigenvalues.
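
The memory arithmetic for this option can be checked with a short Python
sketch.  The function extra_bytes is a hypothetical helper, not Octave code;
it assumes 8-byte doubles and the array shapes described above, and applies
the n * (k - 1) * 8 formula for the default p = 2 * k.

```python
BYTES_PER_DOUBLE = 8


def extra_bytes(n, k, new_len):
    """Extra memory when dr, di and z grow from k + 1 to new_len entries/columns.

    Hypothetical helper for illustration; n is the number of matrix rows and
    k the number of requested eigenvalues.
    """
    added = new_len - (k + 1)
    return {
        "dr": added * BYTES_PER_DOUBLE,
        "di": added * BYTES_PER_DOUBLE,
        "z": n * added * BYTES_PER_DOUBLE,
    }


# Sizing by p with the default p = 2 * k, for the larger n mentioned above.
n, k = 2_200_000, 10
cost = extra_bytes(n, k, new_len=2 * k)
print(f"dr: {cost['dr']} bytes, di: {cost['di']} bytes, "
      f"z: {cost['z'] / 1e6:.1f} MB")
```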

Solution 3 is a modification of the previous solution: re-dimension the
arrays, but only to size k + 2 rather than to size p.  I have not found a
situation where nconv exceeds k + 2.  Admittedly, I have tried only a few
matrices.  If I understood the entire ARPACK algorithm better, I might be
able to assert that nconv will never exceed k + 2.  It seems possible that
there is another +1 limit on the change in k.  So, the first call, to dnaupd,
increases k to k+1.  The second call in the code, to dneupd, then increases
k+1 to k+2, but not more than that.
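
For comparison, the following Python sketch tabulates how many columns z would
get under each sizing and what that costs in extra memory.  The function
z_columns and the strategy labels are hypothetical names used only for this
illustration, and the sketch says nothing about whether k + 2 is actually a
safe upper bound for nconv.

```python
def z_columns(k, p, strategy):
    """Columns allocated for z under each sizing discussed in this comment.

    Hypothetical helper; the strategy labels are made up for this sketch.
    """
    sizes = {"current": k + 1, "solution2": p, "solution3": k + 2}
    return sizes[strategy]


n, k = 2_200_000, 10
p = 2 * k  # the default in Octave code, as noted above
for name in ("current", "solution2", "solution3"):
    cols = z_columns(k, p, name)
    # Extra 8-byte doubles relative to the current k + 1 columns.
    extra_mb = n * (cols - (k + 1)) * 8 / 1e6
    print(f"{name}: {cols} columns, {extra_mb:.1f} MB extra for z")
```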

Any votes or words of advice for which solution to implement?


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?31479>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



