octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] Add workaround for broken xGELSD workspace queries.


From: Jason Riedy
Subject: [PATCH] Add workaround for broken xGELSD workspace queries.
Date: Tue, 12 Feb 2008 13:03:42 -0800
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (gnu/linux)

The xGELSD workspace queries are inconsistant with the logic inside
those routines in the short and fat case (m > n).  For that case,
calculate a workspace size large enough to enable the efficient
algorithm.  This may allocate more memory than necessary for small
cases.  The calculated alternate size is at least the documented
minimum workspace requirement, so this amount always will work even
if the routines change.
---
   Sorry it's been so long.  I don't want to describe the LAPACK
   developer dynamic, but it's severely busted.  ergh.  I think
   this is a minimal change, and the ILAENV declaration should
   work on every Fortran compiler I know. -- Jason

 liboctave/CMatrix.cc |   29 +++++++++++++++++++++++++++--
 liboctave/ChangeLog  |   11 +++++++++++
 liboctave/dMatrix.cc |   32 ++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/liboctave/CMatrix.cc b/liboctave/CMatrix.cc
index 5d6131a..226608d 100644
--- a/liboctave/CMatrix.cc
+++ b/liboctave/CMatrix.cc
@@ -63,6 +63,13 @@ along with Octave; see the file COPYING.  If not, see
 
 extern "C"
 {
+  octave_idx_type
+  F77_FUNC (ilaenv, ILAENV) (const octave_idx_type&, F77_CONST_CHAR_ARG_DECL,
+                            F77_CONST_CHAR_ARG_DECL,
+                            const octave_idx_type&, const octave_idx_type&,
+                            const octave_idx_type&, const octave_idx_type&
+                            F77_CHAR_ARG_LEN_DECL F77_CHAR_ARG_LEN_DECL);
+
   F77_RET_T
   F77_FUNC (zgebal, ZGEBAL) (F77_CONST_CHAR_ARG_DECL,
                             const octave_idx_type&, Complex*, const 
octave_idx_type&, octave_idx_type&,
@@ -2492,8 +2499,12 @@ ComplexMatrix::lssolve (const ComplexMatrix& b, 
octave_idx_type& info,
 
       Array<Complex> work (1);
 
-      // FIXME: Can SMLSIZ be other than 25?
-      octave_idx_type smlsiz = 25;
+      const octave_idx_type smlsiz
+       = F77_FUNC (ilaenv, ILAENV) (9, F77_CONST_CHAR_ARG2 ("ZGELSD", 6),
+                                    F77_CONST_CHAR_ARG2 (" ", 1),
+                                    0, 0, 0, 0
+                                    F77_CHAR_ARG_LEN (6)
+                                    F77_CHAR_ARG_LEN (1));
 
       // We compute the size of rwork and iwork because ZGELSD in
       // older versions of LAPACK does not return them on a query
@@ -2526,6 +2537,20 @@ ComplexMatrix::lssolve (const ComplexMatrix& b, 
octave_idx_type& info,
                                 ps, rcond, rank, work.fortran_vec (),
                                 lwork, prwork, piwork, info));
 
+      // The workspace query is broken in at least LAPACK 3.0.0
+      // through 3.1.1 when n > m.  The obtuse formula below
+      // should provide sufficient workspace for DGELSD to operate
+      // efficiently.
+      if (n > m)
+       {
+         octave_idx_type addend = m;
+         if (2*m-4 > addend) addend = 2*m-4;
+         if (nrhs > addend) addend = nrhs;
+         if (n-3*m > addend) addend = n-3*m;
+         const octave_idx_type lworkaround = 4*m + m*m + addend;
+         if (std::real (work(0)) < lworkaround) work(0) = lworkaround;
+       }
+
       if (f77_exception_encountered)
        (*current_liboctave_error_handler) 
          ("unrecoverable error in zgelsd");
diff --git a/liboctave/ChangeLog b/liboctave/ChangeLog
index 90e6b39..24b8bf7 100644
--- a/liboctave/ChangeLog
+++ b/liboctave/ChangeLog
@@ -1,5 +1,16 @@
 2008-02-12  Jason Riedy  <address@hidden>
 
+       * dMatrix.cc (ILAENV): Declare LAPACK Fortran function.
+       (Matrix::lssolve): Use ILAENV to query smlsiz.  And add an ugly
+       workaround for DGELSD's broken lwork query.  The formula is from
+       LAPACK's dgelsd.f source and allocates enough workspace to use an
+       efficient algorithm in the short-and-fat case (n > m).
+       * CMatrix.cc (ILAENV): Declare LAPACK Fortran function.
+       (ComplexMatrix::lssolve): Use ILAENV to query smlsiz.  And add an
+       ugly workaround for DGELSD's broken lwork query, as with double.
+
+2008-02-12  Jason Riedy  <address@hidden>
+
        * oct-sort.cc: Include cstring and import memcpy and memmove.
 
 2008-02-12  Jason Riedy  <address@hidden>
diff --git a/liboctave/dMatrix.cc b/liboctave/dMatrix.cc
index d8a3551..4c58c6c 100644
--- a/liboctave/dMatrix.cc
+++ b/liboctave/dMatrix.cc
@@ -59,6 +59,13 @@ along with Octave; see the file COPYING.  If not, see
 
 extern "C"
 {
+  octave_idx_type
+  F77_FUNC (ilaenv, ILAENV) (const octave_idx_type&, F77_CONST_CHAR_ARG_DECL,
+                            F77_CONST_CHAR_ARG_DECL,
+                            const octave_idx_type&, const octave_idx_type&,
+                            const octave_idx_type&, const octave_idx_type&
+                            F77_CHAR_ARG_LEN_DECL F77_CHAR_ARG_LEN_DECL);
+
   F77_RET_T
   F77_FUNC (dgebal, DGEBAL) (F77_CONST_CHAR_ARG_DECL,
                             const octave_idx_type&, double*, const 
octave_idx_type&, octave_idx_type&,
@@ -2103,8 +2110,12 @@ Matrix::lssolve (const Matrix& b, octave_idx_type& info,
 
       Array<double> work (1);
 
-      // FIXME: Can SMLSIZ be other than 25?
-      octave_idx_type smlsiz = 25;
+      const octave_idx_type smlsiz
+       = F77_FUNC (ilaenv, ILAENV) (9, F77_CONST_CHAR_ARG2 ("DGELSD", 6),
+                                    F77_CONST_CHAR_ARG2 (" ", 1),
+                                    0, 0, 0, 0
+                                    F77_CHAR_ARG_LEN (6)
+                                    F77_CHAR_ARG_LEN (1));
 
       // We compute the size of iwork because DGELSD in older versions
       // of LAPACK does not return it on a query call.
@@ -2129,6 +2140,23 @@ Matrix::lssolve (const Matrix& b, octave_idx_type& info,
                                 ps, rcond, rank, work.fortran_vec (),
                                 lwork, piwork, info));
 
+      // The workspace query is broken in at least LAPACK 3.0.0
+      // through 3.1.1 when n > m.  The obtuse formula below
+      // should provide sufficient workspace for DGELSD to operate
+      // efficiently.
+      if (n > m)
+       {
+         const octave_idx_type wlalsd
+           = 9*m + 2*m*smlsiz + 8*m*nlvl + m*nrhs + (smlsiz+1)*(smlsiz+1);
+         octave_idx_type addend = m;
+         if (2*m-4 > addend) addend = 2*m-4;
+         if (nrhs > addend) addend = nrhs;
+         if (n-3*m > addend) addend = n-3*m;
+         if (wlalsd > addend) addend = wlalsd;
+         const octave_idx_type lworkaround = 4*m + m*m + addend;
+         if (work(0) < lworkaround) work(0) = lworkaround;
+       }
+
       if (f77_exception_encountered)
        (*current_liboctave_error_handler) 
          ("unrecoverable error in dgelsd");
-- 
1.5.4.1



reply via email to

[Prev in Thread] Current Thread [Next in Thread]