Re: 2nd: sparse matrices


From: john popeye
Subject: Re: 2nd: sparse matrices
Date: Mon, 26 Apr 2004 12:42:02 -0500

hi andy,

see below

> > Summary:
> > Sparse matrices are a special case of the mathematical object
> > "matrix". This special case was introduced due to the simple fact
> > that you can be orders of magnitude more efficient when you apply
> > certain restrictions on it.
> > My recommendation: implementing such a feature only makes sense if
> > you do not restrict yourself by the way you implement it. Focus on
> > what you could achieve with it and on what it was designed for
> > (LARGEness). (You do not buy an expensive car when all you do is
> > move it forward and backward in the garage a few meters, do you?)
> 
> 
> Do you actually USE sparse matrices?

Of course I do, but due to the arguments given
in my first email I do NOT use Matlab for it, nor
Octave's approach. I tried both and got stuck.
 
> The Matlab sparse implementation (most of which is
> offered by octave-forge) gives almost exactly what
> you want, as well as some special features to do
> things on sparse that you don't normally do on
> full matrices.

I am not talking about the operations you can do
with the matrices; I am talking about the size and number
of matrices you can handle. The key point
of sparse matrices is that you use the approach
because the objects are too large to handle
like full matrices with tons of 0's.
Correct me if I am wrong, but do
Matlab/Octave handle this "out of core",
meaning they do not hold the matrix in main memory?
The last time I tried it, honestly at least 3 years
ago, this was not the case.

> Summary:
>     What's wrong with the Matlab approach and what
>     do you propose we do differently?

Matlab has some fantastic features, no question about
that. That is why others in many places try to mimic
it, and in some areas even have a better solution ;-)

In terms of functionality Matlab is a very rich
package:
-numerics
-graphics
-gui
-maybe some symbolics etc.
-other stuff
That is the reason why many 3rd party "toolboxes"
exist which use these features and build other
stuff on them.
I think I do not need to explain these benefits.

Let's come to the bottlenecks of Matlab.
In its very basic setup Matlab limits itself
to the available main memory. Once your data
fills up that memory it is "GAME OVER". It is
extremely ugly to insert routines which start
saving/loading into all your scripts/m-functions
or whatever. In my eyes, or let's
say in my area of application, this is a killer.
Even if main memory is not as expensive any more
as it used to be, it is and always will be much more
expensive than disk space.
I would guess that 300GB of disk space cost about
as much as 1GB of main memory, if that.

That means the code limits itself to handling
problems 300 times smaller than it could
with a proper architecture (and we are not even talking
about 32-bit problems and 4-byte integer addressing limits).

Example with full matrices, for simplicity:
I have 1.2GB of main memory and want to calculate
C=A+B;   # do not forget the ";" ;-)
This means each matrix could be 400MB, which for a
square matrix of doubles means sqrt(400e6/8) = 7071
rows and columns.
You might say this is large, but large is always
relative.
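Just to check that arithmetic, a quick Octave snippet (assuming
8-byte doubles and three equally sized square matrices A, B, C):

  mem    = 1.2e9;                 % total main memory in bytes
  permat = mem/3;                 % ~400MB each for A, B and C
  n = floor (sqrt (permat/8))     % => 7071 rows/columns per matrix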
Assume now you have an "out of core" approach:
for the user it does not matter where the data is.
What matters to him is: "what can I do with it, and
how fast is it?"
In the "out of core" approach I would store the
following information in memory for each matrix:

name, n-columns, n-rows, complexity, symmetry, some-other-stuff

This can easily be done with 100 bytes, so essentially
the full 1.2GB of main memory remains for
doing the math. Now we fread chunks of, say,
400MB of matrices "A" and "B", add them up, and
fwrite 400MB chunks of C. This leads to as many
freads as we need to "eat up" the matrices.
You will agree that I can then add matrices as large,
and as many of them, as fit on my disk.
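To make this concrete, here is a rough Octave sketch of the chunked
addition I have in mind (the file names, the dimensions and the raw
column-wise double storage are just assumptions for illustration):

  nrows = 50000;  ncols = 50000;     % assumed dimensions, ~20GB per matrix
  chunk = 1000;                      % columns per chunk, ~400MB, fits in RAM

  fa = fopen ("A.bin", "rb");        % hypothetical files holding A, B and C
  fb = fopen ("B.bin", "rb");        % as raw doubles, column by column
  fc = fopen ("C.bin", "wb");
  for j = 1:chunk:ncols
    nc = min (chunk, ncols - j + 1);
    a = fread (fa, [nrows, nc], "double");   % read a chunk of A
    b = fread (fb, [nrows, nc], "double");   % read the matching chunk of B
    fwrite (fc, a + b, "double");            % write that chunk of C
  endfor
  fclose (fa);  fclose (fb);  fclose (fc);

Only two input chunks plus the result chunk ever have to fit in
memory at the same time, no matter how big the files on disk are.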

For other operations, and especially for sparse
matrices stored in Harwell-Boeing compressed column
format, the natural chunk size is a multiple of one
column.
As many operations use square matrices, we can use
as a rule of thumb that we can solve problems where
one column fits in memory. Going back to our
1.2GB this means 1.2e9/8 = 1.5e8 ROWS for real
and 7.5e7 ROWS for complex matrices (allowing some
factors for additional input and output), regardless of
how many additional matrices I have, because
they all sleep on disk.
You must admit that even in the worst case the "out of
core" approach can solve significantly larger problems.
The key disadvantages are:
-you need some more overhead in designing such a
system
-it is basically slower, which can be compensated
with:
 + a more expensive disk system
 + automatic pre/post-caching of the stuff you need
   (as long as you handle data which you can afford
   to keep in memory (as all problems are now ;-)),
   the disk copy is just a backup for a possible crash
   and the user should see almost identical speed)

However, this is a way to solve even the largest
problems with a moderate investment in hardware.

best regards, jp
