
Re: 2nd: sparse matrices


From: Etienne Grossmann
Subject: Re: 2nd: sparse matrices
Date: Mon, 26 Apr 2004 20:03:27 -0400
User-agent: Mutt/1.4.2.1i

  Hi all,

I have been attracted to this thread because I myself sometimes have
slowness problems with matrices of ~5e7 doubles.

  My ignoramus question is: doesn't the OS (Linux) use its swap
space on disk to allocate memory beyond the physical RAM of the
computer it is running on? Of course, it will swap data back and
forth, not necessarily as efficiently as if I moved the data myself,
but isn't the idea of swap that the OS should do it for you? IIRC,
VMS had a reputation for good virtual memory management.

  About the sparse package in general: its point, if I understand
correctly, is to avoid both storing and adding/multiplying/whatever
lots of zeros. As the name 'sparse' says, it is for matrices with
few nonzeros, not necessarily 'huge' ones.
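
  For instance, a minimal Octave sketch of the storage difference
(speye and the 4-byte index size are assumptions on my part,
matching the octave-forge sparse package):

n = 1e4;
S = speye (n);                       # n-by-n identity: only n nonzeros stored
full_bytes   = n*n*8;                # a full matrix of doubles: 800 MB
sparse_bytes = n*8 + n*4 + (n+1)*4;  # values + row indices + column pointers
printf ("full: %d bytes, sparse: roughly %d bytes\n",
        full_bytes, sparse_bytes);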

  Etienne


On Mon, Apr 26, 2004 at 12:42:02PM -0500, john popeye wrote:
# hi andy,
# 
# see below
# 
# > > Summary:
# > > Sparse matrices are a special case of the mathematical object
# > > "matrix". This special case was introduced due to the simple
# > > fact that you can be orders of magnitude more efficient when
# > > you apply certain restrictions on it.
# > > My recommendation: implementing such a feature only makes sense
# > > if you do not restrict yourself by the way you implement it.
# > > Focus on what you could achieve with it and on what it was
# > > designed for (LARGEness). (You do not buy an expensive car when
# > > all you do is move it forward and backward in the garage a few
# > > meters, do you?)
# > 
# > 
# > Do you actually USE sparse matrices?
# 
# Of course I do, but for the reasons given in my first email I do
# NOT use Matlab for it, nor Octave's approach. I tried both and got
# stuck.
#  
# > The Matlab sparse implementation (most of which is
# > offered by octave-forge) gives almost exactly what
# > you want, as well as some special features to do
# > things on sparse that you don't normally do on
# > full matrices.
# 
# I am not speaking of the operations you can do with the matrices;
# I am talking about the size and number of matrices you can handle.
# The key point of sparse matrices is that you use the approach
# because the objects are too large to handle as full matrices with
# tons of 0's.
# Correct me if I am wrong, but do Matlab/Octave handle the stuff
# "out of core", meaning they do not hold the matrix in main memory?
# The last time I tried it, honestly at least 3 years ago, this was
# not the case.
# 
# > Summary:
# >     What's wrong with the Matlab approach and what
# >     do you propose we do differently?
# 
# Matlab has some fantastic features, no question about that. That
# is why others try in many places to mimic it, and in some areas
# they even have a better solution ;-)
# 
# In terms of functionality Matlab is a very rich package:
# - numerics
# - graphics
# - GUI
# - maybe some symbolics etc.
# - other stuff
# That is the reason why so many 3rd-party "toolboxes" exist which
# use these features and build other stuff on them.
# I think I do not need to explain these benefits.
# 
# Let's come to the bottlenecks of Matlab.
# In its very basic setup Matlab limits itself to the available main
# memory. Once your data fills up that memory, it is "GAME OVER". It
# is extremely ugly to insert routines which start saving/loading
# into all your scripts/m-functions or whatever.
# In my eyes, or let's say in my area of application, this is a
# killer. Even if main memory is not as expensive any more as it
# used to be, it is and will always be much more expensive than disk
# space. I would guess that 300GB of disk space costs about as much
# as 1GB of main memory, if that.
# 
# That means the code limits itself to handling problems 300 times
# smaller than it could with a proper architecture (and we are not
# even talking about 32-bit problems and 4-byte int addressing
# limits).
# 
# Example with full matrices for simplicity:
# I have 1.2GB of main memory and want to calculate
# C = A + B;   # do not forget the ";" ;-)
# This means each matrix can use 400MB, which for a square matrix of
# doubles means sqrt(400e6/8) = 7071 rows and columns.
# You might say this is large, but large is always relative.
# Assume now you have an "out of core" approach: for the user it
# does not matter where the data is. What matters to him is: "what
# can I do with it, and how fast is it?"
# In the "out of core" approach I would store the following
# information in memory for each matrix:
# name, n-columns, n-rows, complexity, symmetry, some-other-stuff
# This can easily be done with 100 bytes, so we still have almost
# the whole 1.2GB of main memory for doing the math. Now we fread
# 400MB chunks of matrices "A" and "B", add them up, and fwrite
# 400MB chunks of C. This leads to as many freads as we need to
# "eat up" the matrices; a minimal sketch of the loop follows below.
# You will agree that I will be able to add matrices as large, and
# as many of them, as fit on my disk.
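# 
# Such a loop might look like this in Octave (the file names, the
# function name ooc_add, and the raw column-major double format are
# assumptions for illustration, not an existing interface):
# 
# function ooc_add (nr, nc, chunk_cols)
#   fa = fopen ("A.bin", "rb");    # matrix A, nr*nc doubles on disk
#   fb = fopen ("B.bin", "rb");    # matrix B, same shape
#   fc = fopen ("C.bin", "wb");    # result C, written as we go
#   done = 0;
#   while (done < nc)
#     k = min (chunk_cols, nc - done);   # columns in this chunk
#     a = fread (fa, [nr, k], "double"); # next chunk of A
#     b = fread (fb, [nr, k], "double"); # matching chunk of B
#     fwrite (fc, a + b, "double");      # append this chunk of C
#     done += k;
#   endwhile
#   fclose (fa); fclose (fb); fclose (fc);
# endfunction
# 
# With nr = 7071 and chunk_cols chosen so that nr*chunk_cols*8 stays
# near 400MB, at most three chunks are in memory at any time.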
# 
# For other operations, and especially for sparse matrices, the
# chunk size is in general a multiple of one column's worth of data,
# if the matrix is stored in Harwell-Boeing compressed column format
# (see the small layout example below).
# As many operations use square matrices, we can use as a rule of
# thumb that we can solve problems where one column fits in memory.
# Going back to our 1.2GB this means 1.2e9/8 = 1.5e8 ROWS for real
# and 7.5e7 ROWS for complex matrices (granting some factors for
# additional input and output), regardless of how many additional
# matrices I have, due to the fact that they all sleep on disk.
# You must admit that even in the worst case the "out of core"
# approach can solve significantly larger problems.
# The key disadvantages are:
# - you need some more overhead in designing such a system
# - it is basically slower, which can be compensated for with:
#   + a more expensive disk system
#   + automatic pre/post-caching of the stuff you need (as long as
#     you handle data which you can afford to have in memory, as all
#     problems now ;-), the disk is just a backup against a possible
#     crash and the user sees almost identical speed)
# 
# However, this is a way to solve even the largest problems with
# moderate investments in hardware.
# 
# best regards, jp

-- 
Etienne Grossmann ------ http://www.cs.uky.edu/~etienne


