Re: [Help-glpk] Optimization and Multicore GLPK
From: Reginald Beardsley
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Sat, 29 Dec 2012 07:27:47 -0800 (PST)
I'd suggest just "Performance Profiling and Optimization" as a title for the
wiki chapter. With modern compilers and constantly shifting machine
architectures, profiling is needed just to set optimization flags intelligently.
I've had time to digest the papers, reread parts of Maros and look at GLPK and
my own problem.
A few comments:
Maros is very good. In particular, I'd like to direct attention to his
discussion of sparse arithmetic operations. What he suggests is dead simple to
implement and something I'd not seen before. For typical sparse problems it's
a big speedup. Much of his discussion of parallel computing is of little
interest to me, but for someone without a lot of prior background in parallel
computing it would be very beneficial, even if a bit dated (ca. 2001).
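For concreteness, here is a minimal C sketch of the kind of packed-vector
arithmetic I take Maros to be describing: the nonzeros are held as parallel
index/value arrays, so each operation costs O(nnz) rather than O(n). The names
and layout below are illustrative, not taken from Maros or from GLPK.

    #include <stddef.h>

    typedef struct {
        int     nnz;    /* number of nonzeros          */
        int    *ind;    /* their positions             */
        double *val;    /* their values, packed        */
    } spvec;

    /* y (dense) += alpha * x (sparse): only x's nonzeros are touched */
    static void sparse_axpy(double alpha, const spvec *x, double *y)
    {
        for (int k = 0; k < x->nnz; k++)
            y[x->ind[k]] += alpha * x->val[k];
    }

    /* dot product of sparse x with dense y, again O(nnz(x)) work */
    static double sparse_dot(const spvec *x, const double *y)
    {
        double s = 0.0;
        for (int k = 0; k < x->nnz; k++)
            s += x->val[k] * y[x->ind[k]];
        return s;
    }

The point is simply that the zero entries are never visited, which is where the
big speedup on sparse problems comes from.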
All the papers address a point in time when shared memory multiprocessors were
being eclipsed by non-uniform memory access (NUMA) distributed machines. For
many reasons, most performance gains now have to come from NUMA machines. I
suggest "Computer Architecture: A Quantitative Approach" by Hennessy and
Patterson to anyone who is interested, with the caveat that you'll need to know
or learn a lot about hardware to benefit. I plan to order the 5th edition. I
read the first three when they came out, but missed the 4th. It's a constantly
moving target.
The reemergence of shared memory symmetric multiprocessing (SMP) in the form of
multicore workstations with many gigabytes of memory makes some techniques that
had fallen into disfavor worth looking at again. Hall is correct about really
large problems, but most intermediate problems can benefit from exploiting SMP
provided due attention is paid to cache organization.
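To illustrate the cache point with a toy example (my own sketch, nothing
GLPK-specific): when several cores accumulate partial results on an SMP
machine, padding each thread's accumulator out to a full cache line keeps the
cores from invalidating each other's lines (false sharing).

    #include <stdio.h>
    #include <omp.h>

    #define CACHE_LINE 64          /* typical x86 line size, in bytes   */
    #define MAX_THREADS 64         /* sketch assumes at most this many  */

    struct padded { double sum; char pad[CACHE_LINE - sizeof(double)]; };

    int main(void)
    {
        enum { N = 1000000 };
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 1.0;

        struct padded partial[MAX_THREADS] = {{0}};
        #pragma omp parallel
        {
            int t = omp_get_thread_num();
            #pragma omp for
            for (int i = 0; i < N; i++)
                partial[t].sum += a[i];  /* each core writes only its own line */
        }
        double total = 0.0;
        for (int t = 0; t < MAX_THREADS; t++) total += partial[t].sum;
        printf("%g\n", total);
        return 0;
    }

Without the padding, the per-thread sums land on the same cache line and the
cores spend their time shipping that line back and forth instead of computing.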
One quick comment about GPUs. Exploiting attached processors (APs)
successfully depends upon the ratio of communication to computation latencies.
If the labor of exploiting the AP is high, it's often not worth the trouble.
The CPU has caught up with and passed the AP several times in the last 20
years. Seismic processing codes are littered with coding artifacts from being
ported to APs. It can get really difficult to read the code. So I'd leave
GPUs to those interested in algorithm research. Fun stuff, but very difficult
and demanding.
In reading the GLPK source code, it appears that the primal simplex solver is
doing dense arithmetic. If I've missed something, please let me know. There
are some sparse arithmetic codes, but I don't see them being used anywhere.
They also appear to be more complex than what Maros describes, which is both
elegant and fast. For those solving large sparse problems with the simplex
method, there are substantial gains to be had.
In my own case I have a large number of dense problems to solve, so neither
sparse arithmetic nor multithreading will improve my time to solution.
Running multiple job queues, as in the example I wrote for the wikibook, is
the best I can do to exploit multiple cores in my work.
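For anyone curious, the shape of that approach is just to fork one solver
process per core and wait for them all. A bare-bones sketch (the job file
names are hypothetical, and glpsol is assumed to be on the PATH):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
        const int njobs = 4;                  /* e.g. one per core   */
        for (int i = 0; i < njobs; i++) {
            pid_t pid = fork();
            if (pid == 0) {                   /* child: run one job  */
                char model[32];
                snprintf(model, sizeof model, "job%d.mod", i);
                execlp("glpsol", "glpsol", "--math", model, (char *) NULL);
                perror("execlp");             /* reached only on failure */
                _exit(1);
            }
        }
        while (wait(NULL) > 0)                /* reap all the children */
            ;
        return 0;
    }

Each solver run is completely independent, so this scales about as well as the
memory system allows.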
Have Fun!
Reg