Re: [Help-glpk] Optimization and Multicore GLPK


From: Reginald Beardsley
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Sat, 29 Dec 2012 07:27:47 -0800 (PST)

I'd suggest just "Performance Profiling and Optimization" as a title for the 
wiki chapter.  Profiling is needed just to set compiler flags intelligently, 
given modern compilers and constantly shifting machine architectures.

I've had time to digest the papers, reread parts of Maros and look at GLPK and 
my own problem.

A few comments:

Maros is very good.  In particular, I'd like to direct attention to his 
discussion of sparse arithmetic operations.  What he suggests is dead simple to 
implement and something I'd not seen before.  For typical sparse problems it's 
a big speedup.  Much of his discussion of parallel computing is of little 
interest to me, but for someone without much prior background in parallel 
computing it would be very beneficial, even if a bit dated (ca. 2001).
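
To make the idea concrete, here is a minimal sketch of packed-storage sparse 
arithmetic in the spirit of what Maros describes.  The struct layout and names 
are my own illustration, not GLPK internals or Maros's exact code: a sparse 
vector keeps only its nonzero indices and values, so a daxpy-style update 
touches O(nnz) entries instead of O(n).

    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        size_t  nnz;    /* number of nonzeros         */
        int    *ind;    /* positions of the nonzeros  */
        double *val;    /* values at those positions  */
    } spvec;

    /* y (dense) += alpha * x (packed sparse): O(nnz) work, not O(n) */
    static void sparse_axpy(double alpha, const spvec *x, double *y)
    {
        for (size_t k = 0; k < x->nnz; k++)
            y[x->ind[k]] += alpha * x->val[k];
    }

    int main(void)
    {
        int    ind[] = { 1, 4 };      /* two nonzeros in a length-6 x */
        double val[] = { 2.0, -1.0 };
        spvec  x = { 2, ind, val };
        double y[6] = { 0 };

        sparse_axpy(3.0, &x, y);      /* y[1] = 6, y[4] = -3 */
        for (int i = 0; i < 6; i++)
            printf("%g ", y[i]);
        printf("\n");
        return 0;
    }

The whole trick is that the inner loop never even looks at the zeros, which is 
where the big speedup on typical sparse problems comes from.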

All the papers address a point in time when shared memory multiprocessors were 
being eclipsed by non-uniform memory access (NUMA) distributed machines.  For 
many reasons, most performance gains must come from NUMA machines.  I suggest 
"Computer Architecture: A Quantitative Approach" by Hennessy and Patterson to 
anyone who is interested, with the caveat that you'll need to know or learn a 
lot about hardware to benefit.  I plan to order the 5th edition.  I read the 
first 3 when they came out, but missed the 4th.  It's a constantly moving 
target.

The reemergence of shared memory symmetric multiprocessing (SMP) in the form of 
multicore workstations with many gigabytes of memory makes some techniques that 
had fallen into disfavor worth looking at again.  Hall is correct about really 
large problems, but most intermediate problems can benefit from exploiting SMP 
provided due attention is paid to cache organization.
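
As one concrete cache pitfall on SMP boxes (my toy example, not from any of 
the papers): if per-thread data shares a cache line, the line ping-pongs 
between cores and kills scaling.  Padding each thread's slot out to a full 
line (64 bytes assumed below; check your CPU) avoids the false sharing:

    #include <pthread.h>
    #include <stdio.h>

    #define CACHE_LINE 64          /* assumed line size            */
    #define NTHREADS   4

    struct counter {
        long value;
        char pad[CACHE_LINE - sizeof(long)];  /* keep neighbors apart */
    };

    static struct counter counts[NTHREADS];   /* one slot per thread  */

    static void *worker(void *arg)
    {
        struct counter *c = &counts[(size_t) arg];
        for (long i = 0; i < 50000000L; i++)
            c->value++;            /* stays in this core's cache   */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (size_t t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *) t);
        for (size_t t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        for (size_t t = 0; t < NTHREADS; t++)
            printf("thread %zu: %ld\n", t, counts[t].value);
        return 0;
    }

Delete the pad member and time it again on a multicore box to see the effect.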

One quick comment about GPUs.  Exploiting attached processors (APs) 
successfully depends upon the ratio of communication to computation latency.  
If the labor of exploiting the AP is high, it's often not worth the trouble.  
The CPU has caught up with and passed the AP several times in the last 20 
years.  Seismic processing codes are littered with coding artifacts from being 
ported to APs.  It can get really difficult to read the code.  So I'd leave 
GPUs to those interested in algorithm research.  Fun stuff, but very difficult 
and demanding.

In reading the GLPK source code, it appears that the primal simplex solver is 
doing dense arithmetic.  If I've missed something, please let me know.  There 
are some sparse arithmetic codes, but I don't see them being used anywhere.  
They also appear to be more complex than what Maros describes, which is both 
elegant and fast.  For those solving large sparse problems with the simplex 
method, there are substantial gains to be had.

In my own case I have a large number of dense problems to solve.  So neither 
sparse arithmetic nor multithreading will improve my time to solution.  Running 
multiple job queues as in the example I wrote for the wikibook is the best I 
can do to exploit multiple cores in my work.
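
For anyone who wants the flavor of that approach, here is a minimal 
multi-process job queue -- a sketch in the same spirit as the wikibook 
example, not a copy of it.  It assumes glpsol is on PATH and takes one 
MathProg model file per argument; because each solve is a separate process, 
GLPK's thread-safety situation never comes into play:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define NWORKERS 4                 /* keep this many cores busy   */

    int main(int argc, char *argv[])
    {
        int running = 0;

        for (int i = 1; i < argc; i++) {  /* one model file per arg   */
            if (running == NWORKERS) {    /* queue full: wait a slot  */
                wait(NULL);
                running--;
            }
            pid_t pid = fork();
            if (pid == 0) {               /* child: run one solve     */
                execlp("glpsol", "glpsol", "--model", argv[i],
                       (char *) NULL);
                perror("execlp");         /* reached only on failure  */
                _exit(1);
            }
            if (pid > 0)
                running++;
        }
        while (running > 0) {             /* drain remaining jobs     */
            wait(NULL);
            running--;
        }
        return 0;
    }

For independent dense problems like mine, keeping one solver process per core 
is about all the parallelism there is to harvest.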

Have Fun!
Reg


