help-glpk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Help-glpk] Optimization and Multicore GLPK


From: glpk xypron
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Tue, 25 Dec 2012 08:09:24 +0100

Hello Reg,

== Profiling ==

Profiling GLPK could give very valuable ideas how to speed up the code.

Oprofile is a useful tool for doing the work. See
http://oprofile.sourceforge.net

== Parallelization ==

Before thinking about parellization it would be necessary to attack the very
same design decisions that make GLPK not thread safe: memory management and
error handling.

When it comes to question whether parallization or optimiation of the single
thread code is more valuable I must admit that I do not care as long as I can
get back to the entry prompt faster. So if parallelization can give me factor
4 and a better algorithm factor 25, I would be happy to get both and solve
my problem in 1 % of the time.

There are different levels where parallelization might be considered for GLPK.

One place is the simplex algorithm itself. An overview can be found in
http://www.maths.ed.ac.uk/hall/ParSimplexRevised/ParSimplexApril2007.pdf
One direction of research not covered here are attempts to run the 
Simplex algorithm on GPUs.

Another place for parallelization is the branch and bound algorithm as
described in
http://web.ist.utl.pt/~ist11038/compute/_parallel_comp/Montana2007ParalMILP.pdf
http://www.cc.gatech.edu/~bader/papers/ParallelBranchBound.pdf

My impression is that parallelization of branch and bound in GLPK would require
significantly less programming resources.

Best regards

Heinrich Schuchardt

-------- Original-Nachricht --------
> Datum: Mon, 24 Dec 2012 17:37:22 -0800 (PST)
> Betreff: [Help-glpk] Optimization and Multicore GLPK

> Here are a few comments on optimization and multicore GLPK.  A search
> online for information about optimizing GLPK only turned up xypron's recent
> post about enabling SSE and fastmath.  However, GLPK is very sophisticated,
> so I what I have to say may fall in the "not documented yet" category. 
> However, except for Maros, I've seen little mention of performance profiling
> and coverage analysis in the LP literature I've read.  Most of the
> performance metrics are limited to iteration counts and elapsed time.   
> Serious
> code optimization requires much finer granularity of analysis.
> 
> I have just completed an initial, very quick read of "Computational
> Techniques of the Simplex Method" by Istvan Maros.  For anyone interested in
> working on LP codes, I think it is a necessary, but not sufficient reference.
> Maros gives a good overview of many issues related to implementing a
> simplex code and to large scale solvers in general.
> 
> It was typeset by the author so there are typos and non-native speaker
> errors to contend with.  However, all the important ideas are accompanied by
> numerous citations, so with the addition of the appropriate references, it
> appears to provide a good map of the terrain.  I qualify my statement only
> because I am far too ignorant of LP to make a stronger statement.
> 
> I agree entirely with Andrew's comments previously on the topic of
> parallelization of GLPK. There is much work to be done on the serial 
> algorithms
> before it makes any sense to attempt to implement parallel execution.  Aside
> from the computational cost of pricing, the particular choice of pricing
> rule can strongly influence solution times for particular problems.  So
> implementing Devex and other rules seems to me a good starting point.  I'm
> sure that other opportunities to improve single core performance exist, but
> this seems a particularly obvious place to start.
> 
> That said, barrier synchronized shared memory parallel programming should
> be suitable for use in GLPK running on multicore workstations. Actual
> performance is highly dependent upon cache architecture and organization, so 
> the
> benefit cannot be predicted easily.  Considerable experimentation and
> deep knowledge of the hardware behavior is required to make this work well. 
> The only reason it is worth considering at all is that a single
> architecture dominates the computing landscape.  Intel is unlikely to do 
> something
> that breaks a lot of major codes and core counts are likely to continue to
> grow.
> 
> The concept is simple.  In sections which are not execution order
> dependent, a small number of coroutines are started by a signal.  These
> coroutines then run to completion at which time they send a signal to the main
> routine which decrements a counter.  When all the coroutines have completed,
> the counter decrements to zero and the main routine continues.  If one can
> avoid thrashing due to cache collisions, those portions of the code can be
> speeded up by the number of cores employed.  Of course, total speedup will
> be much less as dictated by Amdahl's law.
> 
> In looking at some of the source code for simplex, I see places where
> there is a loop over vector operations. The natural thing to do for these is 
> to
> use the SSE instructions to speed up the vector operations and spread the
> vectors over multiple cores.
> 
> Though conceptually simple, none of this is easy to do.  In addition,
> GLPK is easily the most complex code I've ever looked at in 20+ years of
> working on several million lines of scientific codes.  Fortunately, it's also
> hands down the best written large code I've ever seen which is a real
> delight.
> 
> Any optimization work on a code as sophisticated as GLPK is a major
> undertaking which will take a long time to execute.  The first step is 
> profiling
> and coverage analysis.
> 
> If there is sufficient interest in this subject, I propose to implement
> and document performance profiling and coverage analysis of GLPK in the
> wikibook using the various Netlib problem sets.  This will be for convenience
> restricted to the *nix environment.  However, it should generally be
> applicable to Windows if a *nix environment package such as Cygwin is used.  
> 
> I am particularly interested in comments from Andrew, Marc, Robbie and
> xypron.  All have been very generous to me in different ways and this is an
> attempt to repay them.  I come from a seismic processing background where run
> times are measured in tens of thousands of hours for a single dataset. 
> Fortunately, most seismic codes are trivially parallel. So a few hundred quad
> core nodes and a couple of weeks will get the job done.  Were that not the
> case, oil would be a lot more expensive than it is.  But we still do a lot
> of work to make the codes run faster.
> 
> Have Fun!
> Reg




reply via email to

[Prev in Thread] Current Thread [Next in Thread]