
Re: [Help-glpk] Optimization and Multicore GLPK


From: Reginald Beardsley
Subject: Re: [Help-glpk] Optimization and Multicore GLPK
Date: Thu, 27 Dec 2012 07:56:10 -0800 (PST)

Thanks to all for the links to papers.

A parallel implementation of either the primal or the dual simplex method is a 
difficult research topic, and not something I have, or have ever had, any 
motivation to undertake.

I'm interested in optimizing the existing serial revised simplex algorithm, 
with particular attention to low-effort performance improvements.  This leads 
to focusing on the matrix multiply.  Row-column operations are trivially data 
parallel: one can spread the problem over multiple cores by using modulo 
arithmetic to select the rows processed by each core.  On a shared-memory 
machine the overhead of doing this is very low, and it does not require making 
GLPK thread-safe.  I've not counted how many functions would be affected, but 
the number is very small and the changes are relatively simple.
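
To make the idea concrete, here is a minimal sketch of that kind of row 
partitioning using POSIX threads on a shared-memory machine.  The dense 
row-major layout and all of the names (matvec_multicore, struct job, etc.) are 
only illustrative assumptions, not GLPK's internal data structures.

/* Illustrative sketch only: each worker handles the rows whose index is
 * congruent to its id modulo the number of cores. */
#include <pthread.h>

#define NCORES 4

struct job {
    int id;            /* worker id, 0 .. NCORES-1 */
    int m, n;          /* rows, columns */
    const double *A;   /* dense m x n matrix, row-major */
    const double *x;   /* vector of length n */
    double *y;         /* result vector of length m */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    /* modulo arithmetic selects this core's rows: id, id+NCORES, ... */
    for (int i = j->id; i < j->m; i += NCORES) {
        double s = 0.0;
        for (int k = 0; k < j->n; k++)
            s += j->A[(size_t)i * j->n + k] * j->x[k];
        j->y[i] = s;   /* each row is written by exactly one worker */
    }
    return NULL;
}

void matvec_multicore(int m, int n, const double *A, const double *x,
                      double *y)
{
    pthread_t tid[NCORES];
    struct job jobs[NCORES];
    for (int c = 0; c < NCORES; c++) {
        jobs[c] = (struct job){ c, m, n, A, x, y };
        pthread_create(&tid[c], NULL, worker, &jobs[c]);
    }
    for (int c = 0; c < NCORES; c++)
        pthread_join(tid[c], NULL);
}

Because each worker writes a disjoint set of entries of y, no locking is 
needed, which is why this step does not require making GLPK itself thread-safe.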

For dense rows the operation should be vectorized so that the SSE extensions 
can be used to full effect.  The effort required is modest.  As I recall, 
padding the arrays will suffice, though compilers & hardware may take care of 
that now.
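
By padding I mean something along these lines: round the row length up to the 
SSE2 vector width and zero-fill the tail, so the inner loop runs over a 
vector-friendly trip count.  The constant and the helper names below are 
assumptions made for the sketch, not anything in GLPK.

#include <stdlib.h>
#include <string.h>

#define VEC_WIDTH 2   /* doubles per SSE2 register */

/* Round a row length up to a multiple of the vector width. */
static int padded_len(int n)
{
    return (n + VEC_WIDTH - 1) / VEC_WIDTH * VEC_WIDTH;
}

/* Copy a dense row into padded storage; the zero-filled tail contributes
 * nothing to a dot product computed over the padded length, but it keeps
 * the loop trip count a multiple of the vector width. */
static double *pad_row(const double *row, int n)
{
    int np = padded_len(n);
    double *p = malloc((size_t)np * sizeof(double));
    if (p == NULL) return NULL;
    memcpy(p, row, (size_t)n * sizeof(double));
    memset(p + n, 0, (size_t)(np - n) * sizeof(double));
    return p;
}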

For hyper-sparse rows a sparse multiply algorithm will outperform the 
vectorized version, so some logic is required to select which of the two 
methods is used and to keep track of row density.
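
That selection could be as simple as comparing each row's stored nonzero count 
against a density cutoff.  The 10% threshold and the compressed-row layout 
(ind[], val[]) below are guesses for the sake of the sketch, not GLPK's actual 
representation.

/* Illustrative sketch: pick the sparse or the dense kernel for each row
 * based on its nonzero density. */
#define DENSITY_CUTOFF 0.10   /* crossover point, to be tuned */

/* Sparse dot product over a compressed row: nnz nonzeros with column
 * indices ind[] and values val[]; touches only the stored entries. */
static double dot_sparse(int nnz, const int *ind, const double *val,
                         const double *x)
{
    double s = 0.0;
    for (int k = 0; k < nnz; k++)
        s += val[k] * x[ind[k]];
    return s;
}

/* Dense dot product: a straight loop the compiler can vectorize. */
static double dot_dense(int n, const double *row, const double *x)
{
    double s = 0.0;
    for (int k = 0; k < n; k++)
        s += row[k] * x[k];
    return s;
}

static double dot_row(int n, int nnz, const int *ind, const double *val,
                      const double *row, const double *x)
{
    if ((double)nnz / n < DENSITY_CUTOFF)
        return dot_sparse(nnz, ind, val, x);  /* hyper-sparse row */
    else
        return dot_dense(n, row, x);          /* dense, vectorized path */
}

The cutoff would have to be tuned, since the crossover point depends on cache 
behavior as much as on operation counts.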

The sort of multicore operation I'm suggesting will eventually be done 
automatically by the compiler.  At present it's done on a per-core basis to 
schedule multiple FPUs.  It may well already be available in some of the 
commercial HPC compilers, though with some restrictions.

Attached processors such as GPUs have been popular at various times.  While 
they offer significant speedups, those come at a price: whether they are 
worthwhile depends on the ratio of communication to computation time, and they 
often demand significant code changes to use effectively.  Historically, the 
CPU has caught up and the labor of coding to the attached processor has been 
rendered useless or even harmful.

If the code accounting for 40% of the execution time can be sped up by a 
factor of 4x, the total improvement will be at most a 30% reduction in 
execution time.  I'd be delighted if the combination of vectorization, sparse 
arithmetic and multicore operation did better than that, but it's very hard to 
make large performance improvements in good codes.  Most of the time, 
improvements to one aspect of a code result in a bottleneck somewhere else.  
For large LP problems I would expect cache & main memory behavior to limit the 
gains from faster arithmetic.
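
For reference, the 30% figure is just Amdahl's law applied to the profiled 
fraction (p = 0.40 of the time sped up by a factor of 4):

    T_new = (1 - 0.40)*T + (0.40/4)*T = 0.60*T + 0.10*T = 0.70*T

i.e. at best a 30% reduction, and less once threading and dispatch overhead 
are counted.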

Have Fun!
Reg



