From: | Juergen Sauermann |
Subject: | Re: [Bug-apl] 80 core performance results |
Date: | Fri, 22 Aug 2014 15:46:50 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.0 |
Hi Elias,

I am working on it. As a preparation I have created a new command ]PSTAT that shows how many CPU cycles the different scalar functions take. You can run the new workspace ScalarBenchmark_1.apl to see the results (SVN 444).

These numbers are very important for determining when to switch from sequential to parallel execution. The idea is to feed these numbers back into the interpreter so that machines can tune themselves.

The other thing is the lesson learned from your benchmark results. As far as I can see, semaphores are far too slow for syncing the threads. The picture that is currently evolving in my head is this: instead of 2 states (blocked on semaphore/running) of the threads there should be 3 states:

1. blocked on semaphore,
2. busy waiting on some flag in userspace, and
3. running (performing parallel computations)

The transition between states 1 and 2 is somewhat slow, but only done when the interpreter is blocked on input from the user. The transition between 2 and 3 is much more lightweight, so that the break-even point between sequential and parallel execution occurs at much shorter vector sizes.

Since this involves some interaction with Input.cc I wasn't sure if I should first throw out libreadline (in order to simplify Input.cc) or do the parallel stuff first.

Another lesson from the benchmark was that OMP is always slower than the hand-crafted method, so I guess it is out of scope now.

My long term plan for the next 1 or 2 releases is this:

1. remove libreadline
2. parallel for scalar functions
3. replace liblapack

/// Jürgen

On 08/22/2014 12:22 PM, Elias Mårtenson wrote:
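
As an illustration of the three thread states Jürgen describes in the message above (blocked, busy waiting on a userspace flag, running), here is a minimal C++ sketch. It is not taken from the GNU APL sources: the class `Worker`, the enum `WorkerState`, and all method names are hypothetical, and a `std::condition_variable` stands in for the semaphore. It only shows why the 2 -> 3 transition can be cheap (a single atomic flag) while the 1 -> 2 transition goes through the kernel.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical names; an assumption-laden sketch, not the interpreter's code.
enum class WorkerState { Blocked, Spinning, Running };

class Worker {
public:
    Worker() : thread_(&Worker::loop, this) {}

    ~Worker() {
        quit_.store(true);
        wake();              // make sure a blocked worker sees quit_
        thread_.join();
    }

    // Slow transition 1 -> 2: wake the worker through the kernel.
    void wake() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            parked_.store(false);
        }
        wakeup_.notify_one();
    }

    // Transition 2 -> 1, e.g. while waiting for user input.
    void park() { parked_.store(true); }

    // Fast transition 2 -> 3: publish a job by setting a userspace flag.
    void start(std::function<void()> job) {
        assert(!has_job_.load());   // this sketch handles one job at a time
        job_ = std::move(job);
        has_job_.store(true, std::memory_order_release);
    }

    // The caller spins (politely) until the worker has finished the job.
    void wait_done() const {
        while (has_job_.load(std::memory_order_acquire))
            std::this_thread::yield();
    }

    WorkerState state() const { return state_.load(); }

private:
    void loop() {
        while (!quit_.load()) {
            {   // State 1: blocked until wake() clears parked_.
                std::unique_lock<std::mutex> lock(mutex_);
                state_.store(WorkerState::Blocked);
                wakeup_.wait(lock, [this] { return !parked_.load(); });
            }
            state_.store(WorkerState::Spinning);
            // State 2: busy wait on has_job_ without entering the kernel.
            while (!parked_.load() && !quit_.load()) {
                if (has_job_.load(std::memory_order_acquire)) {
                    state_.store(WorkerState::Running);    // state 3
                    job_();
                    state_.store(WorkerState::Spinning);
                    has_job_.store(false, std::memory_order_release);
                }
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::atomic<bool> parked_{true};
    std::atomic<bool> quit_{false};
    std::atomic<bool> has_job_{false};
    std::atomic<WorkerState> state_{WorkerState::Blocked};
    std::function<void()> job_;
    std::thread thread_;   // declared last so it starts after the other members
};
```

In this sketch the caller would call `wake()` once when a computation begins, then use `start()` and `wait_done()` for each parallel chunk, and call `park()` before blocking on user input. A busy-waiting worker burns a core while idle, which is the price paid for the cheaper hand-off.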