Re: CPU usage by call of C++ code through system() on Linux
From: Andreas Stahel
Subject: Re: CPU usage by call of C++ code through system() on Linux
Date: Mon, 29 Jun 2020 16:37:14 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0
On 29.06.20 09:49, Kai Torben Ohlhus wrote:
On 6/26/20 3:58 PM, Andreas Stahel wrote:
On 6/26/20 6:28 AM, Kai Torben Ohlhus wrote:
On 6/26/20 1:17 AM, Andreas Stahel wrote:
Dear Octave Users
Maybe one of you can give me a hint on how to make my Octave code run
faster.
Within a good size program (run time 40 sec) the command system() is
used to call a C++ code.
The C++ code uses pthreads.
While the code is running, htop shows approximately 40% of the load on
each CPU assigned to the kernel and 60% "normal" (user space?).
When running the same code in Matlab, only the "normal" load shows, with
very little kernel load on the CPUs.
The computation time in Matlab is also only 60% of the time consumed by
Octave (5.2.0).
The system is Ubuntu 20.04 on an AMD Ryzen 3950X.
Any hints on what is slowing Octave down?
With best regards
Andreas
Dear Andreas,
Maybe I do not understand your setup correctly. You have a C++ code
using threads compiled to, e.g. "code.exe" (the suffix does not matter),
and an Octave script "benchmark.m" with somewhere the code line
system ("code.exe")
First question: do "benchmark.m" and "code.exe" interact with each
other? That is, does "code.exe" compute something that "benchmark.m"
processes further by importing results? What is the purpose of Octave
calling "code.exe"? Benchmarking with tic-toc?
Second question, does "code.exe" (standalone, without Octave or Matlab)
or "benchmark.m" (called from Octave or Matlab) have a run time of 40
seconds?
Now to your observation. When running "benchmark.m" in Octave and
Matlab you observe Octave is slower. I do not understand how this is
related to the CPU "kernel" and "normal" usage? What is the runtime of
"benchmark.m" in Matlab and Octave, respectively? Is your concern that
not all CPU cores are used?
Maybe it is best to give us (some) code to better understand the
situation.
Kai
Dear Kai
Thank you for the quick reply and attempt to locate the problem.
The code in "benchmark.m" is a loop with 600 iterations.
In each iteration a C++ code is called through system().
The C++ code is heavily threaded and uses FFTW extensively. FFTW is
used as a single-threaded library.
The multithreading is "hand coded".
I have two options set up
NumIter = 0, no FFT computations
NumIter = 2, many FFT computations
In addition I called the binary with a loop in bash.
These are the observed wall times, averaged for one call of the binary:
            NumIter=2    NumIter=0
  Octave    59.6 ms      16.3 ms
  MATLAB    38.3 ms      20.1 ms
  bash      37.9 ms      19.2 ms
This puzzles me thoroughly!
Andreas
PS. on nabble these messages show up in the wrong thread!
Dear Andreas,
The maintainers list was not in the CC. Sorry for the late reply.
I am still not really convinced that I understand your setup and the
purpose of your computation.
Is there any output or synchronization between "code.exe" and
"benchmark.m"? Interpreting a for-loop alone already costs the Octave
interpreter "lots of time" compared to your fast overall computation
time.
a = 0; tic; for i = 1:600, a = a + i; end; toc
Octave 1.53995 ms.
Matlab 0.025 ms.
So maybe you just measure "slow" code interpretation when the body of
the for-loop is "heavier" than the one shown above?
Do you measure your wall time inside "code.exe" or in "benchmark.m" by
tic-toc, like in my example? Maybe you find no differences, if you use
a more precise C/C++ library to measure the wall time and return it for
further processing by Octave or Matlab?
Kai
Dear Kai
Thank you for your effort.
Here is an attempt to clear up the situation.
The loop runs over 600 frames; the timings are given as averages per
frame.
In the code "benchmark.m" the time per frame is measured by a
tic()/toc() pair.
tic();
system(command); %% this is where the computations are performed
systemtime = toc();
% to get an impression while it is running
display(sprintf('time = %f',systemtime))
systemtimetotal = systemtimetotal+systemtime;
Based on your suggestion I added two system calls to gettimeofday() in
the C code.
The observed timing is consistent with the tic()/toc() result, i.e.
tic()/toc() slightly higher.
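For readers who want to reproduce this kind of in-process measurement, here is a minimal sketch of timing one section with two gettimeofday() samples. The helper names elapsed_ms() and time_section_ms() and the "work" callback are hypothetical placeholders, not taken from RunMultipleTH_z_Neumann2.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock difference between two gettimeofday() samples, in milliseconds. */
double elapsed_ms(struct timeval start, struct timeval end)
{
    return (end.tv_sec - start.tv_sec) * 1000.0
         + (end.tv_usec - start.tv_usec) / 1000.0;
}

/* Time a single call of work(arg). "work" is a hypothetical stand-in for
 * the threaded FFT section of the real program. */
double time_section_ms(void (*work)(void *), void *arg)
{
    struct timeval t0, t1;

    assert(work != NULL);
    gettimeofday(&t0, NULL);   /* sample before the computation */
    work(arg);
    gettimeofday(&t1, NULL);   /* sample after the computation */
    return elapsed_ms(t0, t1);
}
```

gettimeofday() reports wall-clock time, which can jump if the system clock is adjusted; clock_gettime() with CLOCK_MONOTONIC is a common alternative when that matters.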
The C code was compiled with
gcc -O3 -Wall RunMultipleTH_z_Neumann2.c -lpthread -lm -lfftw3 -o RunMultipleTH_z_Neumann2
"benchmark.m" and "code.exe" exchange some information through files.
I timed those file reads and writes; they take very little time.
On a host with a Ryzen 3950X CPU:
* running "code.exe" in a bash loop leads to 33 ms per frame
htop has almost all of the CPU load assigned to the user
* running the code in Octave leads to 59 ms per frame
htop has a sizable part of the CPU load assigned to kernel
* running the code in Matlab leads to 37 ms per frame
htop has almost all of the CPU load assigned to the user
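One way to put numbers on the user/kernel split that htop shows would be to ask the operating system how much CPU time the child process consumed in each mode. The following is only a sketch under assumptions: the helper names tv_seconds() and run_and_split_cpu() are hypothetical, and it measures a command launched from a small C driver, not from Octave or Matlab, so it only tells whether kernel time is charged to the child process tree itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Run cmd through system() and report the CPU time that waited-for
 * child processes spent in user mode and in kernel mode, in seconds.
 * Returns the raw status value from system().
 */
int run_and_split_cpu(const char *cmd, double *user_s, double *kernel_s)
{
    struct rusage before, after;
    int status;

    assert(cmd != NULL && user_s != NULL && kernel_s != NULL);

    getrusage(RUSAGE_CHILDREN, &before);  /* totals before the call */
    status = system(cmd);                 /* blocks until cmd has finished */
    getrusage(RUSAGE_CHILDREN, &after);   /* totals now include cmd */

    *user_s   = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *kernel_s = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return status;
}
```

Calling run_and_split_cpu() with the same command string that "benchmark.m" passes to system() gives a per-frame user and kernel CPU time that could be compared with the htop impression; the values are summed over all threads of the child.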
If I reduce the FFTW computations within "code.exe", Octave is faster
than bash or Matlab, but only by very little. The multiple threads are
still launched within the C code, but no 2D FFT operations are applied.
On a host with an Intel Xeon E5-1650 CPU a similar effect occurs, though
not quite as drastic:
bash 80 ms
Matlab 99 ms
Octave 127 ms
I have no idea what could cause this surprising effect.
Enjoy the day
Andreas
--
Andreas.Stahel@gmx.com