[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization
From: |
Nicholas Jankowski |
Subject: |
[Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm |
Date: |
Wed, 30 Jun 2021 20:18:25 -0400 (EDT) |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 |
Follow-up Comment #18, bug #60818 (project octave):
looking at timing. It appears that Octave LU calls are just very computational
and memory expensive:
using the profiler to look at a 3D case:
Laplace Expansion:
>> rand("state", [1:625]');
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:10000, delaunayn([x,y,z]);endfor,
profile off; profshow
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
5 __delaunayn__ 13.772 78.08 10000
1 delaunayn 1.899 10.77 10000
13 cross 1.245 7.06 10000
12 binary - 0.088 0.50 60000
19 cat 0.088 0.50 10000
18 binary .* 0.087 0.50 80000
22 sqrt 0.080 0.45 30000
21 sumsq 0.070 0.40 30000
6 isa 0.044 0.25 10000
20 dot 0.035 0.20 10000
3 binary < 0.030 0.17 50000
9 size 0.024 0.14 30000
7 eps 0.023 0.13 10000
2 nargin 0.022 0.13 50001
15 ndims 0.022 0.12 30000
10 binary == 0.019 0.11 30000
16 ones 0.018 0.10 10000
23 binary ./ 0.017 0.10 10000
11 any 0.017 0.10 10000
24 abs 0.013 0.07 10000
using the LU decomposition:
>> clear all
>> rand("state", [1:625]');
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;
q = rand(100,1)*4-2; r = rand(100,1)*4-2; s = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:10000, delaunayn([x,y,z]);endfor,
profile of
f; profshow
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
23 lu 14.207 40.10 10000
5 __delaunayn__ 12.965 36.59 10000
1 delaunayn 4.423 12.48 10000
27 unique 0.783 2.21 10000
15 kron 0.753 2.13 30000
22 logical 0.576 1.63 10000
17 speye 0.368 1.04 10000
16 sparse 0.218 0.62 20000
24 diag 0.184 0.52 10000
3 binary < 0.182 0.51 30000
25 abs 0.095 0.27 10000
13 binary - 0.078 0.22 70000
12 postfix .' 0.078 0.22 20000
26 max 0.073 0.21 10000
21 true 0.055 0.16 30000
7 eps 0.051 0.14 20000
6 isa 0.038 0.11 10000
36 zeros 0.036 0.10 10000
30 false 0.028 0.08 20001
14 ones 0.028 0.08 20000
so at least in 3D, for 100 points the LU approach is spending just as much
time in LU as in __delaunayn__, plus ~1/2 of __delaunayn__ again on other ops
within delaunayn. The Laplace expansion avoids most other function calls
except for cross, spending a total of ~1/3 of the time spent in
__delaunayn__.
jumping out to 6D:
LU:
>> clear all
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;q = rand(100,1)*4-2; r = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:100,
delaunayn([x,y,z,p,q,r]);endfor, profile off; profshow
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
23 lu 26.310 47.92 100
5 __delaunayn__ 12.579 22.91 100
1 delaunayn 9.997 18.21 100
15 kron 2.593 4.72 300
22 logical 1.388 2.53 100
12 postfix .' 0.857 1.56 200
...
Laplace expansion:
>> clear all
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;q = rand(100,1)*4-2; r = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:100,
delaunayn([x,y,z,p,q,r]);endfor, profile off; profshow
# Function Attr Time (s) Time (%) Calls
----------------------------------------------------------------
5 __delaunayn__ 14.085 57.97 100
22 binary .* 1.971 8.11 87600
17 cross 1.690 6.95 12000
15 delaunayn>detvec4 1.341 5.52 3000
12 binary - 1.335 5.49 44100
14 delaunayn>detvec5 0.910 3.75 600
24 dot 0.869 3.57 12000
...
not much improvement.
upping to 1000pts:
3D, 1000pts
LU:
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
23 lu 28.536 55.89 1000
5 __delaunayn__ 17.235 33.76 1000
1 delaunayn 2.665 5.22 1000
15 kron 0.811 1.59 3000
22 logical 0.674 1.32 1000
Laplace Exp:
# Function Attr Time (s) Time (%) Calls
----------------------------------------------------------------
5 __delaunayn__ 17.201 94.61 1000
1 delaunayn 0.443 2.43 1000
14 cross 0.143 0.79 1000
12 binary - 0.081 0.44 6000
6D, 1000pts:
LU decomp:
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
23 lu 97.561 40.34 10
5 __delaunayn__ 85.050 35.17 10
1 delaunayn 35.671 14.75 10
15 kron 9.233 3.82 30
22 logical 5.001 2.07 10
12 postfix .' 3.478 1.44 20
(peak mem about 3GB, total time ~241s)
Laplace expansion:
# Function Attr Time (s) Time (%) Calls
----------------------------------------------------------------
5 __delaunayn__ 85.856 35.19 10
22 binary .* 55.332 22.68 8820
12 binary - 30.297 12.42 4410
15 delaunayn>detvec4 20.128 8.25 300
23 cat 10.416 4.27 1200
24 dot 10.393 4.26 1200
(peak mem about 700MB, total time ~244s)
so here at the cost of 3x the memory, LU decomp breaks even at 6D.
upping to 10000 points.
3D, 10kpts:
LU:
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
23 lu 41.780 47.47 100
5 __delaunayn__ 27.882 31.68 100
1 delaunayn 10.659 12.11 100
15 kron 2.891 3.29 300
22 logical 1.455 1.65 100
Laplace expansion:
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
5 __delaunayn__ 27.508 91.32 100
1 delaunayn 1.558 5.17 100
12 binary - 0.665 2.21 600
19 cat 0.099 0.33 100
22 sqrt 0.076 0.25 300
18 binary .* 0.073 0.24 800
6D:
Laplace expansion:
# Function Attr Time (s) Time (%) Calls
----------------------------------------------------------------
5 __delaunayn__ 195.219 44.54 1
22 binary .* 87.128 19.88 876
12 binary - 46.551 10.62 441
15 delaunayn>detvec4 29.981 6.84 30
24 dot 16.814 3.84 120
peak memory usage by Octave in the first few minutes hit about 3GB, then 7GB
toward the end
LU:
peak memory usage by Octave in the first few minutes hit about 3GB, then 13GB
toward the end, then
hit an out of memory limit.
incomplete profshow:
>> profshow
# Function Attr Time (s) Time (%) Calls
------------------------------------------------------------
5 __delaunayn__ 198.244 54.59 1
1 delaunayn 132.159 36.39 1
15 kron 13.536 3.73 2
22 logical 8.555 2.36 1
12 postfix .' 5.262 1.45 2
13 binary - 3.891 1.07 5
16 sparse 1.474 0.41 2
23 profile 0.011 0.00 1
24 binary != 0.002 0.00 1
I think it crashed out during the LU decomp, which is just showing up as
delaunayn time. Not sure how well that time can be trusted, since the times I
was hitting 97+ memory I saw the disk usage spike as it started trying to
diskswap.
So in summary, seems like with Octave's LU it starts to be faster at higher
dimensions for larger arrays, but at a big memory cost.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60818>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, (continued)
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Markus Mützel, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/29
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Dmitri A. Sergatskov, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Dmitri A. Sergatskov, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm,
Nicholas Jankowski <=
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Dmitri A. Sergatskov, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Dmitri A. Sergatskov, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, Nicholas Jankowski, 2021/06/30
- [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm, anonymous, 2021/06/30