From: Nicholas Jankowski
Subject: [Octave-bug-tracker] [bug #62495] [octave forge] (statistics) pdist 'cosine' metric - internal expansion causes out of memory error
Date: Fri, 20 May 2022 15:46:52 -0400 (EDT)
URL:
<https://savannah.gnu.org/bugs/?62495>
Summary: [octave forge] (statistics) pdist 'cosine' metric -
internal expansion causes out of memory error
Project: GNU Octave
Submitted by: nrjank
Submitted on: Fri 20 May 2022 03:46:50 PM EDT
Category: Octave Forge Package
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: Unexpected Error or Warning
Status: Confirmed
Assigned to: None
Originator Name: Nicholas Jankowski
Originator Email:
Open/Closed: Open
Release: other
Discussion Lock: Any
Operating System: Any
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Fri 20 May 2022 03:46:50 PM EDT By: Nicholas Jankowski <nrjank>
Following a query on Stack Overflow [1], an attempt to use the
'silhouette' function resulted in an unexpected "out of memory or
dimension too large for Octave's index type" error. The inputs were well
within expected memory and index-length limits. Example code is below.
It turns out that 'silhouette' calls 'pdist' with the 'cosine' metric. The
vectorization used for that metric expands an internal variable enormously,
which triggers the error. The test input is 864x25333 and the expected output
is 864x1, but internally pdist attempts to create a 25333x372816 array.
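The sketch below shows how the expansion arises. It is a scaled-down illustration of the pairwise indexing pattern, not the pdist source.

Several things in it are assumptions:
- The input dimensions are hypothetical.
- `X` is assumed to hold the input transposed (one observation per column). This is inferred from the 25333-row temporary, not read from pdist.
- The names `order`, `A` and `B` are illustrative.

```octave
## Hypothetical small input: 8 observations (rows), 5 features (columns).
data = rand (8, 5);

## Assumed layout: one observation per column, as implied by the
## 25333-row temporaries in the report.
X = data';

## Every unordered pair of observation indices, one pair per row.
order = nchoosek (1:rows (data), 2);
Xi = order(:,1);
Yi = order(:,2);

## Indexing with the pair lists copies one column per pair,
## so each temporary is features x npairs.
A = X(:,Xi);
B = X(:,Yi);
printf ("%d x %d\n", rows (A), columns (A));   # prints "5 x 28"
```

With 864 observations the same indexing yields 372816 columns. Each column has 25333 rows.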
Test code:
```
pkg load statistics
data = rand(864,25333);
idx = kmeans(data,3,'Distance','cosine');
test1 = silhouette(data, idx, 'cosine');
```
which produces:
```
error: out of memory or dimension too large for Octave's index type
error: called from
    pdist at line 164 column 14
    silhouette at line 125 column 16
```
pdist, lines 163-166 'cosine' block:
```
case "cosine"
prod = X(:,Xi) .* X(:,Yi);
weights = sumsq (X(:,Xi), 1) .* sumsq (X(:,Yi), 1);
y = 1 - sum (prod, 1) ./ sqrt (weights);
```
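One way to keep the same formula is to evaluate it over fixed-size blocks of pairs. No temporary then exceeds features x chunk elements. This is a minimal sketch, not a proposed patch.

Several things in it are assumptions:
- The block size `chunk`, the small input and the per-observation squared norms `sq` are illustrative and not part of pdist.
- `X` is again assumed to be the transposed input.

```octave
## Hypothetical small input; rows are observations.
data = rand (50, 7);
X = data';                            # assumed layout: features x observations
order = nchoosek (1:rows (data), 2);
Xi = order(:,1);
Yi = order(:,2);
npairs = numel (Xi);

## Squared norm of each observation, computed once instead of once per pair.
sq = sumsq (X, 1);

chunk = 100;                          # hypothetical number of pairs per block
y = zeros (1, npairs);
for k = 1:chunk:npairs
  idx = k:min (k + chunk - 1, npairs);
  ## Same expression as the quoted 'cosine' block, restricted to one block
  ## of pairs, so the temporaries are at most rows (X) x chunk.
  num = sum (X(:,Xi(idx)) .* X(:,Yi(idx)), 1);
  y(idx) = 1 - num ./ sqrt (sq(Xi(idx)) .* sq(Yi(idx)));
endfor
```

The index vectors and the output still have one entry per pair. For the reported input that is 372816 doubles, about 3 MB each, rather than 75 GB.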
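Xi and Yi are calculated with nchoosek over the 864 row indices taken 2 at a
time, giving 372816 index pairs. X(:,Xi) has 25333 rows, one per feature. So
X(:,Xi) and X(:,Yi) are each 25333x372816, roughly 75 GB apiece if of type
double.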
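The ~75 GB figure follows from the pair count. As a worked check for one 25333x372816 array of doubles (8 bytes per element):

```latex
\begin{aligned}
N_{\text{pairs}} &= \binom{864}{2} = \frac{864 \times 863}{2} = 372\,816 \\
25\,333 \times 372\,816 &= 9\,444\,547\,728\ \text{elements} \\
9\,444\,547\,728 \times 8\ \text{B} &= 75\,556\,381\,824\ \text{B} \approx 75.6\ \text{GB}\ (\approx 70.4\ \text{GiB})
\end{aligned}
```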
Tested in MATLAB R2022a, the same code runs without issue in a few seconds. Its
memory use never rises more than 500 MB above baseline. Such an expansion
therefore does not appear to be necessary for the algorithm. It would be worth
determining whether a more memory-efficient approach is available.
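One possible direction is sketched here as an assumption. It is not a description of what MATLAB or the statistics package actually does. The idea is to normalize each row once and get all pairwise cosine similarities from a single matrix product.

The largest temporaries are then an n x n matrix (864x864, about 6 MB) and a normalized copy of the input (about 175 MB here). There are no features x npairs arrays.

Some details are assumptions:
- The input dimensions are hypothetical.
- The output is ordered like the rows of nchoosek (1:n, 2), on the assumption that this is the pair order pdist uses.

```octave
## Hypothetical small input: 50 observations (rows), 7 features (columns).
data = rand (50, 7);
n = rows (data);

## Scale each row to unit length; a zero row gives NaN, as 0/0 does in
## the quoted formula.
Xn = data ./ sqrt (sumsq (data, 2));

## Cosine similarity of every pair of rows in one product: an n x n matrix
## (864 x 864 for the reported input) instead of features x npairs temporaries.
S = Xn * Xn';

## Take each unordered pair once. Walking the strict lower triangle column by
## column visits (2,1), (3,1), ..., (n,1), (3,2), ...; since S is symmetric,
## this matches the row order of nchoosek (1:n, 2).
y = 1 - S(tril (true (n), -1)).';
```

The trade-off is a full n x n similarity matrix. It grows quadratically with the number of observations, but not with the number of features, which is the dimension that dominates in this report.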
[1] https://stackoverflow.com/questions/72282190/octave-error-out-of-memory-or-dimension-too-large-for-octaves-index-type