[elpa] externals/elisa a22b96e601 2/7: Speed up semantic splitting using batch embeddings
From: ELPA Syncer
Subject: [elpa] externals/elisa a22b96e601 2/7: Speed up semantic splitting using batch embeddings
Date: Sat, 23 Nov 2024 12:57:56 -0500 (EST)
branch: externals/elisa
commit a22b96e601245d5e1c98c854d961e37e9d7582fc
Author: Sergey Kostyaev <kostyaev.sergey2@wb.ru>
Commit: Sergey Kostyaev <kostyaev.sergey2@wb.ru>
Speed up semantic splitting using batch embeddings
---
elisa.el | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/elisa.el b/elisa.el
index 65ded645bf..c17c2eedd1 100644
--- a/elisa.el
+++ b/elisa.el
@@ -455,6 +455,14 @@ FOREIGN KEY(collection_id) REFERENCES collections(rowid)
"Calculate breakpoint threshold for DISTANCES based on K standard deviations."
(+ (elisa-avg distances) (* k (elisa-std-dev distances))))
+(defun elisa-string-empty-p (s)
+ "Check if string S is empty or contains only whitespace."
+ (length= (string-trim s) 0))
+
+(defun elisa-filter-strings (chunks)
+ "Filter out empty CHUNKS."
+ (cl-remove-if #'elisa-string-empty-p chunks))
+
(defun elisa-embeddings (chunks)
"Calculate embeddings for CHUNKS.
Return list of vectors."
@@ -681,13 +689,8 @@ ARGS contains keys for fine control.
than T, it will be packed into single semantic chunk."
(if-let* ((func (or (plist-get args :function) elisa-semantic-split-function))
(k (or (plist-get args :threshold-amount) elisa-breakpoint-threshold-amount))
- (chunks (funcall func))
- (embeddings (cl-remove-if
- #'not
- (mapcar (lambda (s)
- (when (length> (string-trim s) 0)
- (llm-embedding elisa-embeddings-provider s)))
- chunks)))
+ (chunks (elisa-filter-strings (funcall func)))
+ (embeddings (elisa-embeddings chunks))
(distances (elisa--distances embeddings))
(threshold (elisa-calculate-threshold k distances))
(current (car chunks))
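
The two steps this hunk touches, dropping blank chunks before embedding and deriving a breakpoint threshold as the mean plus K standard deviations, can be sketched in isolation as follows. This is only an illustration, not the code from elisa.el: every `my-` name is hypothetical, the population form of the standard deviation is an assumption (the body of `elisa-std-dev' is not shown in this diff), and the strict `>' comparison against the threshold is likewise assumed.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'subr-x)                       ; string-trim, string-empty-p

;; Hypothetical stand-ins for the helpers added in the first hunk.
(defun my-blank-string-p (s)
  "Return non-nil if string S is empty or contains only whitespace."
  (string-empty-p (string-trim s)))

(defun my-filter-strings (chunks)
  "Return CHUNKS without blank strings."
  (cl-remove-if #'my-blank-string-p chunks))

(defun my-avg (xs)
  "Return the arithmetic mean of the number list XS as a float."
  (/ (float (apply #'+ xs)) (length xs)))

(defun my-std-dev (xs)
  "Return the population standard deviation of XS (an assumption)."
  (let ((mean (my-avg xs)))
    (sqrt (my-avg (mapcar (lambda (x) (expt (- x mean) 2)) xs)))))

(defun my-threshold (k distances)
  "Return mean of DISTANCES plus K standard deviations."
  (+ (my-avg distances) (* k (my-std-dev distances))))

;; Blank chunks are removed up front, so the chunk list and the
;; embedding list computed from it stay the same length.
(my-filter-strings '("First." "   " "" "Second."))
;; => ("First." "Second.")

;; Mean is 5.0 and standard deviation is 2.0, so K = 1 gives 7.0.
(let* ((distances '(2 4 4 4 5 5 7 9))
       (threshold (my-threshold 1 distances)))
  (cl-remove-if-not (lambda (d) (> d threshold)) distances))
;; => (9)
```

Filtering first also matters for the batch path: in the removed code the blank strings were skipped only while embedding, so `chunks' could end up longer than `embeddings'; after this change both lists are derived from the same filtered input.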