
[elpa] externals/llm 4c0ff74251: Add provider llama.cpp


From: ELPA Syncer
Subject: [elpa] externals/llm 4c0ff74251: Add provider llama.cpp
Date: Fri, 10 Nov 2023 00:58:11 -0500 (EST)

branch: externals/llm
commit 4c0ff742512b5e78a9523489e2e4b94c1308295f
Author: Andrew Hyatt <ahyatt@gmail.com>
Commit: Andrew Hyatt <ahyatt@gmail.com>

    Add provider llama.cpp
    
    This fixes https://github.com/ahyatt/llm/issues/8.
---
 NEWS.org        |   2 +
 README.org      |  10 +++
 llm-llamacpp.el | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+)

diff --git a/NEWS.org b/NEWS.org
index 29dc63cbbc..ab2eaa07b8 100644
--- a/NEWS.org
+++ b/NEWS.org
@@ -1,3 +1,5 @@
+* Version 0.6
+- Add provider =llm-llamacpp=.
 * Version 0.5.2
 - Fix incompatibility with older Emacs introduced in Version 0.5.1.
 - Add support for Google Cloud Vertex model =text-bison= and variants.
diff --git a/README.org b/README.org
index b1b9536e76..d0d5accc57 100644
--- a/README.org
+++ b/README.org
@@ -35,6 +35,7 @@ In addition to the provider, which you may want multiple of (for example, to cha
 - ~llm-vertex-gcloud-region~: The gcloud region to use.  It's good to set this to a region near where you are for best latency.  Defaults to "us-central1".
 ** Ollama
 [[https://ollama.ai/][Ollama]] is a way to run large language models locally. There are [[https://ollama.ai/library][many different models]] you can use with it. You set it up with the following parameters:
+- ~:scheme~: The scheme (http/https) for the connection to ollama.  This defaults to "http".
 - ~:host~: The host that ollama is run on.  This is optional and will default to localhost.
 - ~:port~: The port that ollama is run on.  This is optional and will default to the default ollama port.
 - ~:chat-model~: The model name to use for chat.  This is not optional for chat use, since there is no default.
@@ -44,6 +45,15 @@ In addition to the provider, which you may want multiple of (for example, to cha
 - ~:host~: The host that GPT4All is run on.  This is optional and will default to localhost.
 - ~:port~: The port that GPT4All is run on.  This is optional and will default to the default GPT4All port.
 - ~:chat-model~: The model name to use for chat.  This is not optional for chat use, since there is no default.
+** llama.cpp
+[[https://github.com/ggerganov/llama.cpp][llama.cpp]] is a way to run large language models locally.  To use it with the =llm= package, you need to start the server (with the "--embedding" flag if you plan on using embeddings).  The server must be started with a model, so it is not possible to switch models without restarting the server with the new one.  As such, the model is not a parameter to the provider, since the model choice is already fixed once the server starts.
+
+Llama.cpp does not have a native chat interface, so it is not as good at multi-round conversations as other solutions such as Ollama.  It will perform better on single responses.
+
+All of the parameters are optional, so most users should just create a provider with ~(make-llm-llamacpp)~.  The parameters are:
+- ~:scheme~: The scheme (http/https) for the connection to the llama.cpp server.  This defaults to "http".
+- ~:host~: The host that llama.cpp server is run on.  This is optional and will default to localhost.
+- ~:port~: The port that llama.cpp server is run on.  This is optional and will default to 8080, the default llama.cpp port.
 ** Fake
 This is a client that makes no call, but it is just there for testing and debugging.  Mostly this is of use to programmatic clients of the llm package, but end users can also use it to understand what will be sent to the LLMs.  It has the following parameters:
+- ~:output-to-buffer~: if non-nil, the buffer or buffer name to append the request sent to the LLM to.
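
For illustration, a provider configured with the parameters described in the llama.cpp section above could be created as in the following minimal sketch (every keyword argument is optional and is spelled out here only as an example; the variable name is hypothetical):

(require 'llm-llamacpp)

;; Explicit defaults, shown only for illustration; a bare
;; (make-llm-llamacpp) produces an equivalent provider.
(defvar my-llm-provider
  (make-llm-llamacpp :scheme "http" :host "localhost" :port 8080))
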
diff --git a/llm-llamacpp.el b/llm-llamacpp.el
new file mode 100644
index 0000000000..944eca96ed
--- /dev/null
+++ b/llm-llamacpp.el
@@ -0,0 +1,200 @@
+;;; llm-llamacpp.el --- llm module for integrating with llama.cpp. -*- lexical-binding: t -*-
+
+;; Copyright (c) 2023  Free Software Foundation, Inc.
+
+;; Author: Andrew Hyatt <ahyatt@gmail.com>
+;; Homepage: https://github.com/ahyatt/llm
+;; SPDX-License-Identifier: GPL-3.0-or-later
+;;
+;; This program is free software; you can redistribute it and/or
+;; modify it under the terms of the GNU General Public License as
+;; published by the Free Software Foundation; either version 3 of the
+;; License, or (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;; This file implements the llm functionality defined in llm.el, for llama.cpp,
+;; which can be found at https://github.com/ggerganov/llama.cpp.
+
+;;; Code:
+
+(require 'llm)
+(require 'cl-lib)
+(require 'llm-request)
+(require 'json)
+
+(defgroup llm-llamacpp nil
+  "LLM implementation for llama.cpp."
+  :group 'llm)
+
+(defcustom llm-llamacpp-example-prelude "Examples of how you should respond follow."
+  "The prelude to use for examples."
+  :type 'string
+  :group 'llm-llamacpp)
+
+(defcustom llm-llamacpp-history-prelude "You are in the middle of a conversation between you and a user.  First, we will give you the previous conversation between you ('assistant') and the user, so you have the context, and then give you the latest message for you to respond to.  The previous conversation follows."
+  "The prelude to use when there has been more than one interaction already.
+This is needed because there is no API support for previous chat conversations."
+  :type 'string)
+
+(cl-defstruct llm-llamacpp
+  "A struct representing a llama.cpp instance."
+  (scheme "http") (host "localhost") (port 8080))
+
+(defun llm-llamacpp-url (provider path)
+  "From PROVIDER, return the URL for llama.cpp.
+PATH is the path to append to the URL, not prefixed with a slash."
+  (let ((scheme (llm-llamacpp-scheme provider))
+        (host (llm-llamacpp-host provider))
+        (port (llm-llamacpp-port provider)))
+    (format "%s://%s:%d/%s" scheme host port path)))
+
+(defun llm-llamacpp-get-embedding-from-response (response)
+  "From JSON RESPONSE, return the embedding."
+  (let ((embedding (assoc-default 'embedding response)))
+    (when (and (= 0 (aref embedding 0)) (= 0 (aref embedding 1)))
+    (error "llm-llamacpp: embedding might be all 0s, make sure you are starting the server with the --embedding flag"))
+    embedding))
+
+(cl-defmethod llm-embedding ((provider llm-llamacpp) string)
+  (llm-llamacpp-get-embedding-from-response
+   (llm-request-sync (llm-llamacpp-url provider "embedding")
+                     :data `((content . ,string)))))
+
+(cl-defmethod llm-embedding-async ((provider llm-llamacpp) string vector-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "embedding")
+                       :data `((content . ,string))
+                       :on-success (lambda (data)
+                                   (llm-request-callback-in-buffer
+                                    buf vector-callback (llm-llamacpp-get-embedding-from-response data)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(defun llm-llamacpp--prompt-to-text (prompt)
+  "From PROMPT, return the text to send to llama.cpp."
+  (concat
+   (when (llm-chat-prompt-context prompt)
+     (concat (llm-chat-prompt-context prompt) "\n"))
+   (when (llm-chat-prompt-examples prompt)
+     (concat llm-llamacpp-example-prelude "\n\n"
+             (mapconcat (lambda (example)
+                          (format "User: %s\nAssistant: %s" (car example) (cdr example)))
+                        (llm-chat-prompt-examples prompt) "\n") "\n\n"))
+   (when (> (length (llm-chat-prompt-interactions prompt)) 1)
+     (concat llm-llamacpp-history-prelude "\n\n"
+             (mapconcat (lambda (interaction)
+                          (format "%s: %s" (pcase (llm-chat-prompt-interaction-role interaction)
+                                             ('user "User")
+                                             ('assistant "Assistant"))
+                                  (llm-chat-prompt-interaction-content interaction)))
+                        (butlast (llm-chat-prompt-interactions prompt)) "\n")
+             "\n\nThe current conversation follows:\n\n"))
+   (llm-chat-prompt-interaction-content (car (last (llm-chat-prompt-interactions prompt))))))
+
+(defun llm-llamacpp--chat-request (prompt)
+  "From PROMPT, create the chat request data to send."
+  (append
+   `((prompt . ,(llm-llamacpp--prompt-to-text prompt)))
+   (when (llm-chat-prompt-max-tokens prompt)
+     `((max_tokens . ,(llm-chat-prompt-max-tokens prompt))))
+   (when (llm-chat-prompt-temperature prompt)
+     `((temperature . ,(llm-chat-prompt-temperature prompt))))))
+
+(cl-defmethod llm-chat ((provider llm-llamacpp) prompt)
+  (let ((output (assoc-default
+                 'content
+                 (llm-request-sync (llm-llamacpp-url provider "completion")
+                                   :data (llm-llamacpp--chat-request prompt)))))
+    (setf (llm-chat-prompt-interactions prompt)
+          (append (llm-chat-prompt-interactions prompt)
+                  (list (make-llm-chat-prompt-interaction
+                         :role 'assistant
+                         :content output))))
+    output))
+
+(cl-defmethod llm-chat-async ((provider llm-llamacpp) prompt response-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "completion")
+                       :data (llm-llamacpp--chat-request prompt)
+                       :on-success (lambda (data)
+                                     (let ((response (assoc-default 'content data)))
+                                       (setf (llm-chat-prompt-interactions prompt)
+                                             (append (llm-chat-prompt-interactions prompt)
+                                                     (list (make-llm-chat-prompt-interaction
+                                                            :role 'assistant
+                                                            :content response))))
+                                       (llm-request-callback-in-buffer
+                                        buf response-callback response)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(defvar-local llm-llamacpp-current-response ""
+  "The response so far from the server.")
+
+(defvar-local llm-llamacpp-last-response 0
+  "The number of the last streaming response we read.
+The responses from llama.cpp are not numbered, but we just number
+them from 1 to however many are sent.")
+
+(defun llm-llamacpp--get-partial-chat-response (response)
+  "From raw streaming output RESPONSE, return the partial chat response."
+  (let ((current-response llm-llamacpp-current-response)
+        (last-response llm-llamacpp-last-response))
+    (with-temp-buffer
+      (insert response)
+      (let* ((end-of-chunk-rx (rx (seq "\"stop\":" (0+ space) "false}")))
+             (end-pos (save-excursion (goto-char (point-max))
+                                      (when (search-backward-regexp
+                                             end-of-chunk-rx
+                                             nil t)
+                                        (pos-eol)))))
+        (when end-pos
+          (let ((all-lines (seq-filter
+                            (lambda (line) (string-match-p end-of-chunk-rx line))
+                            (split-string (buffer-substring-no-properties 1 end-pos) "\n"))))
+            (setq current-response
+                  (concat current-response
+                          (mapconcat (lambda (line)
+                                       (assoc-default 'content
+                                                      (json-read-from-string
+                                                       (replace-regexp-in-string "data: " "" line))))
+                                     (seq-subseq all-lines last-response) "")))
+            (setq last-response (length all-lines))))))
+    (when (> (length current-response) (length llm-llamacpp-current-response))
+        (setq llm-llamacpp-current-response current-response)
+        (setq llm-llamacpp-last-response last-response))
+    current-response))
+
+(cl-defmethod llm-chat-streaming ((provider llm-llamacpp) prompt partial-callback response-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "completion")
+                       :data (append (llm-llamacpp--chat-request prompt) '((stream . t)))
+                       :on-success-raw (lambda (data)
+                                     (let ((response (llm-llamacpp--get-partial-chat-response data)))
+                                       (setf (llm-chat-prompt-interactions prompt)
+                                             (append (llm-chat-prompt-interactions prompt)
+                                                     (list (make-llm-chat-prompt-interaction
+                                                            :role 'assistant
+                                                            :content response))))
+                                       (llm-request-callback-in-buffer
+                                        buf response-callback response)))
+                       :on-partial (lambda (data)
+                                     (when-let ((response (llm-llamacpp--get-partial-chat-response data)))
+                                       (llm-request-callback-in-buffer
+                                        buf partial-callback response)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(provide 'llm-llamacpp)
+;;; llm-llamacpp.el ends here
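
As a quick illustration of how the new provider plugs into the generic functions defined in llm.el, here is a minimal usage sketch.  It assumes a llama.cpp server is already running locally (started with the --embedding flag if embeddings are wanted) and that llm-make-simple-chat-prompt from llm.el is available; the prompt text is only an example.

(require 'llm)
(require 'llm-llamacpp)

(let ((provider (make-llm-llamacpp)))
  ;; Synchronous chat: the prompt is flattened by
  ;; llm-llamacpp--prompt-to-text and sent to the /completion endpoint.
  (message "%s" (llm-chat provider
                          (llm-make-simple-chat-prompt
                           "Say hello in one sentence.")))
  ;; Embedding: sent to the /embedding endpoint; returns a vector of floats.
  (length (llm-embedding provider "hello world")))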


