[elpa] externals/llm 4c0ff74251: Add provider llama.cpp
From: ELPA Syncer
Subject: [elpa] externals/llm 4c0ff74251: Add provider llama.cpp
Date: Fri, 10 Nov 2023 00:58:11 -0500 (EST)
branch: externals/llm
commit 4c0ff742512b5e78a9523489e2e4b94c1308295f
Author: Andrew Hyatt <ahyatt@gmail.com>
Commit: Andrew Hyatt <ahyatt@gmail.com>
Add provider llama.cpp
This fixes https://github.com/ahyatt/llm/issues/8.
---
NEWS.org | 2 +
README.org | 10 +++
llm-llamacpp.el | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 212 insertions(+)
diff --git a/NEWS.org b/NEWS.org
index 29dc63cbbc..ab2eaa07b8 100644
--- a/NEWS.org
+++ b/NEWS.org
@@ -1,3 +1,5 @@
+* Version 0.6
+- Add provider =llm-llamacpp=.
* Version 0.5.2
- Fix incompatibility with older Emacs introduced in Version 0.5.1.
- Add support for Google Cloud Vertex model =text-bison= and variants.
diff --git a/README.org b/README.org
index b1b9536e76..d0d5accc57 100644
--- a/README.org
+++ b/README.org
@@ -35,6 +35,7 @@ In addition to the provider, which you may want multiple of (for example, to cha
- ~llm-vertex-gcloud-region~: The gcloud region to use. It's good to set this
to a region near where you are for best latency. Defaults to "us-central1".
** Ollama
[[https://ollama.ai/][Ollama]] is a way to run large language models locally.
There are [[https://ollama.ai/library][many different models]] you can use with
it. You set it up with the following parameters:
+- ~:scheme~: The scheme (http/https) for the connection to ollama. This defaults to "http".
- ~:host~: The host that ollama is run on. This is optional and will default
to localhost.
- ~:port~: The port that ollama is run on. This is optional and will default
to the default ollama port.
- ~:chat-model~: The model name to use for chat. This is not optional for
chat use, since there is no default.
@@ -44,6 +45,15 @@ In addition to the provider, which you may want multiple of (for example, to cha
- ~:host~: The host that GPT4All is run on. This is optional and will default
to localhost.
- ~:port~: The port that GPT4All is run on. This is optional and will default
to the default ollama port.
- ~:chat-model~: The model name to use for chat. This is not optional for
chat use, since there is no default.
+** llama.cpp
+[[https://github.com/ggerganov/llama.cpp][llama.cpp]] is a way to run large language models locally. To use it with the =llm= package, you need to start the server (with the "--embedding" flag if you plan on using embeddings). The server must be started with a model, so it is not possible to switch models until the server is restarted with the new model. As such, the model is not a parameter to the provider, since the model choice is already fixed once the server starts.
+
+Llama.cpp does not have a native chat interface, so it is not as good at multi-round conversations as other solutions such as Ollama. It performs better on single responses.
+
+All parameters are optional and have defaults, so most users can simply create a provider with ~(make-llm-llamacpp)~. The parameters are:
+- ~:scheme~: The scheme (http/https) for the connection to the llama.cpp server. This defaults to "http".
+- ~:host~: The host that the llama.cpp server is run on. This is optional and will default to localhost.
+- ~:port~: The port that the llama.cpp server is run on. This is optional and will default to 8080, the default llama.cpp port.
** Fake
This is a client that makes no calls, but it is just there for testing and
debugging. Mostly this is of use to programmatic clients of the llm package,
but end users can also use it to understand what will be sent to the LLMs. It
has the following parameters:
- ~:output-to-buffer~: if non-nil, the buffer or buffer name to append the
request sent to the LLM to.
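
As a quick illustration of the README section above, here is a minimal usage sketch. It assumes a llama.cpp server is already running locally on the default port; the variable name and prompt text are illustrative, and make-llm-chat-prompt, make-llm-chat-prompt-interaction, llm-chat, and llm-embedding come from llm.el:

(require 'llm)
(require 'llm-llamacpp)

;; Point the provider at a locally running llama.cpp server
;; (the :host and :port shown here are just the defaults).
(setq my-llamacpp-provider (make-llm-llamacpp :host "localhost" :port 8080))

;; Synchronous, single-response chat.
(llm-chat my-llamacpp-provider
          (make-llm-chat-prompt
           :interactions (list (make-llm-chat-prompt-interaction
                                :role 'user
                                :content "What is llama.cpp?"))))

;; Embeddings only work if the server was started with --embedding.
(llm-embedding my-llamacpp-provider "Hello, world")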
diff --git a/llm-llamacpp.el b/llm-llamacpp.el
new file mode 100644
index 0000000000..944eca96ed
--- /dev/null
+++ b/llm-llamacpp.el
@@ -0,0 +1,200 @@
+;;; llm-llamacpp.el --- llm module for integrating with llama.cpp. -*- lexical-binding: t -*-
+
+;; Copyright (c) 2023 Free Software Foundation, Inc.
+
+;; Author: Andrew Hyatt <ahyatt@gmail.com>
+;; Homepage: https://github.com/ahyatt/llm
+;; SPDX-License-Identifier: GPL-3.0-or-later
+;;
+;; This program is free software; you can redistribute it and/or
+;; modify it under the terms of the GNU General Public License as
+;; published by the Free Software Foundation; either version 3 of the
+;; License, or (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;; This file implements the llm functionality defined in llm.el, for llama.cpp,
+;; which can be found at https://github.com/ggerganov/llama.cpp.
+
+;;; Code:
+
+(require 'llm)
+(require 'cl-lib)
+(require 'llm-request)
+(require 'json)
+
+(defgroup llm-llamacpp nil
+ "LLM implementation for llama.cpp."
+ :group 'llm)
+
+(defcustom llm-llamacpp-example-prelude "Example of how you should response follow."
+ "The prelude to use for examples."
+ :type 'string
+ :group 'llm-llamacpp)
+
+(defcustom llm-llamacpp-history-prelude "You are in the middle of a conversation between you and a user. First, we will give you the previous conversation between you ('assistant') and the user, so you have the context, and then give you the latest message for you to response to. The previous conversation follows."
+ "The prelude to use when there has been more than one interaction already.
+This is needed because there is no API support for previous chat conversation."
+ :type 'string)
+
+(cl-defstruct llm-llamacpp
+ "A struct representing a llama.cpp instance."
+ (scheme "http") (host "localhost") (port 8080))
+
+(defun llm-llamacpp-url (provider path)
+ "From PROVIDER, return the URL for llama.cpp.
+PATH is the path to append to the URL, not prefixed with a slash."
+ (let ((scheme (llm-llamacpp-scheme provider))
+ (host (llm-llamacpp-host provider))
+ (port (llm-llamacpp-port provider)))
+ (format "%s://%s:%d/%s" scheme host port path)))
+
+(defun llm-llamacpp-get-embedding-from-response (response)
+  "From JSON RESPONSE, return the embedding."
+  (let ((embedding (assoc-default 'embedding response)))
+    (when (and (= 0 (aref embedding 0)) (= 0 (aref embedding 1)))
+      (error "llm-llamacpp: embedding might be all 0s, make sure you are starting the server with the --embedding flag"))
+    embedding))
+
+(cl-defmethod llm-embedding ((provider llm-llamacpp) string)
+ (llm-llamacpp-get-embedding-from-response
+ (llm-request-sync (llm-llamacpp-url provider "embedding")
+ :data `((content . ,string)))))
+
+(cl-defmethod llm-embedding-async ((provider llm-llamacpp) string vector-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "embedding")
+                       :data `((content . ,string))
+                       :on-success (lambda (data)
+                                     (llm-request-callback-in-buffer
+                                      buf vector-callback (llm-llamacpp-get-embedding-from-response data)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(defun llm-llamacpp--prompt-to-text (prompt)
+  "From PROMPT, return the text to send to llama.cpp."
+  (concat
+   (when (llm-chat-prompt-context prompt)
+     (concat (llm-chat-prompt-context prompt) "\n"))
+   (when (llm-chat-prompt-examples prompt)
+     (concat llm-llamacpp-example-prelude "\n\n"
+             (mapconcat (lambda (example)
+                          (format "User: %s\nAssistant: %s" (car example) (cdr example)))
+                        (llm-chat-prompt-examples prompt) "\n") "\n\n"))
+   (when (> (length (llm-chat-prompt-interactions prompt)) 1)
+     (concat llm-llamacpp-history-prelude "\n\n"
+             (mapconcat (lambda (interaction)
+                          (format "%s: %s" (pcase (llm-chat-prompt-interaction-role interaction)
+                                             ('user "User")
+                                             ('assistant "Assistant"))
+                                  (llm-chat-prompt-interaction-content interaction)))
+                        (butlast (llm-chat-prompt-interactions prompt)) "\n")
+             "\n\nThe current conversation follows:\n\n"))
+   (llm-chat-prompt-interaction-content (car (last (llm-chat-prompt-interactions prompt))))))
+
+(defun llm-llamacpp--chat-request (prompt)
+ "From PROMPT, create the chat request data to send."
+ (append
+ `((prompt . ,(llm-llamacpp--prompt-to-text prompt)))
+ (when (llm-chat-prompt-max-tokens prompt)
+ `((max_tokens . ,(llm-chat-prompt-max-tokens prompt))))
+ (when (llm-chat-prompt-temperature prompt)
+ `((temperature . ,(llm-chat-prompt-temperature prompt))))))
+
+(cl-defmethod llm-chat ((provider llm-llamacpp) prompt)
+  (let ((output (assoc-default
+                 'content
+                 (llm-request-sync (llm-llamacpp-url provider "completion")
+                                   :data (llm-llamacpp--chat-request prompt)))))
+    (setf (llm-chat-prompt-interactions prompt)
+          (append (llm-chat-prompt-interactions prompt)
+                  (list (make-llm-chat-prompt-interaction
+                         :role 'assistant
+                         :content output))))
+    output))
+
+(cl-defmethod llm-chat-async ((provider llm-llamacpp) prompt response-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "completion")
+                       :data (llm-llamacpp--chat-request prompt)
+                       :on-success (lambda (data)
+                                     (let ((response (assoc-default 'content data)))
+                                       (setf (llm-chat-prompt-interactions prompt)
+                                             (append (llm-chat-prompt-interactions prompt)
+                                                     (list (make-llm-chat-prompt-interaction
+                                                            :role 'assistant
+                                                            :content response))))
+                                       (llm-request-callback-in-buffer
+                                        buf response-callback response)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(defvar-local llm-llamacpp-current-response ""
+ "The response so far from the server.")
+
+(defvar-local llm-llamacpp-last-response 0
+ "The number of the last streaming response we read.
+The responses from llama.cpp are not numbered, but we just number
+them from 1 to however many are sent.")
+
+(defun llm-llamacpp--get-partial-chat-response (response)
+  "From raw streaming output RESPONSE, return the partial chat response."
+  (let ((current-response llm-llamacpp-current-response)
+        (last-response llm-llamacpp-last-response))
+    (with-temp-buffer
+      (insert response)
+      (let* ((end-of-chunk-rx (rx (seq "\"stop\":" (0+ space) "false}")))
+             (end-pos (save-excursion (goto-char (point-max))
+                                      (when (search-backward-regexp
+                                             end-of-chunk-rx
+                                             nil t)
+                                        (pos-eol)))))
+        (when end-pos
+          (let ((all-lines (seq-filter
+                            (lambda (line) (string-match-p end-of-chunk-rx line))
+                            (split-string (buffer-substring-no-properties 1 end-pos) "\n"))))
+            (setq current-response
+                  (concat current-response
+                          (mapconcat (lambda (line)
+                                       (assoc-default 'content
+                                                      (json-read-from-string
+                                                       (replace-regexp-in-string "data: " "" line))))
+                                     (seq-subseq all-lines last-response) "")))
+            (setq last-response (length all-lines))))))
+    (when (> (length current-response) (length llm-llamacpp-current-response))
+      (setq llm-llamacpp-current-response current-response)
+      (setq llm-llamacpp-last-response last-response))
+    current-response))
+
+(cl-defmethod llm-chat-streaming ((provider llm-llamacpp) prompt partial-callback response-callback error-callback)
+  (let ((buf (current-buffer)))
+    (llm-request-async (llm-llamacpp-url provider "completion")
+                       :data (append (llm-llamacpp--chat-request prompt) '((stream . t)))
+                       :on-success-raw (lambda (data)
+                                         (let ((response (llm-llamacpp--get-partial-chat-response data)))
+                                           (setf (llm-chat-prompt-interactions prompt)
+                                                 (append (llm-chat-prompt-interactions prompt)
+                                                         (list (make-llm-chat-prompt-interaction
+                                                                :role 'assistant
+                                                                :content response))))
+                                           (llm-request-callback-in-buffer
+                                            buf response-callback response)))
+                       :on-partial (lambda (data)
+                                     (when-let ((response (llm-llamacpp--get-partial-chat-response data)))
+                                       (llm-request-callback-in-buffer
+                                        buf partial-callback response)))
+                       :on-error (lambda (_ _)
+                                   (llm-request-callback-in-buffer
+                                    buf error-callback 'error "Unknown error calling llm-llamacpp")))))
+
+(provide 'llm-llamacpp)
+;;; llm-llamacpp.el ends here
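
And a hedged sketch of driving the streaming entry point defined above from client code, assuming my-llamacpp-provider was created as in the earlier sketch; the callback bodies and prompt text are illustrative, not part of the commit:

(llm-chat-streaming
 my-llamacpp-provider
 (make-llm-chat-prompt
  :interactions (list (make-llm-chat-prompt-interaction
                       :role 'user
                       :content "Tell me a short story.")))
 ;; partial-callback: called repeatedly with the response accumulated so far.
 (lambda (partial) (message "So far: %s" partial))
 ;; response-callback: called once with the complete response.
 (lambda (full) (message "Done: %s" full))
 ;; error-callback: called with an error symbol and a message string.
 (lambda (type msg) (message "Error (%s): %s" type msg)))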