Re: LLM Experiments, Part 1: Corrections

From: Andrew Hyatt
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Mon, 22 Jan 2024 16:31:58 -0400
User-agent: Gnus/5.13 (Gnus v5.13)
On 23 January 2024 01:50, Sergey Kostyaev <sskostyaev@gmail.com>
wrote:
Hello everyone,

This is a cool idea, I will definitely use it in ellama. But I have some
suggestions:

1. Every prompt should be customizable. And since llm is a low-level
   library, it should be customizable at function call time (to manage
   custom variables on the caller's side). Or it may be easier to
   reimplement this functionality.

2. Maybe it would be useful to make corrections another way (not instead
   of the current solution, but together with it): the user presses some
   keybinding, changes the prompt or other parameters, and redoes the
   query. Follow-up revision is also useful, so don't remove it.

About your questions:

1. I think it should be different require calls, but which package it
   ends up in doesn't matter to me. Do it however you are comfortable.

2. I don't know the fsm library, but I understand how to manage finite
   state machines. I would prefer simpler code. If it is readable with
   this library, fine; if without, also fine.
I agree with all of the above. It seems worth trying out fsm, but I'm not
sure how much it will help.
3. This should have a small default length (256-1000 tokens, words, or
   something like that) and be extendable by the caller's code. It will
   differ between scenarios, so we need maximum flexibility here.
Agreed, probably a small default length is sufficient - but it
might be good to have options for maximizing the length. The
extensibility here may be tricky to design, but it's important.
4. 20 seconds of blocked Emacs is way too long. Some big local models
   are very good, but not very fast. For example, mixtral 8x7b instruct
   provides great quality, but is not very fast. I prefer not to break
   the user's flow by blocking. I think a configurable ability to show
   the generation stream (or not show it, if the user doesn't want it)
   would be perfect.
How do you see this working in the demo I shared, though?
Streaming wouldn't help at all, AFAICT. If you don't block, how
does the user get to the ediff screen? Does it just pop up in the
middle of whatever they were doing? That seems intrusive. Better
would be to message the user that they can do something to get
back into the workflow. Still, at least for me, I'd prefer to just
wait. I'm doing something that I'm turning my attention to, so
even if it takes a while, I want to maintain my focus on that
task. At least as long as I don't get bored, but LLMs are fast
enough that I'm not losing focus here.
5. See https://github.com/karthink/gptel as an example of
flexibility.
Agreed, it's a very full system for prompt editing.
6. Emacs has great explainability. There are ‘M-x’ commands and
   which-key integration for remembering keybindings faster. And we can
   add other interfaces (for example, grouping actions by meaning with
   completing-read).
Best regards,
Sergey Kostyaev
On 22 Jan 2024, at 11:15, Andrew Hyatt <ahyatt@gmail.com> wrote:
Hi everyone,
This email is a demo and a summary of some questions which could use your feedback in the context of using LLMs in Emacs, and
specifically the development of the llm GNU ELPA package. If that interests
you, read on.
I'm starting to experiment with what LLMs and Emacs, together, are capable of. I've written the llm package to act as a base layer,
allowing communication with various LLMs: servers, local LLMs, free, and
nonfree. ellama, also a GNU ELPA package, is showing some interesting
functionality as well - asking about a region, translating a region,
adding code, getting a code review, etc.
My goal is to take the basic approach that ellama takes (providing useful functionality beyond chat that only an LLM can give),
and expand it to a new set of more complicated interactions. Each new
interaction is a new demo, and as I write them, I'll continue
to develop a library that can support these more complicated experiences.
The demos should be interesting, and more importantly,
developing them brings up interesting questions that this mailing list may
have some opinions on.
To start, I have a demo showing the user using an LLM to rewrite existing text.
<rewrite-demo.gif>
I've created a function that will ask for a rewrite of the current region.
The LLM offers a suggestion, which the user can review with
ediff, and ask for a revision. This can continue until the user is
satisfied, and then the user can accept the rewrite, which will replace
the region.
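In rough outline, the interaction looks something like the sketch below. This is just a simplified illustration of the shape of the flow, not the actual code linked below; it assumes a provider object has already been set up (called my-llm-provider here, which is hypothetical) and uses the synchronous llm-chat and llm-make-simple-chat-prompt entry points:

    (require 'llm)

    (defvar my-llm-provider nil
      "An llm provider, e.g. created with make-llm-ollama or make-llm-openai.")

    (defun my-llm-rewrite-region (start end instructions)
      "Ask the LLM to rewrite the region per INSTRUCTIONS, then review with ediff."
      (interactive "r\nsRewrite instructions: ")
      (let* ((original (buffer-substring-no-properties start end))
             (prompt (llm-make-simple-chat-prompt
                      (format "Rewrite the following text. %s\n\n%s"
                              instructions original)))
             ;; Synchronous call; see question 4 below for the tradeoffs.
             (rewrite (llm-chat my-llm-provider prompt))
             (buf-a (generate-new-buffer "*llm-original*"))
             (buf-b (generate-new-buffer "*llm-rewrite*")))
        (with-current-buffer buf-a (insert original))
        (with-current-buffer buf-b (insert rewrite))
        ;; The user reviews the diff; accepting the rewrite or asking for
        ;; another revision is layered on top of this in the real flow.
        (ediff-buffers buf-a buf-b)))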
You can see the version of code in a branch of my llm source here:
https://raw.githubusercontent.com/ahyatt/llm/flows/llm-flows.el
And you can see the code that uses it to write the text corrector function here:
https://gist.githubusercontent.com/ahyatt/63d0302c007223eaf478b84e64bfd2cc/raw/c1b89d001fcbe948cf563d5ee2eeff00976175d4/llm-flows-example.el
There are a few questions I'm trying to figure out in all these demos, so let me state them and give my current guesses. These are
things I'd love feedback on.
Question 1: Does the llm-flows.el file really belong in the llm package? It does help people code against LLMs, but it expands the
scope of the llm package from being just about connecting to different LLMs
to offering a higher-level layer necessary for these more
complicated flows. I think this probably does make sense; there's no need
to have a separate package just for this one part.
Question 2: What's the best way to write these flows with multiple stages, in which some stages sometimes need to be repeated? It's
kind of a state machine when you think about it, and there's already a
state-machine GNU ELPA library (fsm). I opted not to model it explicitly
as a state machine, choosing instead to just use the most straightforward
code possible.
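For what it's worth, the straightforward version can stay fairly small by treating each stage as a function that returns the next stage (or nil when the flow is finished), so a stage that needs to repeat simply returns itself. A minimal sketch, with purely illustrative stage names:

    (defun my-flow-run (first-stage state)
      "Run FIRST-STAGE on STATE, then each stage it returns, until one returns nil."
      (let ((stage first-stage))
        (while stage
          (setq stage (funcall stage state)))))

    ;; Hypothetical stages for the correction flow:
    ;;   my-flow-get-rewrite -- calls the LLM, returns #'my-flow-review
    ;;   my-flow-review      -- shows ediff; returns #'my-flow-get-rewrite to
    ;;                          revise again, or nil once accepted or rejected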
Question 3: How should we deal with context? The code that has the text corrector doesn't include surrounding context (the text
before and after the text to rewrite), but it usually would be helpful. How
much context should we add? The llm package does know about
model token limits, but more tokens add more cost in terms of actual money
(per-token billing for services, or just the CPU energy
costs for local models). Having it be customizable makes sense to some
extent, but users can't be expected to have a good sense of
how much context to include. My guess is that we should just include a small
amount of context that won't be a problem for most
models. But there are other questions as well when you think about context
generally: How would context work in different modes?
What about when context is spread across multiple files? It's a problem that
I don't have any good insight into yet.
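As a starting point, gathering a small, bounded amount of surrounding text is easy; the numbers below are placeholders, and characters are only a rough stand-in for tokens:

    (defun my-llm-surrounding-context (start end &optional max-chars)
      "Return (BEFORE . AFTER) text around START..END, at most MAX-CHARS per side."
      (let ((max-chars (or max-chars 1000)))
        (cons (buffer-substring-no-properties
               (max (point-min) (- start max-chars)) start)
              (buffer-substring-no-properties
               end (min (point-max) (+ end max-chars))))))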
Question 4: Should the LLM calls be synchronous? In general, it's not great to block all of Emacs on a sync call to the LLM. On the
other hand, the LLM calls are generally fast enough (a few seconds; the
current timeout is 20s) that the user isn't going to be
accomplishing much while the LLM works, and is likely to get into a state
where the workflow is waiting for their input and we
have to get them back to a state where they are interacting with the
workflow. Streaming calls work well for just getting a response from the
LLM, but when we have a workflow, the response isn't useful until it is
processed (in the demo's case, until it is an input into ediff-buffers).
I think things have to be synchronous here.
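If blocking does turn out to be unacceptable, the non-blocking shape would be something like the sketch below, assuming llm-chat-async's response/error callback interface; rather than popping up ediff in the middle of whatever the user is doing, it stashes the result and tells them how to resume:

    (defvar my-llm-pending-rewrite nil
      "The most recent rewrite returned asynchronously, awaiting review.")

    (defun my-llm-rewrite-region-async (start end instructions)
      "Request a rewrite of the region without blocking Emacs."
      (interactive "r\nsRewrite instructions: ")
      (let ((prompt (llm-make-simple-chat-prompt
                     (format "Rewrite the following text. %s\n\n%s"
                             instructions
                             (buffer-substring-no-properties start end)))))
        (llm-chat-async my-llm-provider prompt
                        (lambda (response)
                          (setq my-llm-pending-rewrite response)
                          (message "LLM rewrite ready; resume the flow to review it"))
                        (lambda (_type msg)
                          (message "LLM error: %s" msg)))))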
Question 5: Should there be a standard set of user behaviors for editing the prompt? In another demo (one I'll send as a
followup), with a universal argument, the user can edit the prompt, minus
context and content (in this case the content is the text to
correct). Maybe that should always be the case. However, that prompt can be
long, perhaps a bit long for the minibuffer. Using a
buffer instead seems like it would complicate the flow. Also, if the
context and content are embedded in that prompt, they would have
to be replaced with some placeholder. I think the prompt should always be
editable, and we should have some templating system.
Perhaps Emacs already has some templating system, and one that can pass
arguments for the number of tokens of context would be
nice.
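One built-in option is format-spec, which would let the user edit a template in which placeholders stand in for the context and the content; the placeholder characters and template text below are just for illustration:

    (require 'format-spec)

    (defvar my-llm-correct-template
      (concat "Correct the spelling and grammar of the text.\n"
              "Context:\n%c\n"
              "Text to correct:\n%x")
      "A hypothetical editable prompt template with placeholders.")

    (defun my-llm-fill-template (template context content)
      "Substitute %c and %x in TEMPLATE with CONTEXT and CONTENT."
      (format-spec template (format-spec-make ?c context ?x content)))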
Question 6: How do we avoid having a ton of very specific functions for all the various ways that LLMs can be used? Besides
correcting text, I could have had it expand it, summarize it, translate it,
etc. Ellama offers all these things (but without the diff and
other workflow-y aspects). I think these are too many commands for the user
to remember. It'd be nice to have one function for when the user wants
to do something, and we work out what to do in the workflow. But the user
shouldn't be developing the prompt themselves; at least
at this point, it's kind of hard to just think of everything you need to
think of in a good prompt. Prompts need to be developed, updated,
etc. What might be good is a system in which the user chooses what they
want to do to a region as a secondary input, kind of like
another kind of execute-extended-command.
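Concretely, that could be a single command that offers a curated list of actions via completing-read and feeds the chosen prompt into the same rewrite flow (reusing the my-llm-rewrite-region sketch from above); the action names and prompt texts here are placeholders:

    (defvar my-llm-region-actions
      '(("correct"   . "Correct the spelling and grammar of this text:")
        ("summarize" . "Summarize this text:")
        ("expand"    . "Expand on this text:")
        ("translate" . "Translate this text to English:"))
      "Curated actions the user can apply to a region, with their prompts.")

    (defun my-llm-do-to-region (start end)
      "Choose an LLM action for the region and run it through the rewrite flow."
      (interactive "r")
      (let* ((choice (completing-read "LLM action on region: "
                                      my-llm-region-actions nil t))
             (instructions (cdr (assoc choice my-llm-region-actions))))
        (my-llm-rewrite-region start end instructions)))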
These are the issues as I see them now. As I continue to develop demos, and as people in the list give feedback, I'll try to work
through them.
BTW, I plan on continuing these emails, one for every demo, until the questions seem worked out. If this mailing list is not the
appropriate place for this, let me know.