From: Daniel P . Berrangé
Subject: Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Date: Fri, 24 Nov 2023 11:41:20 +0000
User-agent: Mutt/2.2.10 (2023-03-25)
On Fri, Nov 24, 2023 at 10:21:17AM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
> >> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> >> > The license of a code generation tool itself is usually considered
> >> > to be not a factor in the license of its output.
> >>
> >> Really? I would find it very surprising if a code generation tool that
> >> is not a language model and so is not understanding the code it's
> >> generating did not include some code snippets going into the output.
> >> It is also possible to unintentionally run afoul of GPL's definition of
> >> source code which is "the preferred form of the work for making
> >> modifications to it".
> >> So even if you have copyright to input, dumping just output and putting
> >> GPL on it might or might not be ok.
> >
> > Consider the C pre-processor. This takes an input .c file, and expands
> > all the macros, to spit out a new .c file.
> >
> > The license of the output .c file is determined by the license of the
> > input .c file. The license of the CPP impl (whether OSS or proprietary)
> > doesn't have any influence on the license of the output file, it cannot
> > magically force the output file to be proprietary any more than it can
> > force the output file to be GPL.
>
> LLMs are just a tool like a compiler (albeit with spookier, different
> internals). The prompt and the instructions are arguably the more
> important part of how to get good results from the LLM transformation.
> In fact most of the way I've been using them has been by pasting some
> existing code and asking for review or transformation of it.
>
> However I totally get that using the various online LLMs you have very
> little transparency about what has gone into their training and therefore
> there is a danger of proprietary code being hallucinated out of their
> matrices. Conversely, what if I use an LLM like OpenLLaMa:
>
> https://github.com/openlm-research/open_llama
>
> I have fairly exhaustive definitions of what went into the training data,
> of which the most interesting is probably the StarCoder dataset (paper):
>
> https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view
>
> where there are tools to detect if generated code has been lifted
> directly from the dataset or is indeed a transformation.
I've not looked at the links above, but if someone can make a
compelling argument that *specific* tools have sufficient transparency
to be compatible with signing the DCO, then I think we could maintain a
list of exceptions in the policy.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|