From: Ian Eure
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 20 Apr 2024 11:48:20 -0700
User-agent: mu4e 1.12.2; emacs 29.3
Hello,

I’m following up on this discussion since it’s been a month and I haven’t heard any updates.
Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for handling name changes. I’d like to stress again that this is *not* strictly a transgender issue (though it likely affects them more, or in worse/different ways) -- it is a human respect issue. Many, many more cisgender people change their name than transgender people.
- SHF gave their archive to HuggingFace, an "AI" company which is generating derived works with no attribution or provenance, in ways which violate both the licenses of the projects used to train their model and the SHF principles for LLMs.
- HuggingFace wasn’t respecting requests to opt-out of their model.
On the first point, it sounds like SHF has made concrete progress to improve[1], which is very good to hear. If SHF continues on this course, I think the concern is resolved.
On the third point, HuggingFace has begun honoring opt-out requests, but is still very far behind. Also, they don’t remove code from the older versions of their model -- it remains there forever. This is progress, but still, not great.
On the second point, I have not seen any public statements indicating that either SHF or HuggingFace even acknowledges the problem. SHF’s most recent newsletter[2], published in April 2024 (after these concerns came to light), continues to tout that StarCoder2 is "the first AI model aligned with our principles," which appears to be false. StarCoder2 includes both licensed and unlicensed code, and HuggingFace’s own StarChat2 playground produces works derivative of this code, with no attribution or licensing information. There is also no statement or position on the SHF news blog. Nor has HuggingFace either fixed their tools or made a statement. This is still very much a live concern.
I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a response?
- Whether a public or private response, what would Guix consider to be an acceptable response? An unacceptable response?
- How long is Guix willing to wait for a response?

Thanks,

-- Ian

[1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
Ian Eure <ian@retrospec.tv> writes:
Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things lately which have soured me on them.

They appear to be using the archive to build LLMs: https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer who wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should *never* be included in SWH? Is there a way to tell Guix to never download source from SWH? I want absolutely nothing to do with them.

Thanks,

-- Ian