I saw the project that did this. It was satirical, and I think the point was to show how absurd it would be to maintain everything yourself, even with AI.
No, absolutely not. It is safe to assume that most, if not all, open-source code (and plenty of other material) has been part of the training data. You need look no further than the fact that some models can recite Harry Potter from memory. There is no such thing as "clean room" for AI.
This really isn't true, though, even if it currently holds in many cases. Case in point: if I wrote something and published it right now, it wouldn't be part of any AI model yet. A party with a lot of money (like, say, a tech corporation) could easily create a bespoke coding model trained on everything except the desired libraries, thus achieving a "clean room".
In theory: Yes, future works are not yet part of the training data.
In practice: It takes months or years for an open source project (or any new technology) to take off and be considered valuable.
The other argument relies on said tech organization doing the right thing and spending the resources to train its own model (years of work and $100+ million), instead of simply including the cost of a lawsuit and the eventual fine in its cost/benefit analysis. I'm not aware of any tech organization (with the means) that would.
Well, last I heard you can't copyright the output of an LLM, so the entire concept of a licence for open slopware is moot.
Unfortunately the “with significant human input” case hasn’t been tried yet. As with most of these things the team that spends the most on lawyers wins the vast majority of the time, so corpos will get the case law.
I’m hoping that the “with significant human input” case turns out to be a massive own goal and basically breaks software copyright a few years down the line when anyone can re-implement any software.
Of course that’s when lobbying buys a law to override the case law. Sigh.
Yeah, but going from community-centric GPL to no copyright at all is sort of the goal of the recent slop rewrites.
If there is no copyright on the slop output code derived from GPL code, that's a win for the corps.
So you're agreeing that using the LLM worked? Because that's what the author wanted: to generate a freely usable version that is no longer bound by copyright or the original license.
Whether you own the copyright to your derivative work is not the same question as whether you are infringing someone else’s copyright.
Yes, but what does that have to do with LLM output being not copyrightable?
I think the fact that the maintainer is intimately knowledgeable about the original codebase is enough for it to not be a clean-room reimplementation, no? Never having seen the original is precisely what makes it "clean".