If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license?
-
@lcamtuf @ArneBab @kevinr @bgalehouse
"Use" isn't part of the GPL. And "all rights reserved" means normal copyright law, not "you get no rights at all".
The GPL defines "modify" and "propagate" as the activities it burdens. If I modify the code, and propagate it, I have a legal burden under the license. Otherwise, I don't.
IANAL, but I don't think reading the code and re-implementing a work-alike without incorporating the original code is "modify" - it's "replace".
I understand that's where "clean rooms" come into play, but that always felt like splitting hairs and giving copyright too much power - it's about physical books, not ideas. The farther we move from the original intent, the weaker a strong copyright stance becomes.
I think you could make an argument that reading code to understand its interfaces, explicitly declining to accept any license, then implementing compatible code is well within the normal copyright definition of "fair use", or should be if we aren't all copyright lawyers. More importantly, it's healthy for society and the art. If I can read a book under copyright and write a detailed book report, I should be able to read provided source code and do the same. To the extent that we've strayed away from that, the legal system has failed and needs correction.
@tbortels yes, not accepting the license means regular copyrights.
But your arguments afterwards rely on rights the GPL gives you -- you only get them after you accept the license.
EDIT: because "if we aren’t allowed … under copyright" ← we aren’t. That’s the point.
As long as there’s no NDA (there isn’t for GPL), we *can* write a spec. But the one implementing it *must not* know the code.
-
I'm not sure "closed" is the right word. Clearly it's not closed if you are providing it - it's right there, I can read it and even redistribute it without burden.
It's "copyrighted", not closed. You can't modify closed source because you don't have the source. The assertion being made is you can't modify GPL'd open source without accepting the license. But copyright has its own carve-outs, and I am unconvinced that writing a spec or net-new code is a modification, as opposed to regular old copyright fair use.
@tbortels you cannot redistribute copyrighted material(?)
If you make a spec of copyrighted code, that's effectively instructions on how to reproduce the code, and it can be used to commercially compete with the owners of the code, so I doubt it could qualify as fair use.
-
@rustynail @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr Hmm, there is another consequence to this.
If this is a derivative work, which I expect it is, it causes issues when someone has, in fact, manually coded an alternative to some copyrighted work (without reading the original code, etc.), because anyone can claim that it was done using AI and is therefore a derivative work. The alternative no longer needs to actually follow the original code to be accused of this.
Arrg!
@ahltorp @bgalehouse @revk @lcamtuf @kevinr @rustynail
AI is a weird case as you could assert - probably correctly - that the original code may be part of its training corpus. Was that training a GPL violation? It's a stretch. Was its training a copyright violation? Or was the AI (or rather its owners) exercising their GPL license rights? Or was it fair use under regular copyright?
Who knows?
It's a hot mess is what it is.
This is all so far outside the original reckoning of "it'd be nice if the bookbinder down the street didn't profit off of my work until I had a chance to profit off of it first" that it's not surprising it's a mess.
-
@kevinr @bgalehouse @lcamtuf @ArneBab
It explicitly does not. If I don't accept the license, normal copyright applies. You don't get to make a legally binding contract without consent, "clickwrap" bullshit aside.
And normal copyright has carve-outs like fair use.
@tbortels if you start relying on fair use, you enter a gray zone: courts will take decisions on that.
You don’t want that as the basis of anything that provides income.
A lawsuit in a gray area can ruin you, even if you’re likely to win.
-
@tbortels yes, not accepting the license means regular copyrights.
But your arguments afterwards rely on rights the GPL gives you -- you only get them after you accept the license.
EDIT: because "if we aren’t allowed … under copyright" ← we aren’t. That’s the point.
As long as there’s no NDA (there isn’t for GPL), we *can* write a spec. But the one implementing it *must not* know the code.
@ArneBab @kevinr @lcamtuf @bgalehouse
Fair use isn't something the GPL grants you. That's what I'm trying to work out - set the GPL aside for a moment.
Does regular copyright fair use give me the right to look at the freely provided source code, make a mental model, and re-implement a workalike if I don't re-use the original source?
Pretend it's just me and not an AI, because that throws a whole new set of confusion into the mix.
BSD did it against regular copyright. Not sure this is all that different.
-
@tbortels you cannot redistribute copyrighted material(?)
If you make a spec of copyrighted code, that's effectively instructions on how to reproduce the code, and it can be used to commercially compete with the owners of the code, so I doubt it could qualify as fair use.
It's about how to reproduce the functionality - the code could be in an entirely different language.
And - "commercially compete" with someone giving away code for free seems a non-concern.
-
@tbortels if you start relying on fair use, you enter a gray zone: courts will take decisions on that.
You don’t want that as the basis of anything that provides income.
A lawsuit in a gray area can ruin you, even if you’re likely to win.
@ArneBab @lcamtuf @kevinr @bgalehouse
We entered a gray zone about 8 off-ramps ago. Copyright never anticipated self-replicating code on computers and viral licenses and clean-room re-implementations and AIs.
As for income - I've lost track of the original driver, but it's GPL'd free code, no?
I like fair use. It and parody are among the very few things keeping us out of peasants-with-pitchforks-and-torches mode. If you eliminate those carve-outs, the whole system goes down.
-
It's about how to reproduce the functionality - the code could be in an entirely different language.
And - "commercially compete" with someone giving away code for free seems a non-concern.
@tbortels competition is one of the factors that go into what qualifies as fair use, so no, it is not a non-concern. And no, someone publishing their code with open access does not give it away for free, wtf.
-
If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus
@lcamtuf We know that if you ask the CS department of the University of California, Berkeley and BSDi to rewrite the entirety of AT&T's Unix, the result does not need to abide by AT&T's original license.
BSD is prima facie a derivative work of AT&T Unix, not developed using a clean room approach, but instead carefully audited to remove all AT&T copyright and trade secret interests.
By the time Theseus' ship was ready, Linux had left the harbor.
-
@kevinr and proving that the AI was not trained on the original source will be pretty hard, because FLOSS programs with compatible licenses can legally copy code from one project into the other.
You’ll likely have to exclude all code from the project and all code that’s too similar from the training data. And then train an AI from scratch. Which would be extremely expensive.
@ArneBab @kevinr @SnoopJ @bgalehouse @lcamtuf I think it's more complicated. Consider program A licensed under GPL and program B licensed under BSD license. Code from program B can be copied into program A, but code from program A cannot be copied to program B without applying GPL to program B (changing the license). At least that's how it works as I understand it.
-
If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus
@lcamtuf@infosec.exchange If the AI has never seen the original code, neither in training data nor as part of a prompt, and if it is just rewriting the program based on the program's "API" (for a command line tool that would be the man page and the --help), then yes - Oracle vs Google definitely applies.
But as that one was quite narrow, I would assume that if even just the internal structure derives from the original program, it is too much.
-
@ArneBab @kevinr @lcamtuf @bgalehouse
Fair use isn't something the GPL grants you. That's what I'm trying to work out - set the GPL aside for a moment.
Does regular copyright fair use give me the right to look at the freely provided source code, make a mental model, and re-implement a workalike if I don't re-use the original source?
Pretend it's just me and not an AI, because that throws a whole new set of confusion into the mix.
BSD did it against regular copyright. Not sure this is all that different.
@tbortels as far as I know, and as the article https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/ reinforces, fair use does not give you the right to re-implement the code.
Doesn’t matter whether you make a mental model as the intermediate step.
Only the clean room re-implementation gets out of that.
-
@ArneBab @lcamtuf @kevinr @bgalehouse
We entered a gray zone about 8 off-ramps ago. Copyright never anticipated self-replicating code on computers and viral licenses and clean-room re-implementations and AIs.
As for income - I've lost track of the original driver, but it's GPL'd free code, no?
I like fair use. It and parody are among the very few things keeping us out of peasants-with-pitchforks-and-torches mode. If you eliminate those carve-outs, the whole system goes down.
@tbortels GPL’d means that you can generate income as long as you adhere to the license (⇒ keep changes free, too).
If you want to wiggle out of that requirement with a re-implementation, that’s where you enter the gray area, because if it is a violation of the GPL, then the permissions the GPL granted you no longer apply and you have to check against regular "all rights reserved".
-
@tbortels GPL’d means that you can generate income as long as you adhere to the license (⇒ keep changes free, too).
If you want to wiggle out of that requirement with a re-implementation, that’s where you enter the gray area, because if it is a violation of the GPL, then the permissions the GPL granted you no longer apply and you have to check against regular "all rights reserved".
@tbortels fair use is always risky, because it only gives you conditional rights: if you take something via the fair use exception, you cannot use the result in any circumstance that would not be considered fair use, too.
At least that’s my understanding of copyright and fair use. Differences between copyright in different countries add a whole additional layer to that (there is no fair use in the EU, but there are "limitations and exceptions to copyright").
@lcamtuf @kevinr @bgalehouse
-
@tbortels fair use is always risky, because it only gives you conditional rights: if you take something via the fair use exception, you cannot use the result in any circumstance that would not be considered fair use, too.
At least that’s my understanding of copyright and fair use. Differences between copyright in different countries add a whole additional layer to that (there is no fair use in the EU, but there are "limitations and exceptions to copyright").
@lcamtuf @kevinr @bgalehouse
@tbortels for parody there was the famous lawsuit of Erdogan vs. Böhmermann about the goat fucker poem where Böhmermann won (because of context and maybe also because the lawsuit of Erdogan provided the context which made the poem legal), but it is illegal to publish that poem outside of the context of the show (that explained which kinds of works actually are illegal and used that as an example), and the show cannot be published again, because context changed now.
@lcamtuf @kevinr @bgalehouse
-
@ArneBab @kevinr @SnoopJ @bgalehouse @lcamtuf I think it's more complicated. Consider program A licensed under GPL and program B licensed under BSD license. Code from program B can be copied into program A, but code from program A cannot be copied to program B without applying GPL to program B (changing the license). At least that's how it works as I understand it.
@thebluewizard yes, the details are more complicated, but it doesn’t reduce the complexity of deciding which code has to be excluded.
-
@lcamtuf @kevinr @rustynail @ahltorp @bgalehouse @revk
If AI code cannot be copyrighted - you have no mechanism by which to force someone to accept the GPL, or any license. An AI artifact covered by GPL is meaningless.
-
@tbortels for parody there was the famous lawsuit of Erdogan vs. Böhmermann about the goat fucker poem where Böhmermann won (because of context and maybe also because the lawsuit of Erdogan provided the context which made the poem legal), but it is illegal to publish that poem outside of the context of the show (that explained which kinds of works actually are illegal and used that as an example), and the show cannot be published again, because context changed now.
@lcamtuf @kevinr @bgalehouse
@tbortels The show with the poem ended with the legendary song that was later republished independently:
https://inv.nadeko.net/watch?v=HMQkV5cTuoY
https://www.youtube.com/watch?v=HMQkV5cTuoY
-
If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus
@lcamtuf@infosec.exchange If Theseus asked his dad to curse a foss project because he was tricked by his cousin into believing the foss project is vibe coded, can the foss project be brought back to life?