Don't use LLM generated code in your projects yet!
-
There are only two strategies which are acceptable: either AI model output is completely illegal because of copyright stuff (this is unlikely to happen because there is now too much money behind it), or AI model output is fully in the public domain, which has its own problems but at least is an even playing field.
There won't be a middle ground that is safe. Because they want something that looks like a "middle ground", but really, all it does is lock in the big players' control over information, forever.
@cwebber I think we should resist socially and politically, for as long as there is a point, and until we figure out "benign LLMs". I'm pretty sure that's possible.
-
@cwebber so, we will get a middle ground answer. Because what they actually want is to lock in the big players' control over information, forever. Just listen to Altman and his "we see intelligence as a utility that you will pay us for".
This is why Meta and Google are building fiber under the oceans. This is why Amazon wants to be all things to everyone. They want you locked in; they DO NOT LIKE the distributed power that the internet currently gives to individuals.
-
That said, I think a lot of people think we can fight AI / LLM output on copyright grounds, and I actually think that's a losing strategy. Copyright almost always helps the big players, and it would here too!
You can see they're already counting on it and hoping that will be the case.
What the big players want is for copyright to apply to AI generated output because then *only* the big players can provide LLM services. See also Sam Altman's "running intelligence as a metered utility" pitch.
And the reason they could do this: *they* can make deals with Disney, Netflix, etc. But open models can't.
But what about all the "little guys" stuff? Well, when you sign that ToS on GitHub, Stack Overflow, DeviantArt, etc etc etc, all those places, you give them a right to your content too.
And THOSE places get to sell your rights.
So fighting on copyright grounds won't be an even playing field. It helps the big AI companies win.
@cwebber This agrees with my intuition on the matter -- the problem is not that content is being "stolen", it's that free AI "labor" "steals" the revenue that creators need in order to survive. For me, that points towards UBI, not reinforcing the highly unjust systems that trickle media revenue back to (a select few) creators.
(...speaking as a lifelong creator who almost made $5 playing live one time.)
-
@cwebber Now I feel dumb. This is basically what my concern has been: that the regulatory or legal situation turns it into an oligopoly and destroys smaller software companies. Yet I didn't consider use of the output as a harm to OSS projects that use it (unless the code quality is bad), so I've been using it in a few OSS repos of mine on the grounds that my day job leaves me insufficient time to do it all myself. And thinking it'll get more expensive.
-
@cwebber I’d settle for: if a model includes licensed sources (proprietary or open source) and uses them without a license, then the model needs to be published openly and usage needs to be free.
-
Don't use LLM generated code in your projects yet! If for no other reason than that the legal case law is NOT ESTABLISHED YET.
I know there was the "copyright laundering" thing that went around a lot, but we actually don't know.
You'll see commenters everywhere on the internet say that "the US Supreme Court ruled that AI generated output is in the public domain". That's misinfo: they *declined to take on* a case from a lower court coming to that conclusion. The US Supreme Court hasn't yet ruled.
And this hasn't shaken out in an international setting yet either.
You may be surprised to hear: I actually think it's more dangerous and empowers centralized AI companies even more if it *isn't* the case that AI output is in the public domain (I'll follow up about that), but regardless, right now we just don't know.
But despite that, I'm STILL saying that you're putting yourself in legally dubious territory right now if you include LLM generated code. We don't know yet.
@cwebber the US is not a country of laws, period. What USPTO says doesn't matter.
The EU, however, adopted a text just 3 days ago: LLM scammers MUST comply with licenses, including payment, to train on copyrighted work, regardless of location. And purely LLM generated slop *cannot be copyrighted*. There MUST be significant human contribution.
So purely LLM generated slop to try and license wash something is pretty much definitively unlawful now.
Protecting copyrighted work and the EU’s creative sector in the age of AI | News | European Parliament
To protect the creative sector in the EU, the use of copyrighted work by artificial intelligence requires transparency and fair remuneration, Parliament says.
(www.europarl.europa.eu)
-
@cwebber and remember, these are the dipshits pissing off the old companies that have infinite dollars by stealing *their* stuff. The people who spent millions turning copyright into a way to maintain monopolies and permanent rent-seeking.
The people who have used copyright as a weapon for many decades are decidedly not fans of 'companies' stealing the things they own to generate and sell things based on it.
And the LLM grifters absolutely do not have the money to pay them off.
-
I fully expect well-funded companies to repeatedly challenge "AI output cannot be copyrighted because it wasn't human generated", and I expect it will be continually chipped away. That's going to make things stupidly complicated for a lot of non-technical reasons for a long, long time.
The advice I've given is to absolutely and definitively denote exactly what code was AI generated and to keep detailed records of the history around it (including the source and date), because I guarantee that will become the crux of any future decision.
Until there's case law established, AI code is a liability.
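That record-keeping advice can be made concrete with a small script. This is just one hedged sketch: the log file name, field names, and tool label below are all made up for illustration, not any established convention.

```python
import json
from datetime import date
from pathlib import Path

def record_ai_provenance(log_path, file, tool, prompt_summary):
    """Append an entry noting which file contains AI-generated code,
    which tool produced it, and when, to a JSON provenance log."""
    log = Path(log_path)
    entries = json.loads(log.read_text()) if log.exists() else []
    entries.append({
        "file": file,                      # path of the affected file
        "tool": tool,                      # model/assistant used
        "date": date.today().isoformat(),  # when the code was generated
        "prompt_summary": prompt_summary,  # brief note on how it was produced
    })
    log.write_text(json.dumps(entries, indent=2))
    return entries

# Usage: note that a helper in a (hypothetical) file came from an assistant
record_ai_provenance("ai_provenance.json", "src/parser.py",
                     "example-llm-tool", "asked for a CSV parsing helper")
```

Checking the log into the repository next to the code keeps the source and date attached to the history, which is exactly the evidence trail being recommended above.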
-
@cwebber
It used to not be copyrightable. But considering the nazi track the US is slipping onto, the new copyright act prepared by Bezos and Thiel over some bloody drink will say: 1) anything produced by humanity belongs to whomever the tyrant wants, since we have it all in the LLM, and 2) any royalties go to us, see above.
-
@cwebber In my opinion, the moment that personal information gets out into the public domain without proper consent, this becomes an actionable matter.
AI generated code must be open source; doing it this way helps everybody to freely create.
The moment the $$$ gets into the picture, you are killing the true creative potential of the people.
-
@cwebber did you read the copyright office opinion doc? What’s your take on what it says?
-
There is validity, with all kinds of different framing, to resisting the careless use of a complex and poorly understood technology as the answer to Life, the Universe, and Everything.
I think the thesis at hand though, is that trying to use outdated and inadequate, poorly fit-for-context copyright law as the tool (a technology, heh) to do that is not likely to be productive. It will consume our resources without meeting our purposes.
-
Part of the problem still being … what, exactly, IS our purpose in this melee?
-
@cwebber The UK has a third option: the person operating the AI is the author and the output is copyrighted. Would not surprise me if the industry lobbies more jurisdictions into similar legislation.
-
@MartyFouts Link to more info on UK case law?
-
@cwebber I'd be more concerned if someone could make a tool that can prove code came from a specific model, but I don't think that's gonna happen either.
-
@cwebber I can see the future: legal concerns over LLM written code results in people rewriting code by hand to circumvent potential LLM code licence violations.
-
@cwebber I don’t know of case law, but the UK’s Copyright, Designs and Patents Act 1988, Section 9(3) states:
"In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."
It’s language any legislature might be lobbied into inserting in their copyright statute.