Hypothesis: If the output of LLMs cannot be copyrighted, anything in the training set becomes public domain.
If the recent attempt to LLM-rewrite chardet [1] holds up, then any copyrighted material can be laundered through an LLM.
Any inputs sent to an LLM for any purpose become part of the training set.
Thus, any company using LLMs has put their source code in the public domain.
But then you would also de-copyright research papers, the Windows source code, and fintech code?
-
@jhlagado@jorts.horse here is another example of what you were just talking about: LLM output has no owner, so it can't be copyright-protected or licensed.
-
@shapr not how it works. https://ansuz.sooke.bc.ca/entry/23 has a similar example.
-
@shapr Saw a discussion about this recently with respect to surveillance tech. For instance, if you asked an LLM to create something similar, would it launder some facsimile of the actual code?
-
@cceckman I'll read this, thanks for the link
-
@dabeaz I had the idea to run tree-sitter on outputs like the "claude C compiler" and check the similarity at the AST level, but probably won't actually do this.
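A minimal sketch of that AST-comparison idea, using Python's stdlib `ast` module rather than tree-sitter (so it only handles Python sources; the code snippets and the `ast_similarity` helper are made up for illustration). The point is that comparing node-type sequences ignores identifier names, so a mechanically renamed copy still scores as identical:

```python
import ast
import difflib

def node_types(source: str) -> list[str]:
    """Flatten a Python source's AST into a walk-order list of node type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def ast_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two snippets at the AST level.

    Identifier names and literal values are dropped, so renaming every
    variable does not lower the score -- only structural changes do.
    """
    return difflib.SequenceMatcher(None, node_types(a), node_types(b)).ratio()

original = "def area(w, h):\n    return w * h\n"
renamed  = "def surface(x, y):\n    return x * y\n"   # same shape, new names
other    = "names = [s.upper() for s in words]\n"      # different shape

print(ast_similarity(original, renamed))  # 1.0: identical structure
print(ast_similarity(original, other))    # well below 1.0
```

tree-sitter would generalize the same scheme to C or any other grammar; the comparison logic over node-type sequences stays the same.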
-
@shapr also the chardet case, according to this article, appears to be: "the LLM-generated thing may be a derived work of the original". In my understanding,* the fact that the LLM-derived thing may not be copyrightable is irrelevant; it can still infringe.
If I make an audiobook of _Demon Queen_ and say "release it into the public domain"...that doesn't make it so; the work still infringes.
* (I am not a lawyer, this is not legal advice)
-
@shapr "If the output of LLMs cannot be copyrighted" is doing load-bearing work here: it implies "and it's perfectly legal to rights-wash literally anything." That implication is unlikely and has yet to be proven. On top of that, rights-washing everything would make it unusable as training data, because of the problems LLMs have when fed their own output.
-
Yes it is definitely public domain and therefore has no owner.
I suppose if you say "take this program as your blueprint" and let it vibe-copy a clone, then you've effectively converted the licence from restrictive to public domain.
This breaks even the most permissive open source licence.