Yesterday, I boosted something that maybe I shouldn't have, which made at least one OSS maintainer feel personally attacked. I want to make it clear in case anyone else felt the same:
I think that LLM policies should be short & clear: "Do not use LLMs when working on this project. Don't submit code or comments using them."
I am not angry with people struggling to navigate this topic differently than I am.
-
If you think that your policy needs to allow for the fact that people are going to do it anyway, we disagree on this very specific nuance. That doesn't mean I think you're bad or stupid.
More generally, I understand that the entire community and the entire industry are basically in a crucible right now, and that we are all stressed out of our minds by unhinged levels of slop spam and executive mandates, not to mention *gesturing at everything else going on in the world*.
-
I'm glad someone reached out, and if you ever think I've fucked up by posting (or boosting) something that personally made you feel bad, feel free to DM or email. I have very strong feelings and I will continue to make them known, but my goal is not to hurt people who are doing their best here, and if there is collateral damage I would like to mitigate it as best I can.
-
@glyph Honestly, I'm surprised more projects haven't banned it over just the licensing concerns at this point.
No copyright -> LLM-generated code isn't eligible for licensing, so unless your project is public domain/public-domain-equivalent licensed, LLM code creates a looming legal clusterfuck. And LLMs trained on GPL code might be *generating* license violations (and legal liability) any time you run them.
-
@glyph In an ideal world, everyone would follow the submission policy prohibiting contributions made with AI, but how can we verify if that is the case? Unfortunately, a policy that cannot be enforced is not very useful, apart from being a symbolic gesture. In fact, by not being able to guarantee that contributions do not come from an LLM, such a policy could be misleading users by promising them that there is a way to verify this, when this is not the case.
I hate this whole predicament that LLMs have put free software in.
-
@miss_rodent As it happens, the person who felt attacked here has written a strong policy which explicitly accounts for all of that and makes quite a strong case against it, but I boosted something which used quite aggressive language to denigrate one of the minor caveats that they included.
-
@miss_rodent I didn't even really see it as criticizing their specific choices, but if you've sweated blood to write a policy which attempts to deal with a coalition of very angry and diametrically opposed factions I can understand that if somebody calls one of the choices you made in that process stupid, you're going to have a strong reaction.
-
IMO you can't verify every line of code but you can verify if the PR/commit text has signs of slop and if the submitter doesn't seem to understand what they're coding. That gets a decent chunk of the way.
The rest is enforced via social expectation and the potential consequences of being kicked out if you don't meet those expectations
-
@Rataunderground To rephrase the thing I boosted but without calling anyone names this time: a policy can acknowledge the limits of its own enforceability without allowing for those violations.
For example, even the most lawyer-tested submission-licensing policy has language like "By submitting this code, you certify that you have all the relevant rights to license it to the project under our terms." There's never been (and can never be) a way to enforce that mechanically.
-
@glyph I'm torn between wanting to put up a sign that says "If you do this I will hurt your feelings, block you, and report your account as spam" and knowing that stuff like that invites adversarial behavior from people. Also, I'd rather just start with something to the effect of "remember the golden rule: do not make me have to make new rules" and only add rules to projects as needed. (I don't think I've ever gotten an unsolicited pull request from a stranger on any of my projects yet.)
-
@glyph Yeah, it makes sense. I'm def on your side of thinking the correct policy is just "No". Even if you don't care about the code quality or maintainability concerns, and don't care about the environment being burned down over it, and don't care about the plagiarism, the potential legal/license concerns alone make it seem like a bad idea to allow uncopyrightable code into a project that depends on protections from licensing the copyright.
-
@aeva one irony here is that most of MY projects do not have any such policy in place because I am (for whatever reason) just not on the receiving end of much spam; the offended party here has written what I would otherwise consider very good and detailed policies and thus done more work than I have. Perhaps people just know my feelings from external channels such as this one.
-
@glyph @aeva I would guess that the bar for "having an idea of what might be even a hypothetically useful change" is higher in your projects than the average, filtering out a lot of people who aren't going to bother to understand them in the first place.
(okay, I'm mostly thinking about Twisted, but)
-
@Rataunderground A policy could simply require a similar self-certification that the submitter hasn't used any LLM tools in the process. It is simultaneously true that:
1. There's a large swathe of coding work that is rote and tedious and could probably be done with LLM tools totally undetectably if someone were motivated to violate this policy.
2. The sort of person who would actively seek to do that would almost certainly, eventually, leave *very* obvious evidence of their violations.
-
@SnoopJ @aeva there's also, like, a popularity threshold thing. Twisted is close to the top of the popularity pile for the stuff I maintain and it's already relatively obscure. People standing directly in the "core infrastructure" line of fire have to deal with a lot more inbound even in the Before Times, let alone now
-
@glyph (not that a policy of "no" can stop all of it -- obviously plagiarism and license violations are already illegal, and have already gotten into projects and caused problems.
But at least a policy of "No" means a lot fewer LLM things will be submitted, you can point to the policy to reject or remove the ones that get through, to blacklist violators of that policy from further contributions, etc. etc. - it gives you a filter on one side, and options to handle policy violations on the other.)
-
@miss_rodent I tend to think that the copyright concerns are both
A) real, and
B) overblown.
There's a fair amount of case law at this point that indicates that you can mix in a small amount of human creativity to create something copyrightable. In the context of coding, particularly of open source, the "raw" LLM outputs are generally not even accessible, given that a bunch of human creative choices go into what to submit, and the project as a whole sits under a general umbrella of human creativity.
-
@miss_rodent But, to take the recent example of chardet:
While I do not like Bruce Perens's opinion on the copyright status of v7, I think he is *probably* correct on the merits under current jurisprudence. The "clean room" implementation is *probably* going to be ruled either uncopyrightable (public domain) or copyrighted (MIT license) by the "new author" by dint of a few minor choices around its submission and structure.
HOWEVER…
-
@glyph I think the more pressing concern is their tendency to produce exact or near-exact copies of training code (I linked a study a few days ago about it ... here https://girlcock.club/@miss_rodent/116190673809741664 ) which ... it's kind of an open question still, how much of a legal liability it is, but there is enough uncertainty about it that if I were running a donation-funded project, I wouldn't want to risk the legal fees over it until someone with money sets stronger precedents about it.