I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools.

gabrielesvelto@mas.to

And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data, I'm sorry, but you're being extremely naive.

gabrielesvelto@mas.to

@adingbatponder yes, but why? Which packages were taking so long? Firefox has almost 4 million lines of Rust and it takes only a few minutes to build them.

a@852260996.91268476.xyz

@gabrielesvelto@mas.to it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one

piegames@flausch.social

@gabrielesvelto "people are using this inadequate and problematic tool for a job, so let me suggest they use this different completely inadequate tool instead."
Speaking from unfortunate, painful experience: using grep and sed at scale for mechanical refactoring very much introduces random mistakes into a codebase. I beg developers to use *at least* syntax-aware tools for mechanical refactoring jobs
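
To make the grep/sed point concrete, here is a minimal sketch of a rename done two ways: a sed-style regex substitution and a token-aware pass built on Python's standard `tokenize` module. The sample source and both function names are made up for illustration, and the token-aware version is only the lowest rung of "syntax-aware": it knows what an identifier is, but nothing about scopes or imports.

```python
import io
import re
import tokenize

# Hypothetical snippet to refactor: rename the variable `count` to `total`.
SOURCE = '''\
count = 1
discount = count * 2
print("count is", count)  # count stays in this comment
'''


def rename_with_regex(source, old, new):
    # sed-style substitution on word boundaries: it cannot tell an
    # identifier from the same word inside a string or a comment.
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def rename_with_tokens(source, old, new):
    # Only rewrite NAME tokens that exactly equal `old`.
    lines = source.splitlines(keepends=True)
    hits = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Splice from the last hit backwards so earlier columns stay valid.
    for tok in reversed(hits):
        row, col = tok.start  # row is 1-based, col is 0-based
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


if __name__ == "__main__":
    print(rename_with_regex(SOURCE, "count", "total"))
    print(rename_with_tokens(SOURCE, "count", "total"))
```

The regex version also rewrites the string literal and the comment, while the token-aware version touches only the three real uses of the identifier. A real refactoring tool would go further and resolve scopes, so that two unrelated variables that happen to share a name are not renamed together.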

gabrielesvelto@mas.to

@a how so? Now you don't need a person to run that particular exploit for years, you can just poison an LLM so that whenever someone generates a sufficiently large sequence of commits the exploit can be injected in them directly. No user intervention and it can be done at scale. And it can be done in closed-source codebases too, it's just a matter of someone using a bot on them.

a@852260996.91268476.xyz

@gabrielesvelto@mas.to you didn't need an LLM for xz, that is how

fourlastor@androiddev.social

@gabrielesvelto and ok, but what is the *actual* scenario you're imagining? because my coding tasks go as such when I use LLMs:
1. I have 10-15 classes that need to change the way we do X from Y to Z
2. I prompt the LLM, telling it "change A,B,C so that they use Z instead of Y"
3. I review the code, fixing mistakes as I see them
1/x because post length limits

fourlastor@androiddev.social

@gabrielesvelto
The code change is frankly pretty simple, we're talking of stuff on the level of "migrate Book so instead of using function calls, uses annotations for ABC, update the call sites", we're not talking about "change this complex piece of code so that it does complex ABC in another complex XYZ way". The realm of errors is "I know that Foo doesn't work well by itself and needs extra care"

fourlastor@androiddev.social

@gabrielesvelto anything that goes over the bar of "this is stupid but boring" goes into the "I'll do it by hand because if anything I need to learn how it works before touching it"

jwcph@helvede.net

@gabrielesvelto Just the other day I saw a goddamn professor claiming that we need to teach chatbots to reason in order for them to do math. As if we haven't had calculators that actually work every time for like 450 years. It's insane.

adingbatponder@fosstodon.org

@gabrielesvelto No clue. At the time it was chrome that pushed it into silly territory. But this was inside a flake. All I know was when it was refactored it was able to use 32 processors instead of only 2.

ruchirasdatta@mathstodon.xyz

@gabrielesvelto @a You are correct, LLMs have made this exploit many times easier to execute.

cliffsesport@mastodon.social

@gabrielesvelto is that incident an example of metamorphic malware?

silhouette@dumbfuckingweb.site

@a @gabrielesvelto no it's actually an extremely well-made point. if we were (almost) unable to detect something like that in a FOSS project (not in the code, in a debug object mind you) then where do we get off introducing the black box which increases complexity a thousand times and claim we can still quality-control the final product. not to mention it took someone years to gain influence within the project vs a model that just scrapes public code indiscriminately

a@852260996.91268476.xyz

@silhouette@dumbfuckingweb.site @gabrielesvelto@mas.to who said this already hadn't happened before the advent of LLMs? you detected ONE, you don't know how many you haven't

toast@donotsta.re

@silhouette @a @gabrielesvelto most people (by volume AND mass) using LLMs are doing so because they do not have the skills necessary to produce the code in question (they "have the skill to read it" but if you've ever tried reimplementing a compsci research paper without just copying their code as-is you know instinctively that's not the same thing), which means that they are unlikely to tell well-crafted malicious code from legitimate code, knowing that both achieve their results
this is implying they even do review it at all rather than simply relegate this to an agent that only checks if it matches the acceptance criteria (just like a real product manager!), which obviously immediately fails

silhouette@dumbfuckingweb.site

@a @gabrielesvelto I don't follow, are you agreeing with me or... what?

a@852260996.91268476.xyz

@silhouette@dumbfuckingweb.site @gabrielesvelto@mas.to I'm not, I'm saying that the xz is a bad example for several reasons, including the fact that (and this was my last point) it is one known case among an unknown number of total cases

silhouette@dumbfuckingweb.site

@a @gabrielesvelto I still don't follow your line of argument here. You are saying that there are currently an unknown number of potential vulnerabilities in human-generated FOSS code, so we should... hook it up to the complexity generator?

a@852260996.91268476.xyz

@silhouette@dumbfuckingweb.site @gabrielesvelto@mas.to The argument sounds more like "I know a guy who almost died from a peanut allergy, so we should prohibit peanut production". Yes, it is possible. It was also possible in the past. My point is that the use of LLMs doesn't change the landscape much in that regard.
