If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

llm
50 Posts 34 Posters
  • nuintari@mastodon.bsd.cafe

    @iwein Sorry, I've taken to just using the term AI when I mean LLM, even though I actually mean "Almost Incompetent," in my own head.

    iwein@mas.to
    #35

    @nuintari thanks for that 😁

    • pseudonym@mastodon.online

      If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

      That's a cognitively brutal task.

      Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

      I propose any productivity gains will be consumed by false negative review failures.
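      The post's back-of-the-envelope argument can be sketched in a few lines. All numbers below are invented for illustration, not measurements:

      ```python
      # Toy model of review escapes; every number here is a hypothetical
      # illustration, not data.
      def escaped_defects(output_units, defect_rate, catch_rate):
          """Expected defects that slip past a human reviewer."""
          return output_units * defect_rate * (1 - catch_rate)

      # Junior writes 100 units; a fresh senior reviewer catches 90% of defects.
      baseline = escaped_defects(100, 0.02, 0.90)   # ~0.2 escaped defects

      # The LLM emits 3x the output at the same defect rate, and sustained
      # vigilance over the larger surface degrades the catch rate.
      llm_era = escaped_defects(300, 0.02, 0.70)    # ~1.8 escaped defects

      # Escapes grow roughly 9x while output grew only 3x: the vigilance
      # penalty compounds with the volume increase.
      ratio = llm_era / baseline
      ```

      The point of the sketch: even if the defect rate per unit stays flat, escaped defects scale with output volume *times* the vigilance decrement, so they can grow faster than the "productivity" does.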

      ahimsa_pdx@disabled.social
      #36

      @pseudonym
      Looks like Harvard Business Review agrees with you

      AI Doesn’t Reduce Work—It Intensifies It

      One of the promises of AI is that it can reduce workloads so employees can focus more on higher-value and more engaging tasks. But according to new research, AI tools don’t reduce work, they consistently intensify it: In the study, employees worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day, often without being asked to do so. That may sound like a win, but it’s not quite so simple. These changes can be unsustainable, leading to workload creep, cognitive fatigue, burnout, and weakened decision-making. The productivity surge enjoyed at the beginning can give way to lower quality work, turnover, and other problems. To correct for this, companies need to adopt an “AI practice,” or a set of norms and standards around AI use that can include intentional pauses, sequencing work, and adding more human grounding.


      Harvard Business Review (hbr.org)

      I did not read the whole thing, but the summary says:

      "One of the promises of AI is that it can reduce workloads so employees can focus more on higher-value and more engaging tasks. But according to new research, AI tools don’t reduce work, they consistently intensify it ..."

      • pseudonym@mastodon.online

        If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

        That's a cognitively brutal task.

        Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

        I propose any productivity gains will be consumed by false negative review failures.

        toscalix@mastodon.social
        #37

        @pseudonym

    • ahimsa_pdx@disabled.social
          #38

          @JizzelEtBass
          Thanks ❤️

    • pseudonym@mastodon.online
            #39

            @JizzelEtBass @ahimsa_pdx

            Yeah. Pretty sure I read that earlier and it influenced my thinking about this, leading to my post.

            Thanks for the reference.

    • wendynather@infosec.exchange

              @pseudonym Yes. Very well put. I’m gonna use this …

      pseudonym@mastodon.online
              #40

              @wendynather

              Please do.

              Glad it had some value.

              Just my late night noodling about things.

    • ferricoxide@blahaj.zone

                @pseudonym@mastodon.online

                Yesterday, I was working on some PowerShell-based automation. I'm a UNIX/Linux guy. I'm used to Bash. I'm used to Python and pythonic DSLs. I'm… You get the drift. I'm not a Windows guy and I'm not a PowerShell guy.

                A few days ago, I got an email from Google telling me that, because I have a storage plan (mostly for photo storage), use of Gemini was now included. So, I opted to try using Gemini to bridge my PowerShell knowledge gaps. I came to a couple of conclusions:

                • If you're a truly junior "coder" (someone who hasn't mastered at least one "language" and regularly applied that mastery to "the real world"), relying on LLMs is likely to lead you to creating smoking holes
                • Those "smoking holes" are the result of the LLM sometimes providing partially or wholly incorrect answers: I've had to correct Gemini several times
                • Even where "smoking holes" aren't a risk, LLMs are not adequately speculative. To illustrate: I was trying to solve a problem, and Gemini suggested a given path to take. The suggested path looked more generalizable, so I asked, "I feel like there's a good chance I can do similar within this other, very analogous component. I'm going to run a test to validate." Gemini's response was effectively, "Don't bother: the documentation doesn't indicate that that will work." With a couple decades' experience under my belt, I know that documentation is sometimes incomplete or wrong (out of date). So, I proceeded to test my suspicion and, lo and behold, it worked. If you're lacking a "feel" for things, you'd likely take the LLM's "don't bother" guidance and go down a different path, a path that might be a lot more byzantine.

                pseudonym@mastodon.online
                #41

                @ferricoxide

                Same background (Unix grey beard) with current focus on security, and your experience matched my own.

                I was soaking in a lot more AI tools at my last job, and experience and insight are key.

                Recently I had a system suggest multiple times to do it "the easy way" which emphatically was not how I wanted it to work. I was able to gently guide it back to what I wanted.

                Letting a senior dev do the work of a senior guiding a junior is about right. But it still can't replace either.

                • toldtheworld@mastodon.social

                  @pseudonym I have posed this conundrum before and the answer I received is that there is also an opportunity cost to not moving faster and the risk of a catastrophic bug may not outweigh the risk of being overtaken by competitors, especially since that was already happening before LLMs anyway.

                  Also, it *seems* models are improving at detecting these bugs, so they are being used to review changes, which, for the reasons you point out, they might be better at than people.

                  pseudonym@mastodon.online
                  #42

                  @toldtheworld

                  The models may indeed get better at finding and fixing their own mistakes, and would not be subject to human fatigue, that's true. But they are never perfect, so you still need a human in the loop. You've just pushed back the time a bit before you miss a harder-to-detect error. Which is inevitable, because hallucinations / confabulations are a feature, not a bug, of essential LLM operation.

                  So you make more, faster, harder to spot errors. Better LLM checkers increase the risk.
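                  The "pushed back, not eliminated" point can be made concrete with a toy calculation. The checker catch rates and the volume multiplier here are invented for illustration:

                  ```python
                  # Toy model: stacking imperfect checkers shrinks the per-defect
                  # escape probability but never reaches zero, and volume growth
                  # can eat the gain. All rates are hypothetical.
                  def escape_prob(catch_rates):
                      """Probability a defect slips past every checker in the chain."""
                      p = 1.0
                      for c in catch_rates:
                          p *= (1 - c)
                      return p

                  human_only = escape_prob([0.90])            # ~0.10 of defects escape
                  with_llm_checker = escape_prob([0.80, 0.90])  # ~0.02: better per defect

                  # But if "productivity" multiplies output 10x, total escapes roughly
                  # double -- and the survivors are precisely the defects that fooled
                  # both the LLM checker and the human.
                  total_ratio = 10 * with_llm_checker / human_only
                  ```

                  The per-defect number improves, but the residual population is selected for being hard to detect, which is the risk the post describes.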

                  • deborahh@cosocial.ca

                    @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨

                    pseudonym@mastodon.online
                    #43

                    @deborahh @mayintoronto

                    Yup. This is my biggest structural concern, really. But I only had 500 characters to consider the previous post, and wanted to focus on the review cost of any "gains" one might have.

                    There are more related topics to discuss, but the breaking of the funnel to train the next generation of skilled people is huge.

                    • max@mas.lab4.app

                      @pseudonym This, 100%. The Glass Cage by Nicholas Carr dives into this in depth, with examples from aviation and how full automation of flight makes it harder for pilots to recover from a disaster situation.

                      pseudonym@mastodon.online
                      #44

                      @max

                      Thanks for the reference. Didn't know that one.

                      • wronglang@bayes.club

                        @xrisk @malstrom @pseudonym just for clarity, LLMs don't learn concepts

                        pseudonym@mastodon.online
                        #45

                        @wronglang @xrisk @malstrom

                        Correct. They don't learn concepts. That's the key confusion in so much of the discussion and use around them.

                        They have no world model, and don't reason at all. But they perform a very good facsimile of reasoning, because reasoning is embedded in and has shaped the patterns of speech, text, and code.

                        They pattern match. That's all. Full stop. But they do it so well it looks like speech, or code, or understanding.

                        • moutmout@framapiaf.org

                          @pseudonym This.

                          I do a lot of "computer science labs", where students learn to write code, and they wave me down when they have questions. When their code doesn't do what they expect, it's often easy to figure out what went wrong because you can spot a bit of code that looks funky. And usually, the problem is in those few lines.

                          LLM code is meant to look like good code, so you don't get these little shortcuts.
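                          A hypothetical side-by-side of the two failure "shapes" (both snippets are invented for illustration, not taken from the thread):

                          ```python
                          # Hypothetical novice bug: the code looks funky at a glance,
                          # so a teacher's eye snags on the odd line almost immediately.
                          def average_novice(xs):
                              total = 0
                              for x in xs:
                                  total = total + x
                              return total / len(xs) + 1   # stray "+ 1" -- visibly out of place

                          # Hypothetical LLM-style bug: idiomatic, well-named, documented --
                          # and subtly wrong, so nothing "looks funky" to snag on.
                          def moving_average(xs, window):
                              """Return the moving average of xs over the given window size."""
                              return [
                                  sum(xs[i:i + window]) / window
                                  for i in range(len(xs) - window)  # off-by-one: drops the final window
                              ]

                          # moving_average([1, 2, 3, 4, 5], 2) yields 3 values; a correct
                          # version (range(len(xs) - window + 1)) would yield 4.
                          ```

                          The first bug advertises itself by its shape; the second has to be found by reasoning about boundary conditions, which is exactly the slow, vigilant review the thread is about.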

                          pseudonym@mastodon.online
                          #46

                          @Moutmout

                          Good example I hadn't thought of.

                          Yes, human novice code mistakes have a "shape" to them a teacher can recognize quickly, or suspect because of how the error manifests.

                          These are different classes of "good looking" failures.

                          • pseudonym@mastodon.online

                            If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

                            That's a cognitively brutal task.

                            Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

                            I propose any productivity gains will be consumed by false negative review failures.

                            leftpaddotpy@hachyderm.io
                            #47

                            @pseudonym i think it depends on the domain. like, code review is not seriously expected to catch all bugs; it's merely a step in a process. if you need absolute correctness (most don't!) then formal methods, a shockingly rare practice in the most critical industries, might be the right choice.

                            a stronger argument would be "the bugs are less obvious" though i think that too can be fought with observability. but that strategy only works well in application code, i.e. code which "makes money" (a notion which should be challenged, but that's another issue), rather than infra layer stuff with higher correctness needs and worse observability. and you know how the old saying goes: "if the code is good it's probably not making money". idk, people write slop where they already wrote slop due to the same pressures as before.

                            • ainmosni@social.ainmosni.eu

                              @pseudonym This was my experience from the start, and is what made me give up on LLM-assisted coding. Of course, that was before I was aware of the abhorrent externalities that came with using the slop machine...

                              pseudonym@mastodon.online
                              #48

                              @ainmosni

                              Yup.

                              My thoughts aren't new.

                              Just felt the need to pack them up into something bite-sized.

                              To explain where I see one of the fundamental design failures, one that remains even given any potential "good stuff" that may arise.

                              • a_goodall_spaceship@norden.social
                                #49

                                @adrianmorales @pseudonym Stop that, I love dark star!

                                • avuko@infosec.exchange

                                  @pseudonym and because the high volume consists of what I’ve dubbed “plausible bullshit”, reviewers will have to battle a plethora of their biases as well.

                                  There are fields (I’ve heard stories about protein and material design, and vulnerability discovery) where filtering the BS for real discoveries can be worth it. I’m guessing it works because there is a reality to test against.

                                  But for the love of humanity, don’t use it for anything descriptive or abstract.

                                  michael@westergaard.social
                                  #50
                                  I like to say that LLMs are a great way to reduce junior development time at the cost of senior review time.