@lproven@social.vivaldi.net @xs4me2@mastodon.social @reading_recluse@c.im
1. That paper is nearly two years old, and a lot has changed since. Not to mention the 'test' the author (I can't find their name, sorry) ran is pretty dumb. It's much better to use an API directly, where you control the full input pipeline and can be sure the vendor isn't adding hidden instructions without your knowledge.
2. I already addressed that point in my previous comment: it's on the user to verify that a tool's output is correct. Relying on an LLM to do the reading in one's stead is a recipe for disaster.
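To make the "control the full input pipeline" point concrete, here's a minimal Python sketch, assuming an OpenAI-compatible chat endpoint. The model name, prompts, and parameters are placeholders of my own, not anyone's actual setup; the idea is just that when you build and serialize the request body yourself, the context the model sees is exactly and only what you wrote.

```python
import json

def build_chat_payload(system_prompt: str, user_prompt: str,
                       model: str = "local-model") -> str:
    """Return the exact JSON body that would be sent to the API.

    Because we assemble every message ourselves, there is no UI layer
    that can silently prepend hidden instructions to the context.
    """
    payload = {
        "model": model,  # placeholder name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
    }
    # Serializing it yourself means you can log and inspect the full input.
    return json.dumps(payload, indent=2)

body = build_chat_payload("You are a terse assistant.",
                          "Summarize this paragraph: ...")
print(body)
```

The actual HTTP POST is omitted on purpose; the point is that the payload above is auditable before anything leaves your machine.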
You haven't said anything about YOUR use-case, experience, or the tests you tried.
I'm genuinely curious, what do you imagine using an LLM is like?
The reason I ask is that a lot of the criticism and panicking I see online (sometimes crossing into outright disrespect and bigotry) comes from an assumption that using an LLM means turning off one's brain and taking the output at face value... something we shouldn't be doing with any software anyway.
I guess, put another way: I don't believe the problems people attribute to LLMs are specific to LLMs. How many instances have there been of management/execs taking Excel output as fact when the formulas were set up wrong?
These statistical models are no different.
phil@fed.bajsicki.com
The LLM discourse on the Fediverse has really irked me the last few days.
@lproven@social.vivaldi.net @xs4me2@mastodon.social @reading_recluse@c.im
Hasn't been my experience. What have you tested it with?
Even tiny models in the 4-12B range have been able to handle the things I need (though granted, not as well as the 24-30B range).
My use-case is saving my hands from typing up repetitive patterns, analyzing my journals from several angles (e.g. what's my average mood based on the wording I use, and how does that relate to medical things like migraines), and as a parrot that repeats my plans/calendar back to me in different words, so I can overcome my own biases more easily.
I have found the available models entirely sufficient for these tasks.
Not for coding, though. Even Qwen3-Coder-Next, which is an 80B behemoth, just plain sucks at code.
Now, to be clear: I'm not saying they're always accurate when I use LLMs. I'm saying that because I use them on data I typed up by hand and am intimately familiar with, they save me time and mental effort; spotting problems is easy.
I wouldn't use them for any subject which I'm not already well grounded in, and in that specific way, I agree with you.
But I also wouldn't say they're extremely or dangerously bad at digesting and exploring information as such. No more so than code written by juniors without supervision.
Ultimately it's on the user to ensure the tool's output meets requirements.
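One way to act on "it's on the user to verify" is to run cheap, mechanical checks on model output before trusting it. This sketch, an assumption of mine rather than anything from the thread, checks a single narrow property: every number that appears in a source text should survive into a generated summary.

```python
import re

def numbers_in(text: str) -> set[str]:
    """Extract every integer or decimal number as a string."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def summary_keeps_numbers(source: str, summary: str) -> bool:
    """True if no numeric fact from the source was dropped by the model.

    A failing check doesn't prove the summary is wrong, only that a
    human needs to read it before it gets used.
    """
    return numbers_in(source) <= numbers_in(summary)
```

This obviously isn't a complete verification strategy; it's one example of the kind of requirement-checking a user can automate instead of taking output at face value.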
Anecdotally, people aren't great at processing large amounts of information either. I work in infosec and curate a rather complex inventory/risk/audit/reporting toolkit. I pull data from over a dozen critical systems and sub-systems, networks, etc., including vast amounts of hand-written documentation, guides, and explanations about how it all works together.
I'm still the only person capable of actually using the entire toolset in concert, let alone doing further development/integrations. Others rely on Cursor/Claude Code to use it. And that's fine by me: I'd rather have tools that get used than tools that are entirely dependent on me.
I guess my point is that in this scenario the problem isn't LLMs themselves. The problem is people who don't take the time to read and understand the requirements, the input, and the output.
(Of course, this is putting aside the ethical/political/economic/ecological problems, to keep this conversation focused on the technical merits/demerits.)