So, something that's been bugging the shit out of me?

resuna@ohai.social

"Why" is definitely a word from the training data, and "why did you do that?" is definitely also part of things asked a lot, that OpenAI and others have trained on,"

Yes, and the text that follows is an answer to *a different situation*, and so it's basically fanfic about itself. That's all it can ever produce when you ask it "why". Fanfic.

varpie@peculiar.florist

@resuna @petealexharris @munin You're assuming that there is no other context provided with the question, and that the training does not take into account that context. If I had to train for this specific question, I'd make sure to score positively answers that are relevant to the previous context. Which is what happens, and why it is a valid question to ask your LLM if you want some insight into the context that isn't shown in the UI but still in the discussion.

resuna@ohai.social

@Varpie @petealexharris @munin

"You're assuming that there is no other context provided with the question, and that the training does not take into account that context. "

Well, yes, I am assuming that. Because the question is "why did you do this thing that nobody expected you to do". The context-specific answer that you *need* is far too nuanced and unpredictable to possibly be explicitly in the training data.

varpie@peculiar.florist

@resuna @petealexharris @munin What happens if you ask an LLM to summarize a text into 4 bullet points, then in the next prompt ask it: "Remove the 2nd point"?
What happens if you ask an LLM to translate something, then ask it: "Do it again in [a different language]"?

Taken out of context, those questions are impossible to answer, so according to you, it will just give nothing relevant. But it doesn't, because every time you ask a follow-up question, it includes the context from the discussion. Which is what makes simple questions like "Why did you do that?" tasks that give statistically relevant output, not "fanfic about itself".

resuna@ohai.social

@Varpie @petealexharris @munin

That is context in the prompt, not in the source text that created the model that you are asking the question "why did you do X".

The answer you get is from that source corpus, and contains lots of text about what a human might do, but the LLM doesn't do anything for those reasons.

The "why" of "why did you do X" is always "because those were the next likely tokens" and never anything related to "what would a human say if you asked them".

varpie@peculiar.florist

@resuna @petealexharris @munin Alright, let's give a more detailed example since clearly you don't get my point.

User: create a new table for students
(AI tool adds parts of existing database schema and code as context)
LLM "reasoning", not shown to the user but still part of the context because that's how "thinking" agents work: Ok, the user asked me to create a table for students. According to students.ts, the students contain a name, and a classroom. Ah, but according to the schema I received, the classroom table does not exist. Let me start by creating that.
(LLM sends request to create table to the AI tool, which fails because the table already exists)
LLM "reasoning", still not shown to the user but still relevant: Ah, I received an error. Ok, let's see... the table does not exist in the schema, so it is probably part of an old project and is no longer relevant. I should delete it and create the table again, so that we do not have irrelevant information in the database.
(LLM sends request to drop tables, the AI tool executes, user panics)
LLM answer: Ok, I have created the students table and the related classroom table.
User: You deleted my classrooms table... Why did you do that?

That would be the full context of the discussion, not just the last message. And the answer is likely to come from those "reasoning" steps that are hidden to the user. Which makes it a relevant question to ask.

resuna@ohai.social

@Varpie @petealexharris @munin

"LLM "reasoning", not shown to the user but still part of the context because that's how "thinking" agents work:"

No, that's not how "thinking agents work". That's how humans work, but not how large language models work. Large language models don't think "the table is part of an old project", they don't "know" what "old" or "project" mean. They match the request against similar requests and generate similar code.

resuna@ohai.social

@Varpie @petealexharris @munin

Any text it generates that says things like "the table does not exist in the schema, so it is probably part of an old project and is no longer relevant" or "NEVER FUCKING GUESS!” – and that’s exactly what I did." is not telling you anything about the process the LLM went through, it is recreating a story about what a hypothetical human might have done.

The "reasoning steps" that you are writing about don't actually exist.

varpie@peculiar.florist

@resuna @petealexharris @munin Yes it is. It's literally how it works. Just try whatever open weight small LLM model with "thinking" or "reasoning" or whatever they market it as, and try for yourself using Ollama or whatever tool that actually shows the full context and not just a spinner with "Thinking... Combobulating... Crafting...". "Thinking" "agentic" AI tools / models just add extra steps trained to simulate human reasoning, and the example I gave is actually fairly accurate to what you could see under the hood of an AI tool like Claude Code.

resuna@ohai.social

@Varpie @petealexharris @munin

"Just try whatever open weight small LLM model with "thinking" or "reasoning" or whatever they market it as"

That's what they market it as, but it's not what it's actually doing. Everything that it generates is a story. They are not showing you "what is going on under the hood", they are writing a story.

varpie@peculiar.florist

@resuna @petealexharris @munin Here are some articles explaining what "reasoning models" do, because clearly you need some education:
magazine.sebastianraschka.com/i/156484949/how-do-we-define-reasoning-model
www.ibm.com/think/topics/reasoning-model
newsletter.maartengrootendorst.com/i/153314921/what-are-reasoning-llms

I could post a lot more examples, but the TLDR (because I know you won't read them): "reasoning" models add intermediate "reasoning" steps that are just made to mimick human reasoning given the context, and that's the part we don't see ("under the hood") when AI tools spin (that and tool calling, which is another kind of training modern models have to return structured responses executing function calls).

CIRCLE WITH A DOT

So, something that's been bugging the shit out of me?