@bagder That reinforces my suspicions that there was a breakthrough for security at the start of the year, and that the rest of the year will be more quiet.
varpie@peculiar.florist
Posts
-
#Mythos finds a #curl vulnerability -
So, something that's been bugging the shit out of me?@resuna @petealexharris @munin Here are some articles explaining what "reasoning models" do, because clearly you need some education:
magazine.sebastianraschka.com/i/156484949/how-do-we-define-reasoning-model
www.ibm.com/think/topics/reasoning-model
newsletter.maartengrootendorst.com/i/153314921/what-are-reasoning-llms
I could post a lot more examples, but the TLDR (because I know you won't read them): "reasoning" models add intermediate "reasoning" steps that are just made to mimick human reasoning given the context, and that's the part we don't see ("under the hood") when AI tools spin (that and tool calling, which is another kind of training modern models have to return structured responses executing function calls). -
So, something that's been bugging the shit out of me?@resuna @petealexharris @munin Yes it is. It's literally how it works. Just try whatever open weight small LLM model with "thinking" or "reasoning" or whatever they market it as, and try for yourself using Ollama or whatever tool that actually shows the full context and not just a spinner with "Thinking... Combobulating... Crafting...". "Thinking" "agentic" AI tools / models just add extra steps trained to simulate human reasoning, and the example I gave is actually fairly accurate to what you could see under the hood of an AI tool like Claude Code.
-
So, something that's been bugging the shit out of me?@resuna @petealexharris @munin Alright, let's give a more detailed example since clearly you don't get my point.
User: create a new table for students
(AI tool adds parts of existing database schema and code as context)
LLM "reasoning", not shown to the user but still part of the context because that's how "thinking" agents work: Ok, the user asked me to create a table for students. According to students.ts, the students contain a name, and a classroom. Ah, but according to the schema I received, the classroom table does not exist. Let me start by creating that.
(LLM sends request to create table to the AI tool, which fails because the table already exists)
LLM "reasoning", still not shown to the user but still relevant: Ah, I received an error. Ok, let's see... the table does not exist in the schema, so it is probably part of an old project and is no longer relevant. I should delete it and create the table again, so that we do not have irrelevant information in the database.
(LLM sends request to drop tables, the AI tool executes, user panics)
LLM answer: Ok, I have created the students table and the related classroom table.
User: You deleted my classrooms table... Why did you do that?
That would be the full context of the discussion, not just the last message. And the answer is likely to come from those "reasoning" steps that are hidden to the user. Which makes it a relevant question to ask. -
So, something that's been bugging the shit out of me?@resuna @petealexharris @munin What happens if you ask an LLM to summarize a text into 4 bullet points, then in the next prompt ask it: "Remove the 2nd point"?
What happens if you ask an LLM to translate something, then ask it: "Do it again in [a different language]"?
Taken out of context, those questions are impossible to answer, so according to you, it will just give nothing relevant. But it doesn't, because every time you ask a follow-up question, it includes the context from the discussion. Which is what makes simple questions like "Why did you do that?" tasks that give statistically relevant output, not "fanfic about itself". -
So, something that's been bugging the shit out of me?@resuna @petealexharris @munin You're assuming that there is no other context provided with the question, and that the training does not take into account that context. If I had to train for this specific question, I'd make sure to score positively answers that are relevant to the previous context. Which is what happens, and why it is a valid question to ask your LLM if you want some insight into the context that isn't shown in the UI but still in the discussion.
-
So, something that's been bugging the shit out of me?@munin @petealexharris Sure, I'll go touch some grass and talk to my therapist about this philosophical horseshit
-
So, something that's been bugging the shit out of me?@petealexharris I totally agree with you. And that is also a very different take from the beginning of the discussion, where Fi said that querying LLMs for "why" it does something is "thrice-divorced from reality" and "fucking delusional" and that people doing that should "touch some grass and get a fucking therapist"...
-
So, something that's been bugging the shit out of me?@petealexharris @munin You misread me. Whether the model "understands" the question is a philosophical question. The non-philosophical question of whether it can give a useful answer is the relevant part, and my whole point is that pointing at the philosophical aspect to belittle people that look at the practical part, assuming that they don't understand it, is dumb.
-
So, something that's been bugging the shit out of me?@petealexharris @munin "Why" is definitely a word from the training data, and "why did you do that?" is definitely also part of things asked a lot, that OpenAI and others have trained on, so my point still stands that it is a valid question to ask. Whether the model "understands" the question is just a philosophical question that is irrelevant for the fact that it is a useful question. Of course if you're using it in Prod and it deletes your DB and you think it understands and can improve itself, there are plenty of things you'd need to be corrected on, but saying that everyone asking that question is delusional is just wrong.
-
So, something that's been bugging the shit out of me?@petealexharris @munin When you ask an LLM "why is the sky blue?", it is statistically likely to give a correct answer. It still works the same way, computing probabilities of what the next token is, but the "why" has a semantically significant weight that influences the output, so it is an important keyword. It doesn't have to "understand" it, it just has to be trained in a way that makes it significant. You don't have to believe that it understand things to know that it is trained on human language and will behave correctly when fed human language.
-
So, something that's been bugging the shit out of me?@petealexharris @munin Where did I say they'd understand a better prompt?
-
So, something that's been bugging the shit out of me?@petealexharris @munin Clearly you've never tried it yourself...
-
So, something that's been bugging the shit out of me?@munin Well... As you mentioned, each generation "reads the prompt and any cache, if they exist, from prior session", and since they were trained on "explaining" their previous outputs to sound like a relevant discussion, asking why a model gave a specific output isn't as stupid as you make it out to be, as it can give some input on the "thinking" part of the previous output that is usually not directly visible. That can then be used to tweak prompts and add some guardrails (even though there is a fairly long list of examples of guardrails not being fully effective). Of course, the first problem is giving access to prod to an unreliable system...
-
#TIL you can create multiple folders under a directory with a single command.@box464 You can even have that in the middle of the path, so you could have written it as
mkdir -p cool-fedi-project/{postgres,redis}-data -
The AI slop security reporting is basically extinct.@bagder Didn't you share one just 2 days ago though? hackerone.com/reports/3669305
-
I have deeply mixed feelings about #ActivityPub's adoption of JSON-LD, as someone who's spent way too long dealing with it while building #Fedify.@hongminhee I have the same feeling. The idea behind JSON-LD is nice, but it isn't widely available, so developing with it becomes a headache: do I want to create a JSON-LD processor, spending twice the time I wanted to, or do I just consider it as JSON for now and hope someone will make a JSON-LD processor soon? Often, the answer is the latter, because it's a big task that we're not looking for when creating fedi software.
-
I’ve been asked on TV hits and interviews lately to explain why decentralized social media is better, especially re: Mastodon.@taylorlorenz Traditional social media is built to drive engagement and be addictive, making them a big part of Gen Z's social difficulties (they are literally being sued for it). Decentralized platforms without such algorithms bring back the feeling of community, providing a more tailored experience (choose your software, Mastodon, GoToSocial, Iceshrimp, etc.) without blocking connections. You can join a server based on your personal preferences (language, geolocation, hobbies...), and if you find a better place, you can freely move, without losing your connections.