We knew, but the proof is nice.
"Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves"
The guess-the-next-words machines don’t actually understand anything.
@glitzersachen @scottjenson @xdydx guessing you're joking. But I also suspect it may be an inside joke with not a lot of folks on the inside.
@davidaugust @scottjenson @xdydx
True. See @xdydx 's reply.
-
Shortcut to paper: https://arxiv.org/pdf/2410.05229
-
@alisynthesis @davidaugust fair enough. I changed up the problem completely and added some reasoning and it did pretty well. It appears to be generating code to solve the math. The only thing it missed is that very unripe bananas are green, not yellow.
James picks 40 apples on Monday. Then he picks 35 lemons on Tuesday. On Wednesday, he picks half as many bananas as he did apples, but five of them were very unripe. How many yellow fruits does James have?
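The code a model might generate for this puzzle is plain arithmetic. A minimal sketch (assuming apples count as red, lemons and ripe bananas as yellow, and the five unripe bananas as green):

```python
# Worked arithmetic for the fruit puzzle above.
apples = 40            # Monday (red, so not counted as yellow)
lemons = 35            # Tuesday (yellow)
bananas = apples // 2  # Wednesday: half as many bananas as apples -> 20
unripe = 5             # very unripe bananas are green, not yellow

yellow = lemons + (bananas - unripe)
print(yellow)  # 50
```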


@audioflyer79 @alisynthesis @davidaugust how does it do if you swap the colors of the fruit?
-
@davidaugust AGI is coming son 🤭
@pascal_le_merrer any day now. I hear POTUS says two weeks.
-
@davidaugust interesting. Had to ask. Already fixed?

@flq yes, many systems have tools and/or abilities built in to take over basic math operations that simpler LLMs failed at.
The salient and enduring issue, I think, is that the spin and marketing of LLMs as "understanding," "thinking," or "intelligent" (as those words' typical meanings suggest) remains largely fictional.
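A minimal sketch of that tool-routing idea, with hypothetical names (real systems wire the model's tool call into something deterministic like this, rather than letting the model guess digits):

```python
# Illustrative only: a safe calculator "tool" an LLM system could call
# for basic arithmetic, built on Python's ast module.
import ast
import operator

# Only basic binary operations are allowed.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calc(expr: str) -> float:
    """Evaluate a basic arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(calc("40 / 2 + 35 - 5"))  # 50.0
```

The point of the pattern: the model only has to decide *that* arithmetic is needed and emit the expression; the actual digits come from deterministic code.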
-
@joriki it’s from August.
October 2024
Apple Engineers Show How Flimsy AI ‘Reasoning’ Can Be
The new frontier in large language models is the ability to “reason” their way through problems. New research from Apple says it's not quite what it's cracked up to be.
WIRED (www.wired.com)
-
@drifthood @davidaugust This makes me think of "Clever Hans", the horse that appeared to do arithmetic but actually just responded to involuntary human cues:
https://en.wikipedia.org/wiki/Clever_Hans
-
@davidaugust Of course an LLM cannot do math, but to be honest, that is also not what they're designed for. An LLM these days like Claude knows that it should take a calculator and type the equation in there, instead of hallucinating an answer. Complaining that an LLM can't do math is like complaining a screwdriver can't drill a hole.
You can counter that there are plenty of people who are using the screwdriver to drill the hole, but that is not on the tool, that is on the user.
-
@davidaugust When did they do this test? I tried it with the following LLMs: Sonnet 4.6, Codex 5.3, GPT-5.4, GPT-5-Mini and Kimi-K2.5. They all answer the kiwi question correctly.
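For reference, the "kiwi question" is the paper's GSM-NoOp example (paraphrased here from memory): Oliver picks 44 kiwis on Friday, 58 on Saturday, and twice Friday's count on Sunday, but five of Sunday's kiwis were a bit smaller than average. The size clause is an irrelevant distractor; a correct solution ignores it:

```python
# The kiwi question, worked out: the "five smaller" detail changes nothing.
friday = 44
saturday = 58
sunday = 2 * friday  # 88; "five were a bit smaller" is a distractor

total = friday + saturday + sunday
print(total)  # 190
```

The paper's finding was that models often subtracted the five smaller kiwis anyway, answering 185.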