We knew, but the proof is nice.
"Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves"
The guess-the-next-words machines don’t actually understand anything.
Don't let @scottjenson catch you disseminating defeatist news on AI.
It's utterly your fault that we have this bad reputation on the Fedi with respect to AI.
-
@glitzersachen @scottjenson @xdydx guessing you are joking. But also suspect it may be an inside joke with not a lot of folks on the inside.
-
Actually, this particular joke has the attention of quite a few people.
Scott Jenson (@scottjenson@social.coop)
OK, this is going even MORE sideways so I need to make a few things clear: 1. I took a complex point and made it poorly 2. My goal was to ask for more inclusiveness 3. I am sickened by what happened to BlackTwitter and I don't want it to recur 4. But I can't speak for BlackTwitter nor should I 5. I apologize to black mastodon users for making such a poor comparison 6. I'm not endorsing "AI Slop"; it was a foil to make my point 7. I'm certainly NOT trying to compare AI bros to Black twitter (but, as I said, I can see how people made that connection. I'm trying to correct that here)
-
@davidaugust Well, there have actually been successes in connecting LLMs to proof assistants and computer algebra programs. As this post rightly puts it, the LLM is not capable of performing computations reliably by itself, but it can write commands sent to the computer algebra program, or proof candidates sent to the proof assistant, which can answer that the proof is incorrect; the process goes on until a correct proof is produced.
See also uses by professional mathematicians:
https://bsky.app/profile/wildverzweigt.bsky.social/post/3miua4ulxhk2f
Also see Terence Tao.
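That generate-and-check loop can be sketched in a few lines. This is a toy illustration of the idea, not any real system: `ask_llm` and `check_proof` are hypothetical stand-ins for the LLM call and the proof-assistant check.

```python
# Toy sketch of the loop described above: the LLM proposes a proof,
# the proof assistant verifies it, and any error message is fed back
# to the LLM for the next attempt.

def ask_llm(statement, feedback):
    # Stand-in for an LLM call: returns a candidate proof script.
    # Here it only "learns" after seeing feedback once.
    return "by arithmetic" if feedback else "by guessing"

def check_proof(candidate):
    # Stand-in for a proof assistant (e.g. Lean/Coq): it accepts
    # exactly one script and rejects everything else with an error.
    if candidate == "by arithmetic":
        return True, ""
    return False, "error: tactic failed"

def prove(statement, max_rounds=5):
    feedback = ""
    for _ in range(max_rounds):
        candidate = ask_llm(statement, feedback)
        ok, feedback = check_proof(candidate)
        if ok:
            return candidate  # only machine-verified proofs are accepted
    return None  # no correct proof found within the budget
```

The key point is that reliability comes from the verifier, not the LLM: an incorrect candidate can never be returned, only retried or abandoned.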
-
@davidaugust Direct link to the paper https://arxiv.org/pdf/2410.05229 (presented at ICLR 2025).
Doesn't seem to be very recent news, then.
-
@davidaugust In about 80 years we've gone from a room full of computers the size of refrigerators that were good at crunching numbers but not much else to computers the size of corporate office parks that can draw almost-convincing pictures of people with five fingers (and thumbs, too!) but can't do elementary school math.
And some people call this progress.
-
@Sobex it’s from August.
-
@drifthood yes, there does seem to be a threshold that, in some respects, only humans cross.
I see that sort of begging in a dog. He wants the treat, so instead of just doing the behavior the human command asks for, he tries every response that has ever gotten him a treat until he "unlocks" it. Humans can and do do this too from time to time, but humans _also_ actually communicate and understand.
-
not new, here's the 2024 paper referenced:
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
@joriki it’s from August.
-
@davidaugust Ecosia AI gets it right. It looks like the paper referenced was published in 2025, so the research was conducted prior. The models are all much better now. I’m no AI apologist, but I think any argument of “AI sucks because it’s not good at _____” is on tenuous ground and will be proven wrong as the models continue to improve. @Ecosia

-
@audioflyer79 @davidaugust I mean, it's worth noting that the LLMs have ingested that paper by now. : /
-
@alisynthesis @davidaugust fair enough. I changed up the problem completely and added some reasoning and it did pretty well. It appears to be generating code to solve the math. The only thing it missed is that very unripe bananas are green, not yellow.
James picks 40 apples on Monday. Then he picks 35 lemons on Tuesday. On Wednesday, he picks half as many bananas as he did apples, but five of them were very unripe. How many yellow fruits does James have?
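The arithmetic the model has to get right here is small enough to write out directly; the trap is the unstated facts (apples and unripe bananas aren't yellow), not the numbers. A worked version, under the reading that only lemons and ripe bananas count as yellow:

```python
# The word problem above, computed step by step.
apples = 40               # picked Monday (not yellow)
lemons = 35               # picked Tuesday (yellow)
bananas = apples // 2     # Wednesday: half as many bananas as apples -> 20
unripe = 5                # very unripe bananas are green, not yellow

yellow_fruits = lemons + (bananas - unripe)
print(yellow_fruits)      # -> 50
```

This mirrors the GSM-Symbolic finding: the computation is trivial, and the difficulty is entirely in mapping the story to the right quantities.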


-
Amo Bishop Rodent (@pikesley@mastodon.me.uk)
"We made the computers, the notoriously accurate calculating machines, worse at arithmetic. This is surely progress along the path to creating Computer God"
-
@lemgandi
The wetness of water has been hotly debated, as to some wet means "covered with or soaked in water", and it's questioned whether water is covered with itself.
@davidaugust
-
@Karen5Lund Maybe because people stopped writing efficient code about 20 years ago?
-
@davidaugust AGI is coming soon 🤭
-
@davidaugust interesting. Had to ask. Already fixed?

-
@davidaugust @scottjenson @xdydx
True. See @xdydx 's reply.
-
Shortcut to paper: https://arxiv.org/pdf/2410.05229
-
@audioflyer79 @alisynthesis @davidaugust how does it do if you swap the colors of the fruit?