We knew, but the proof is nice.

math
33 Posts 21 Posters 0 Views
  • joriki@infosec.exchangeJ joriki@infosec.exchange

    @davidaugust

    not new, here's the 2024 paper referenced:

    GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (arXiv:2410.05229, arxiv.org)

    davidaugust@mastodon.online
    wrote last edited by
    #16

    @joriki it’s from August.

    • davidaugust@mastodon.onlineD davidaugust@mastodon.online

      We knew, but the proof is nice.

      "Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves"

      The guess-the-next-words machines don’t actually understand anything.

      (nitter.poast.org)

      #math #ai

      audioflyer79@mstdn.social
      wrote last edited by
      #17

      @davidaugust Ecosia AI gets it right. It looks like the paper referenced was published in 2025, so the research was conducted before then. The models are all much better now. I’m no AI apologist, but I think any argument of “AI sucks because it’s not good at _____” is on tenuous ground and will be proven wrong as the models continue to improve. @Ecosia

        alisynthesis@io.waxandleather.com
        wrote last edited by
        #18

        @audioflyer79 @davidaugust I mean, it's worth noting that the LLMs have ingested that paper by now. : /

          audioflyer79@mstdn.social
          wrote last edited by
          #19

          @alisynthesis @davidaugust fair enough. I changed up the problem completely and added some reasoning and it did pretty well. It appears to be generating code to solve the math. The only thing it missed is that very unripe bananas are green, not yellow.

          James picks 40 apples on Monday. Then he picks 35 lemons on Tuesday. On Wednesday, he picks half as many bananas as he did apples, but five of them were very unripe. How many yellow fruits does James have?
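
For reference, the arithmetic behind this puzzle can be written out as a minimal sketch. It assumes (the post does not say so) that apples do not count as yellow and that the five very unripe bananas are green; the variable names are made up for illustration.

```python
# Worked arithmetic for the fruit-picking puzzle quoted above.
# Assumptions (not stated in the post): apples are not yellow,
# and the five very unripe bananas are green.
apples = 40
lemons = 35
bananas = apples // 2            # half as many bananas as apples
unripe_bananas = 5

yellow_bananas = bananas - unripe_bananas
yellow_fruits = lemons + yellow_bananas

# What you get if every banana is counted as yellow,
# i.e. if the unripe detail is ignored.
naive_yellow_fruits = lemons + bananas

print(yellow_fruits, naive_yellow_fruits)
```

Under those assumptions the answer is 50, while ignoring the unripe bananas gives 55.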

            pikesley@mastodon.me.uk
            wrote last edited by
            #20

            @davidaugust

            Amo Bishop Rodent (@pikesley@mastodon.me.uk)

            "We made the computers, the notoriously accurate calculating machines, worse at arithmetic. This is surely progress along the path to creating Computer God"

            • lemgandi@mastodon.socialL lemgandi@mastodon.social

              @davidaugust

              In other shocking news:

              Water is Wet
              Without air you will die

              ozzelot@mstdn.social
              wrote last edited by
              #21

              @lemgandi
              The wetness of water has been hotly debated: to some, "wet" means "covered with or soaked in water", and it is questionable whether water is covered with itself.
              @davidaugust

              • karen5lund@mastodon.socialK karen5lund@mastodon.social

                @davidaugust In about 80 years we've gone from a room full of computers the size of refrigerators that were good at crunching numbers but not much else to computers the size of corporate office parks that can draw almost-convincing pictures of people with five fingers (and thumbs, too!) but can't do elementary school math.

                And some people call this progress.

                bouriquet@mastodon.social
                wrote last edited by
                #22

                @Karen5Lund Maybe because people stopped writing efficient code about 20 years ago?

                  pascal_le_merrer@mastodon.social
                  wrote last edited by
                  #23

                  @davidaugust AGI is coming son 🤭

                    flq@freiburg.social
                    wrote last edited by
                    #24

                    @davidaugust interesting. Had to ask. Already fixed?

                    • davidaugust@mastodon.onlineD davidaugust@mastodon.online

                      @glitzersachen @scottjenson @xdydx guessing you are joking. But also suspect it may be an inside joke with not a lot of folks on the inside.

                      glitzersachen@hachyderm.io
                      wrote last edited by
                      #25

                      @davidaugust @scottjenson @xdydx

                      True. See @xdydx 's reply.

                        elithebearded@fed.qaz.red
                        wrote last edited by
                        #26

                        @davidaugust

                        Shortcut to paper: https://arxiv.org/pdf/2410.05229

                          morten_skaaning@mastodon.gamedev.place
                          wrote last edited by
                          #27

                          @audioflyer79 @alisynthesis @davidaugust how does it do if you swap the colors of the fruit?
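
One way to try the swap suggested here is to generate variants of the puzzle together with their expected answers, then compare a model's replies against them. The sketch below is only an illustration: `make_variant`, the `RIPE_COLOUR` table, and the rule that very unripe fruit counts as green are all assumptions, not anything taken from the thread or the paper.

```python
from collections import Counter

# Ripe colour of each fruit. This table is an assumption made for the sketch.
RIPE_COLOUR = {
    "apples": "red",
    "lemons": "yellow",
    "bananas": "yellow",
    "cherries": "red",
}


def make_variant(first, second, third, n_first, n_second, n_unripe, target):
    """Return (question_text, expected_answer) for one variant of the puzzle."""
    if n_first % 2:
        raise ValueError("n_first must be even so 'half as many' is a whole number")
    n_third = n_first // 2
    if not 0 <= n_unripe <= n_third:
        raise ValueError("n_unripe must be between 0 and the third fruit count")

    text = (
        f"James picks {n_first} {first} on Monday. "
        f"Then he picks {n_second} {second} on Tuesday. "
        f"On Wednesday, he picks half as many {third} as he did {first}, "
        f"but {n_unripe} of them were very unripe. "
        f"How many {target} fruits does James have?"
    )

    # Tally fruits by colour; very unripe fruits are assumed to be green.
    by_colour = Counter()
    by_colour[RIPE_COLOUR[first]] += n_first
    by_colour[RIPE_COLOUR[second]] += n_second
    by_colour[RIPE_COLOUR[third]] += n_third - n_unripe
    by_colour["green"] += n_unripe
    return text, by_colour[target]


# The puzzle as posted, then the same puzzle with apples and lemons swapped,
# then the posted puzzle asking about a different colour.
original = make_variant("apples", "lemons", "bananas", 40, 35, 5, "yellow")
swapped = make_variant("lemons", "apples", "bananas", 40, 35, 5, "yellow")
red_target = make_variant("apples", "lemons", "bananas", 40, 35, 5, "red")

for question, expected in (original, swapped, red_target):
    print(expected, "|", question)
```

Because each variant carries its own expected answer, a model that has only memorized the posted wording should stand out once the fruits or the target colour change.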

                            davidaugust@mastodon.online
                            wrote last edited by
                            #28

                            @pascal_le_merrer any day now. I hear potus say in two weeks.

                              davidaugust@mastodon.online
                              wrote last edited by
                              #29

                              @flq yes, many systems have tools and/or abilities built in to take over basic math operations that simpler LLMs failed at.

                              The salient and enduring issue, I think, is that the spin and marketing of LLMs as "understanding," "thinking" or "intelligent" (as those words' typical meanings suggest) remains largely fictional.

                                joriki@infosec.exchange
                                wrote last edited by
                                #30

                                @davidaugust

                                October 2024

                                Apple Engineers Show How Flimsy AI ‘Reasoning’ Can Be (WIRED, www.wired.com)

                                The new frontier in large language models is the ability to “reason” their way through problems. New research from Apple says it's not quite what it's cracked up to be.

                                  bladecoder@androiddev.social
                                  wrote last edited by
                                  #31

                                  @drifthood @davidaugust This makes me think of "Clever Hans", the horse that appeared to do arithmetic but actually just responded to involuntary human cues:
                                  https://en.wikipedia.org/wiki/Clever_Hans

                                    erwinrossen@mas.to
                                    wrote last edited by
                                    #32

                                    @davidaugust Of course an LLM cannot do math, but to be honest, that is also not what they're designed for. An LLM these days like Claude knows that it should take a calculator and type the equation in there, instead of hallucinating an answer. Complaining that an LLM can't do math is like complaining a screwdriver can't drill a hole.

                                    You can counter that there are plenty of people who are using the screwdriver to drill the hole, but that is not on the tool, that is on the user.
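
To make the "take a calculator" idea concrete, here is a minimal sketch of the kind of arithmetic tool a model could hand an expression to instead of guessing the result. It is not how Claude or any other product implements tool use; `calculate` is a hypothetical helper that walks Python's `ast` and accepts only numbers and basic operators.

```python
import ast
import operator

# Operators the hypothetical calculator tool accepts.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a parsed expression, rejecting anything non-arithmetic."""
    if isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str):
    """Evaluate a plain arithmetic expression such as '35 + (40 // 2 - 5)'."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


print(calculate("35 + (40 // 2 - 5)"))
```

The point of the design is the division of labour: the language model only has to produce the expression, and a deterministic routine does the computing.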

                                      erwinrossen@mas.to
                                      wrote last edited by
                                      #33

                                      @davidaugust When did they do this test? I tried it with the following LLMs: Sonnet 4.6, Codex 5.3, GPT-5.4, GPT-5-Mini and Kimi-K2.5. They all answer the kiwi question correctly.
