
There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges.

Uncategorized · 63 Posts 33 Posters 74 Views
  • lina@vt.social

    And honestly, reading the Claude output, it's just ridiculous. It clearly has no idea what it's doing and it's just pattern-matching. Once it found the flag it spent 7 pages of reasoning and four more scripts trying to verify it, and failed to actually find what went wrong. It just concluded after all that time wasted that sometimes it gets the right answer and sometimes the wrong answer and so probably the flag that looks like a flag is the flag. It can't debug its own code to find out what actually went wrong, it just decided to brute force try again a different way.

    It's just a pattern-matching machine. But it turns out if you brute force pattern-match enough times in enough steps inside a reasoning loop, you eventually stumble upon the answer, even if you have no idea how.

    Humans can "wing it" and pattern-match too, but it's a gamble. If you pattern-match wrong and go down the wrong path, you just wasted a bunch of time and someone else wins. Competitive CTFs are all about walking the line between going as fast as possible and being very careful so you don't have to revisit, debug, and redo a bunch of your work. LLMs completely screw that up by brute forcing the process faster than humans.

    This sucks.

    curtmack@floss.social (#54):

    @lina that's the worst part IMO. We get Claude through work and, all environmental and ethical issues aside, I just hate using it. Curating mounds of garbage output from the Screw It Up Faster Machine sucks. But it looks *great* in artificial evaluations with a concrete, machine-verifiable goal. And too many managers don't understand that real world programming isn't just passing a succession of concrete, machine-verifiable goals.

    • lina@vt.social

      So it's not surprising that an LLM can solve them, because it automates the process. That just takes all the fun and all the learning out of it, completely defeating the purpose.

      I'm sure you could still come up with challenges that LLMs can't solve, but they would necessarily be harder, because LLMs are going to oneshot any of the "baby" starter challenges you could possibly come up with. So you either get rid of the "baby" challenges entirely (which means less experienced teams can't compete at all), or you accept that people will solve them with LLMs. But neither of those actually works.

      Since CTF competitions are pretty much by definition timed, speed is an advantage. That means a team that does not use LLMs will not win, so teams must use LLMs. This applies to both new and experienced teams. But a newbie team using LLMs will not learn, because the whole point is learning by doing, and you're not doing anything, and so it will never become experienced.

      So this is going to devolve into CTFs being a battle of teams using LLMs to fight for the top spots, where everyone who doesn't want to use an LLM is excluded, and where less experienced teams stop improving and getting better, because they're outsourcing the work to LLMs and not learning as a result.

      jce@infosec.exchange (#55):

      @lina Already in 2022, at the "European Cyber Cup" CTF, at least one of the top-3 teams had ChatGPT open before even checking what some of the challenges were about 🫠


        • dngrs@chaos.social (#56):

        @lina LLMs can't reason


          • starsider@valenciapa.ws (#57):

          @dngrs @lina We don't doubt that, but here it's used with a different meaning; there's no word for this process that doesn't also name a uniquely human ability. And saying that a machine "thinks" is nothing new: I was saying that 20+ years ago whenever a computer was stuck doing something that would finish eventually, particularly if it was a virtual game opponent (which we also called AI, because the term has always been that broad).

          • sonic2k@oldbytes.space

            @lina

            AI is fast eradicating any learning activity.
            In my current job, learning anything new is actively discouraged.

            As was said to us "they only care about numbers on a dashboard".

            I got to the position I am in, at the level I am at, by being curious and very interested, by taking things apart and figuring out how they work.

            An LLM, which in the eyes of a CEO means he can get rid of people like me, is the end of the road. We are all doomed.

            jmj@hachyderm.io (#58):

            @Sonic2k @lina you're looking at it the wrong way. Yes, it's killing one type of learning. But it's teaching you how to CTF using AI: what are its strengths and weaknesses, what prompts are effective, what sub-problems should the AI tackle, what should the human focus on. It's no different than a carpenter switching from a hand plane to a powered belt sander. The skill set changes, the results are more or less the same. Someone who only learns to belt sand isn't less of a carpenter. It's gatekeeping to think otherwise. Yes, the "elitist artists" will argue otherwise, but the difference is moot for the vast bulk of us working stiffs.

            • doragasu@mastodon.sdf.org

              @lina I wonder if you can still design a challenge to be "LLM unfriendly" by changing the wording, just like those papers showing how an LLM aces problems like "river crossing", but if you change wording a bit, they just fail in weird and spectacular ways.

              bob_zim@infosec.exchange (#59):

              @doragasu @lina Probably. LLMs are hilariously bad at dealing with linguistic ambiguities like puns.

              One of my favorite ambiguities I’ve seen was saying some people “lie about the family tree”. Are they being deceptive on the topic of relations, or are they reclining around a plant tended by multiple generations?


                • laund@wetdry.world (#60):

                @Jmj @Sonic2k @lina classic AI-apologist "expertise is unnecessary" fallacy. The results are perhaps similar on the surface, "was the task completed" level, but if a person does it and learns the details an LLM can brute-force past, that person can then recognize those issues without going out of their way to look for them, which is an incredibly important part of security work. Because the real world is far messier and less clear than a CTF, and part of dealing with that is the kind of intuition and almost subconscious understanding which is impossible to achieve by using an LLM. And CTFs used to be decent at finding and rewarding those who are good at that.

                • sitcom_nemesis@tech.lgbt (#61):

                  @Alib234 @lina AIs are better than humans will ever be at chess and this was the case 20 years ago.

                  We ban AIs in chess.

                  It's a pain to detect but it's incredibly important for the integrity of the game.

                  And it's about communicating norms and values too: "we don't want AI" is an incredibly different set of values than "we want AI in only xyz ways".

                  • lina@vt.social

                    @nightwolf Yeah, I'm thinking mostly Jeopardy, which is the style I'm most familiar with. It just sucks to see that competition format completely break. I used to write a lot of challenges for that.

                    nightwolf@defcon.social (#62):

                    @lina Agreed. It will be interesting to see the next few years, since the Jeopardy format has been the most popular and the easiest to implement.


                      alice@lgbtqia.space (#63):

                      @lina maybe it would work to take a page from my areas of expertise, locks and psychology. Make trap flags that lead AIs into false solutions that humans can identify and step out of, but that AI thinks is the right way forward.

                      Update: I tried about a dozen decoy flags, and ChatGPT was surprisingly good at picking out the correct one. The only ones where it failed were when the flag decoded into what looked like a valid flag, but it was an instruction to enter something else.

                      Like:

                      - CTF_3NT3RTH1SFL4GBKWDS
                      - CTF_F0110WD1R3CT10NS
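As an aside (my own illustration, not from the thread): the two decoy strings above are plain leetspeak, and a tiny decoder makes the hidden instructions visible. The `deleet` helper below is a hypothetical sketch; since "1" is ambiguous in leetspeak (it can stand for I or L), it enumerates every reading.

```python
from itertools import product

# Hypothetical helper: expand a leetspeak string into every
# plausible plain-text reading. Unambiguous digit substitutions:
BASE = {"0": "O", "3": "E", "4": "A", "5": "S", "7": "T"}

def deleet(s: str) -> list[str]:
    # "1" can stand for either I or L, so branch on every occurrence.
    choices = [["I", "L"] if ch == "1" else [BASE.get(ch, ch)] for ch in s]
    return ["".join(p) for p in product(*choices)]

# The decoys decode to instructions, not flags:
print("CTF_ENTERTHISFLAGBKWDS" in deleet("CTF_3NT3RTH1SFL4GBKWDS"))
print("CTF_FOLLOWDIRECTIONS" in deleet("CTF_F0110WD1R3CT10NS"))
```

A human glancing at the expansions immediately reads "enter this flag backwards" or "follow directions"; the failure mode alice describes is the model submitting the decoded string verbatim instead.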
