There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges.

Uncategorized · 63 Posts · 33 Posters · 74 Views

  • lina@vt.social

    I might still do a monthly challenge or something in the future so people who want to have fun and learn can have fun and learn. That's still okay.

    But CTFs as discrete competitions with winners are dead.

    A CTF competition is basically gamified homework.

    LLMs broke the game. Now all that's left is self study.

    doragasu@mastodon.sdf.org (#39)

    @lina I wonder if you can still design a challenge to be "LLM unfriendly" by changing the wording, just like those papers showing how an LLM aces problems like "river crossing", but if you change wording a bit, they just fail in weird and spectacular ways.

      lina@vt.social (#40)

      @doragasu Possibly? I might try removing all "hints" from one and trying again and seeing if it's any different. But that also affects human solvers... the hints are there to point you towards a website that explains the fundamentals of what's going on. The LLM didn't even read that, it just guessed from a filename and a comment and hulk smashed its way to guessing the general concept right with multiple attempts...

        doragasu@mastodon.sdf.org (#41)

        @lina In those papers trying to confuse LLMs, what was very effective, IIRC, was adding data you don't need to the problem statement. The LLM tried to use all the data you gave it to solve the problem and failed. Just like when a child is solving maths problems from a textbook: all the problems look similar, so the child internalizes that you have to add two numbers and divide by the third one. Then you change the problem and the child fails, because they apply the same "formula".

          doragasu@mastodon.sdf.org (#42)

          @lina Like in here: https://arxiv.org/abs/2305.04388

            doragasu@mastodon.sdf.org (#43)

            @lina Or better this one: https://arxiv.org/abs/2410.05229

            • lina@vt.social (#44)

              @grishka FYI your instance seems to have a very old display name cached for me (that it is using for mentions) ;;

              • nathan@mastodon.e4b4.eu

                @lina Ah I didn't consider that there would be a culture of hiding tools/methods. Yeah that's definitely incompatible with a post-LLM world.

                This is a general trend with GenAI: the only way to earn legitimacy is either in person or by publicizing the creative process. For a while now, visual and music artists have had to either rely on their existing credibility or share their creative process to establish their art's legitimacy. New anonymous art has sadly been made nearly worthless.

                lina@vt.social (#45)

                @nathan I don't think there's necessarily a culture of hiding methods outright (though some of the more competitive teams might), but more like people build their own personal stash of scripts and things to build off of, and don't necessarily just outright post it on GitHub or whatever.

                So like, "fucky stuff with QR codes" having shown up in CTF challenges more than once, I have a personal "do low-level analysis and extended recovery of damaged QR codes" script.

                • abacabadabacaba@infosec.exchange

                  @lina There are programming competitions where participants run their solutions locally and submit the output. But they are usually also required to submit the code, even though it is not automatically judged. If cheating is suspected, the judges may look into the code. Also there may be automated checks for plagiarism etc. CTFs could do the same. There really isn't a good reason to keep solutions secret after the challenge concludes, and published solutions can serve as a learning material for future challenges.

                  lina@vt.social (#46)

                  @abacabadabacaba The thing is the solution isn't "the code". The solution is the process. You can have an LLM "solve" it for you, then rewrite the process and cheat that way. Yes the solution will often involve some bespoke scripts and tooling, but that's just part of it. The "aha moments", that you can't provide proof of.

                    grishka@friends.grishka.me (#47)

                    Hoshino Lina (星乃リナ) 🩵 3D Yuri Wedding 2026!!!, yeah, I only automatically reload actors when I receive activities from them and more than 24 hours have passed since the previous reload. Now that you've sent me a reply, it did trigger that. Maybe I should do the same when fetching things like a post that someone boosted.

                      lina@vt.social (#48)

                      @grishka Yeah I think that name was possibly a year+ old ^^;;

                      • lina@vt.social

                        And honestly, reading the Claude output, it's just ridiculous. It clearly has no idea what it's doing and it's just pattern-matching. Once it found the flag it spent 7 pages of reasoning and four more scripts trying to verify it, and failed to actually find what went wrong. It just concluded after all that time wasted that sometimes it gets the right answer and sometimes the wrong answer and so probably the flag that looks like a flag is the flag. It can't debug its own code to find out what actually went wrong, it just decided to brute force try again a different way.

                        It's just a pattern-matching machine. But it turns out if you brute force pattern-match enough times in enough steps inside a reasoning loop, you eventually stumble upon the answer, even if you have no idea how.

                        Humans can "wing it" and pattern-match too, but it's a gamble. If you pattern-match wrong and go down the wrong path, you just wasted a bunch of time and someone else wins. Competitive CTFs are all about walking the line between going as fast as possible and being very careful so you don't have to revisit, debug, and redo a bunch of your work. LLMs completely screw that up by brute forcing the process faster than humans.

                        This sucks.

                        sonic2k@oldbytes.space (#49)

                        @lina

                        AI is fast eradicating any learning activity.
                        In my current job, learning anything new is actively discouraged.

                        As was said to us: "they only care about numbers on a dashboard".

                        I got to the position I am in, at the level I am at, by being curious and very interested, by taking things apart and figuring out how they work.

                        An LLM, which in the eyes of a CEO means he can get rid of people like me, is the end of the road. We are all doomed.

                        • natty@astolfo.social

                          @lina@vt.social To be fair I'd argue this is strictly a people problem

                          I feel like this is the inherent nature of competition in places where cooperation would make much more sense

                          And this issue permeates so many areas that the world is more preoccupied with catching the people cheating the system instead of going "hey, maybe this system could incentivize actually getting invested in the thing instead of being a pure so-called meritocracy".

                          lina@vt.social (#50)

                          @natty But the whole point of a for-fun(/prize) competition is to use the gamification to motivate people... that's kind of what games are?

                          You don't strictly need it, you can publish challenges to be solved for no points and no prize... but that demonstrably does not get as many people interested. Between people for whom that works, and the "I just want to win" people who would use LLMs, there are people who would be motivated to compete but not just self-study, and you lose those when the LLM cheaters come in.

                          • ahasty@techhub.social

                            @lina I do feel like this is about how you use the LLM. I often find myself throwing something into my local llama to give me an ELI5, or to ask what the flags on this command do in combination.

                            But as someone who has designed CTFs and watched someone fly through one without learning a damn thing, it can be hard to keep the faith.

                            When I took physics all those years ago, my professor made us learn a slide rule before a calculator. If you skip over the basics and use a machine to do it... when the machine breaks or is wrong, who is gonna fix it, and how?

                            lina@vt.social (#51)

                            @ahasty But at least a calculator is always right. I have no problem with people using tools that can be understood and are reliable/engineered.

                            The problem is LLMs are not that. They cannot be understood, they are black boxes that just brute force their way through things. So they are particularly and uniquely toxic in the harm they cause, compared to the tools we've had until now as part of the general industrial/technology revolution.

                            • shansterable@ohai.social

                              @lina
                              CTF = Capture the Flag, in case that helps anyone besides me

                              I try to do for initialisms and acronyms what alt text does for images.

                              Wikipedia: In computer security, Capture the Flag (CTF) is an exercise in which participants attempt to find text strings, called "flags", which are secretly hidden in purposefully vulnerable programs or websites.

                              arclight@oldbytes.space (#52)

                              @shansterable @lina I had to look it up. The next most popular definition of CTF was Children's Tumor Foundation...

                                ahasty@techhub.social (#53)

                                @lina Yes, they are a black box. If used as a way to help educate yourself, there is value. When used as a means to an end, you kill the pipeline of problem solving. Unfortunately, the unwavering force of capitalism is almost always short-sighted.

                                  curtmack@floss.social (#54)

                                  @lina that's the worst part IMO. We get Claude through work and, all environmental and ethical issues aside, I just hate using it. Curating mounds of garbage output from the Screw It Up Faster Machine sucks. But it looks *great* in artificial evaluations with a concrete, machine-verifiable goal. And too many managers don't understand that real world programming isn't just passing a succession of concrete, machine-verifiable goals.

                                  • lina@vt.social

                                    So it's not surprising that an LLM can solve them, because it automates the process. That just takes all the fun and all the learning out of it, completely defeating the purpose.

                                    I'm sure you could still come up with challenges that LLMs can't solve, but they would necessarily be harder, because LLMs are going to oneshot any of the "baby" starter challenges you could possibly come up with. So you either get rid of the "baby" challenges entirely (which means less experienced teams can't compete at all), or you accept that people will solve them with LLMs. But neither of those actually works.

                                    Since CTF competitions are pretty much by definition timed, speed is an advantage. That means a team that does not use LLMs will not win, so teams must use LLMs. This applies to both new and experienced teams. But: A newbie team using LLMs will not learn. Because the whole point is learning by doing, and you're not doing anything. And so will not become experienced.

                                    So this is going to devolve into CTFs being a battle of teams using LLMs to fight for the top spots, where everyone who doesn't want to use an LLM is excluded, and where less experienced teams stop improving and getting better, because they're outsourcing the work to LLMs and not learning as a result.

                                    jce@infosec.exchange (#55)

                                    @lina Already in 2022, for the "European Cyber Cup" CTF, at least one of the top-3 teams had ChatGPT open before even checking what some of the challenges were about 🫠

                                      dngrs@chaos.social (#56)

                                      @lina LLMs can't reason

                                        starsider@valenciapa.ws (#57)

                                        @dngrs @lina We don't doubt that, but here it's used with a different meaning; there's no word for this process that doesn't also have a definition as a uniquely human ability. And, for example, saying that a machine "thinks" is nothing new. I was saying that 20+ years ago whenever a computer was stuck doing something that would finish eventually, particularly if it was a virtual game opponent (which we also called AI, because the term has always been that broad).

                                          jmj@hachyderm.io (#58)

                                          @Sonic2k @lina You're looking at it the wrong way. Yes, it's killing one type of learning. But it's teaching you how to CTF using AI: what are its strengths and weaknesses? What prompts are effective? What sub-problems should the AI tackle, and what should the human focus on? It's no different than a carpenter switching from a hand plane to a powered belt sander. The skill set changes; the results are more or less the same. Someone who only learns to belt sand isn't less of a carpenter. It's gatekeeping to think otherwise. Yes, the "elitist artists" will argue otherwise, but the difference is moot for the vast bulk of us working stiffs.
