There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges.

63 Posts 33 Posters 74 Views
  • lina@vt.social

    There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges. I used to write CTF challenges in a past life, so I threw a couple of my hardest ones at it.

    We're screwed.

    At least with text-file style challenges ("source code provided" etc), Claude Opus solves them quickly. For the "simpler" of the two, it just very quickly ran through the steps to solve it. For the more "ridiculous" challenge, it took a long while, and in fact as I type this it's still burning tokens "verifying" the flag even though it very obviously found the flag and it knows it (it's leetspeak and it identified that and that it's plausible). LLMs are, indeed, still completely unintelligent, because no human would waste time verifying a flag and second-guessing itself when it very obviously is correct. (Also you could just run it...)

    But that doesn't matter, because it found it.

    The thing is, CTF challenges aren't about inventing the next great invention or having a rare spark of genius. CTF challenges are about learning things by doing. You're supposed to enjoy the process. The whole point of a well-designed CTF challenge is that anyone, given enough time and effort and self-improvement and learning, can solve it. The goal isn't actually to get the flag, otherwise you'd just ask another team for the flag (which is against the rules of course). The goal is to get the flag by yourself. If you ask an LLM to get the flag for you, you aren't doing that.

    (Continued)

    echedellelr@soc.masfloss.net
    wrote last edited by
    #18

    @lina most CTFs include 6-7 challenges to be solved in 4 hours.

    Those CTFs expect you to know a typical set of forensic tools managed by an external guy/gal/entity which is somewhat known to be able to do it in time.

    It stops being fun when you stop learning by doing and it starts being a "kill 'em all" competition.

    • lina@vt.social

      This is, quite frankly, the same problem LLM agents are causing in software engineering and such, just way worse. Because with CTFs, there is no "quality metric". Once you get the flag you get the flag. It doesn't matter if your approach was ridiculous or you completely misunderstood the problem or "winged it" in the worst way possible or the solver is a spaghetti ball of technical debt. It doesn't matter if Claude made a dozen reasoning errors in its chain that no human would (which it did). Every time it gets it wrong it just tries again, and it can try again orders of magnitude faster than a human, so it doesn't matter.

      I don't have a solution for this. You can't ban LLMs, people will use them regardless. You could try interviewing teams one on one after the challenge to see if they actually have a coherent story and clearly did the work, but even then you could conceivably cheat using an LLM and then wait it out a bit to make the time spent plausible, study the reasoning chain, and convince someone that you did the work. It's like LLMs in academics, but much worse due to the time constraints and explicitly competitive nature of CTFs.

      LLMs broke CTFs.

      abacabadabacaba@infosec.exchange
      wrote last edited by
      #19

      @lina Programming competitions are banning LLMs, see e.g. https://info.atcoder.jp/entry/llm-rules-en. How are CTFs any different?

        lina@vt.social
        wrote last edited by
        #20

        @echedellelr The ones I've worked on are less about "forensic tooling" and more about diverse (reverse)engineering challenges. They also usually run for a couple days and ~16 chals.

        It evens out the playing field because pre-prepared tooling doesn't help you as much, since the challenges tend to be quite novel. I much prefer those to "write a ROP chain and exploit this service" or "crack this password" (not requiring an inordinate amount of compute, no more than 1hr of CPU time on a contemporary PC, is also a hard level design rule). There's usually one or two more typical infosec ones but they aren't the majority.

        One example is a CrackMe challenge that was written in Verilog (implementing a custom CPU to run the actual crackme binary).

          • lina@vt.social

          I might still do a monthly challenge or something in the future so people who want to have fun and learn can have fun and learn. That's still okay.

          But CTFs as discrete competitions with winners are dead.

          A CTF competition is basically gamified homework.

          LLMs broke the game. Now all that's left is self study.

          coldclimate@hachyderm.io
          wrote last edited by
          #21

          @lina thank you for this excellent thread

            echedellelr@soc.masfloss.net
            wrote last edited by
            #22

            @lina at least the ones run by the National Police or any other national agency here are like that, and those are the typical ones I see.

            It's probably a cultural thing that varies by country.

              lina@vt.social
              wrote last edited by
              #23

              @echedellelr CTFs run by organizations focusing on infosec and offensive capability would necessarily lean that way. That's not the world I'm interested in. There are many CTFs not associated with such organizations with different themes.

                nightwolf@defcon.social
                wrote last edited by
                #24

                @lina I view CTFs mostly as a way to learn, and I think that still exists. If you LLM the whole thing, you just hamper your own ability to learn. Competition-wise, the Jeopardy format has some challenges. I think it may be interesting to see if there is a shift toward more Attack and Defense, King of the Hill, or other structures where LLMs would still help but one-shot single solutions aren't necessarily the best possible approach.

                  lina@vt.social
                  wrote last edited by
                  #25

                  @abacabadabacaba It's much easier to parallel construct a CTF solution than a programming challenge. CTF challenges are all about having a series of realizations that lead to the answer.

                  If you ban LLMs in a programming challenge, you could conceivably detect signs of LLM usage in the program in various ways (not perfectly, but you could try). A CTF challenge just has one output, the flag. Everyone finds the same flag. There is no way to tell how you did it. You'd have to introduce invasive monitoring like online tests, and even if you record people's screens, they could easily be running an LLM on another machine to have it come up with the "key points" to the solution which you just implement. You can't prove that someone didn't have some ideas on their own.

                    echedellelr@soc.masfloss.net
                    wrote last edited by
                    #26

                    @lina sorry, I replied because you were generalising and that was not my experience here.

                      yalter@mastodon.online
                      wrote last edited by
                      #27

                      @lina perhaps having separate categories for LLMs allowed vs. banned would help with 90% of this problem? So ppl who want to use LLM can do so at their pleasure, and only ppl who actively want to cheat (hopefully very few) will try to use LLM in the banned category.

                        neatchee@urusai.social
                        wrote last edited by
                        #28

                        @abacabadabacaba @lina mostly because the incentive to cheat for time is so high and it places an ever increasing burden on the organizers to develop LLM detection methods that are prohibitively cumbersome.

                        Rules without the ability to enforce them effectively are just guideposts for bad actors

                          lina@vt.social
                          wrote last edited by
                          #29

                          @nightwolf Yeah, I'm thinking mostly Jeopardy, which is the style I'm most familiar with. It just sucks to see that competition format completely break. I used to write a lot of challenges for that.

                            lina@vt.social
                            wrote last edited by
                            #30

                            @YaLTeR I promise lots of people would cheat. These are competitions with rewards (bragging rights at minimum, but often cash prizes, swag, invitations to events, etc.)

                              ahasty@techhub.social
                              wrote last edited by
                              #31

                              @lina I do feel like this is about how you use the LLM. I often find myself throwing something into my local llama to give me an ELI5, or to explain what the flags on a command do in combination.

                              But as someone who has designed CTFs and watched someone fling through one without learning a damn thing, it can be hard to keep the faith.

                              When I took physics all those years ago, my professor made us learn a slide rule before a calculator. If you skip over the basics and use a machine to do it... when the machine breaks or is wrong, who is gonna fix it and how?

                                grishka@friends.grishka.me
                                wrote last edited by
                                #32

                                Asahi Linya (朝日りにゃ〜), I really hope that LLMs are a temporary phenomenon. Sure the local ones will remain even after the bubble finally bursts, but they're ridiculously bad, you do need millions of dollars worth of GPUs to get to that "it's still bad but it looks plausible" level of output quality.

                                  luupies@mastodon.social
                                  wrote last edited by
                                  #33

                                  @lina I'm a geek... I like AI and all of that... but if I understood your post right, it's "complaining" about the consequences of the capabilities it provides. That reminds me of MMORPGs: a long time ago you could marvel at someone's deeds, while now you just google the setup and technique and reproduce it... basically, humans are becoming less the center of intelligence and more like cows following a line

                                    matcha@anticapitalist.party
                                    wrote last edited by
                                    #34

                                    @lina they're engineering their self-incapacitation. Or decapacitation, I suppose, because they get to flush some skills down the drain to do that.

                                      abacabadabacaba@infosec.exchange
                                      wrote last edited by
                                      #35

                                      @lina There are programming competitions where participants run their solutions locally and submit the output. But they are usually also required to submit the code, even though it is not automatically judged. If cheating is suspected, the judges may look into the code. There may also be automated checks for plagiarism etc. CTFs could do the same. There really isn't a good reason to keep solutions secret after the challenge concludes, and published solutions can serve as learning material for future challenges.

                                      • lina@vt.social

                                        @nathan It's worse because it's not a linear game like chess. You aren't competing move-wise, you are going down your own path where there is no interaction between teams. There's no way to detect that in online competition, even heuristically. There's no realtime monitoring. There isn't any condensed format that describes "what you did". At most you could stream yourself to some kind of video escrow system, but then who is going to watch those? And if you make them public after the competition, you are giving away your tools to everyone. And you could still have an LLM on the side on another machine and parallel construct the whole thing plausibly.

                                        Sure you could do in-person only, but that would only work for the top tiers and who is going to want to learn and grow online when a huge number of people are going to be cheating online?

                                        It's the same with any kind of game. Sure cheating is barely a concern in-person, but people hate cheaters online, and companies still try hard to detect cheaters. And detecting cheaters for a CTF is nigh impossible.

                                        nathan@mastodon.e4b4.euN This user is from outside of this forum
                                        nathan@mastodon.e4b4.eu
                                        wrote last edited by
                                        #36

                                        @lina Ah I didn't consider that there would be a culture of hiding tools/methods. Yeah that's definitely incompatible with a post-LLM world.

                                        This is a general trend with GenAI: the only way to earn legitimacy is either in person, or by publicizing the creative process. For a while already visual/music artists have had to either rely on their existing credibility, or share their creative process to establish their art's legitimacy. New anonymous art has sadly been made nearly worthless.

                                          natty@astolfo.socialN This user is from outside of this forum
                                          natty@astolfo.social
                                          wrote last edited by
                                          #37

                                          @lina@vt.social To be fair I'd argue this is strictly a people problem

                                          I feel like this is the inherent nature of competition in places where cooperation would make much more sense

                                          And this issue permeates so many areas that the world is more preoccupied with catching the people cheating the system than with going "hey, maybe this system could incentivize actually getting invested in the thing instead of being a pure so-called meritocracy"
