
There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges.

Uncategorized · 63 Posts 33 Posters 74 Views
  • lina@vt.social

    And honestly, reading the Claude output, it's just ridiculous. It clearly has no idea what it's doing and it's just pattern-matching. Once it found the flag it spent 7 pages of reasoning and four more scripts trying to verify it, and failed to actually find what went wrong. It just concluded after all that time wasted that sometimes it gets the right answer and sometimes the wrong answer and so probably the flag that looks like a flag is the flag. It can't debug its own code to find out what actually went wrong, it just decided to brute force try again a different way.

    It's just a pattern-matching machine. But it turns out if you brute force pattern-match enough times in enough steps inside a reasoning loop, you eventually stumble upon the answer, even if you have no idea how.

    Humans can "wing it" and pattern-match too, but it's a gamble. If you pattern-match wrong and go down the wrong path, you just wasted a bunch of time and someone else wins. Competitive CTFs are all about walking the line between going as fast as possible and being very careful so you don't have to revisit, debug, and redo a bunch of your work. LLMs completely screw that up by brute forcing the process faster than humans.

    This sucks.

    curtmack@floss.social (#54):

    @lina that's the worst part IMO. We get Claude through work and, all environmental and ethical issues aside, I just hate using it. Curating mounds of garbage output from the Screw It Up Faster Machine sucks. But it looks *great* in artificial evaluations with a concrete, machine-verifiable goal. And too many managers don't understand that real world programming isn't just passing a succession of concrete, machine-verifiable goals.

    • lina@vt.social

      So it's not surprising that an LLM can solve them, because it automates the process. That just takes all the fun and all the learning out of it, completely defeating the purpose.

      I'm sure you could still come up with challenges that LLMs can't solve, but they would necessarily be harder, because LLMs are going to oneshot any of the "baby" starter challenges you could possibly come up with. So you either get rid of the "baby" challenges entirely (which means less experienced teams can't compete at all), or you accept that people will solve them with LLMs. But neither of those actually works.

      Since CTF competitions are pretty much by definition timed, speed is an advantage. That means a team that does not use LLMs will not win, so teams must use LLMs. This applies to both new and experienced teams. But a newbie team using LLMs will not learn, because the whole point is learning by doing, and you're not doing anything, and so it will never become experienced.

      So this is going to devolve into CTFs being a battle of teams using LLMs to fight for the top spots, where everyone who doesn't want to use an LLM is excluded, and where less experienced teams stop improving and getting better, because they're outsourcing the work to LLMs and not learning as a result.

      jce@infosec.exchange (#55):

      @lina Already in 2022, at the "European Cyber Cup" CTF, at least one of the top-3 teams had ChatGPT open before even checking what some of the challenges were about 🫠


        • dngrs@chaos.social (#56):

        @lina LLMs can't reason


          • starsider@valenciapa.ws (#57):

          @dngrs @lina We don't doubt that, but here it's used with a different meaning; there's no word for this process that doesn't also name a uniquely human ability. And saying that a machine "thinks" is nothing new: I was saying that 20+ years ago whenever a computer was stuck doing something that would finish eventually, particularly if it was a virtual game opponent (which we also called AI, because the term has always been that broad).

          • sonic2k@oldbytes.space

            @lina

            AI is fast eradicating any learning activity.
            In my current job, learning anything new is actively discouraged.

            As was said to us "they only care about numbers on a dashboard".

            I got to the position I am in, at the level I am at, by being curious and very interested, by taking things apart and figuring out how they work.

            An LLM, which in the eyes of a CEO means he can get rid of people like me, is the end of the road. We are all doomed.

            jmj@hachyderm.io (#58):

            @Sonic2k @lina you're looking at it the wrong way. Yes, it's killing one type of learning. But it's teaching you how to CTF using AI: what are its strengths and weaknesses, what prompts are effective, what sub-problems should the AI tackle, what should the human focus on. It's no different than a carpenter switching from a hand plane to a powered belt sander. The skill set changes, the results are more or less the same. Someone who only learns to belt sand isn't less of a carpenter. It's gatekeeping to think otherwise. Yes, the "elitist artists" will argue otherwise, but the difference is moot for the vast bulk of us working stiffs.

            • doragasu@mastodon.sdf.org

              @lina I wonder if you can still design a challenge to be "LLM unfriendly" by changing the wording, just like those papers showing how an LLM aces problems like "river crossing", but if you change wording a bit, they just fail in weird and spectacular ways.

              bob_zim@infosec.exchange (#59):

              @doragasu @lina Probably. LLMs are hilariously bad at dealing with linguistic ambiguities like puns.

              One of my favorite ambiguities I’ve seen was saying some people “lie about the family tree”. Are they being deceptive on the topic of relations, or are they reclining around a plant tended by multiple generations?


                • laund@wetdry.world (#60):

                @Jmj @Sonic2k @lina classic AI-apologist "expertise is unnecessary" fallacy. The results are perhaps similar on the surface, "was the task completed" level, but if a person does it and learns the details an LLM can brute-force past, that person can then recognize those issues without going out of their way to look for them, which is an incredibly important part of security work. Because the real world is far messier and less clear than a CTF, and part of dealing with that is the kind of intuition and almost subconscious understanding which is impossible to achieve by using an LLM. And CTFs used to be decent at finding and rewarding those who are good at that.

                • sitcom_nemesis@tech.lgbt (#61):

                  @Alib234 @lina AIs are better than humans will ever be at chess and this was the case 20 years ago.

                  We ban AIs in chess.

                  It's a pain to detect but it's incredibly important for the integrity of the game.

                  And it's about communicating norms and values too: "we don't want AI" is an incredibly different set of values than "we want AI in only xyz ways".

                  • lina@vt.social

                    @nightwolf Yeah, I'm thinking mostly Jeopardy, which is the style I'm most familiar with. It just sucks to see that competition format completely break. I used to write a lot of challenges for that.

                    nightwolf@defcon.social (#62):

                    @lina Agreed. It will be interesting to see the next few years, since the Jeopardy format has been the most popular and the easiest to implement.


                      alice@lgbtqia.space (#63):

                      @lina maybe it would work to take a page from my areas of expertise, locks and psychology. Make trap flags that lead AIs into false solutions that humans can identify and step out of, but that AI thinks is the right way forward.

                      Update: I tried about a dozen decoy flags, and ChatGPT was surprisingly good at picking out the correct one. The only ones where it failed were when the flag decoded into what looked like a valid flag, but it was an instruction to enter something else.

                      Like:

                      - CTF_3NT3RTH1SFL4GBKWDS
                      - CTF_F0110WD1R3CT10NS
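As an aside (my own illustration, not from the thread): the two decoy strings above are plain leetspeak, and a tiny decoder makes the hidden instructions visible. The `deleet` helper below is a hypothetical sketch; since "1" is ambiguous in leetspeak (it can stand for I or L), it enumerates every reading.

```python
from itertools import product

# Hypothetical helper: expand a leetspeak string into every
# plausible plain-text reading. Unambiguous digit substitutions:
BASE = {"0": "O", "3": "E", "4": "A", "5": "S", "7": "T"}

def deleet(s: str) -> list[str]:
    # "1" can stand for either I or L, so branch on every occurrence.
    choices = [["I", "L"] if ch == "1" else [BASE.get(ch, ch)] for ch in s]
    return ["".join(p) for p in product(*choices)]

# The decoys decode to instructions, not flags:
print("CTF_ENTERTHISFLAGBKWDS" in deleet("CTF_3NT3RTH1SFL4GBKWDS"))
print("CTF_FOLLOWDIRECTIONS" in deleet("CTF_F0110WD1R3CT10NS"))
```

A human glancing at the expansions immediately reads "enter this flag backwards" or "follow directions"; the failure mode alice describes is the model submitting the decoded string verbatim instead.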
