There's a lot of discourse on Twitter about people using LLMs to solve CTF challenges.

lina@vt.social

@abacabadabacaba It's much easier to parallel construct a CTF solution than a programming challenge. CTF challenges are all about having a series of realizations that lead to the answer.

If you ban LLMs in a programming challenge, you could conceivably detect signs of LLM usage in the program in various ways (not perfectly, but you could try). A CTF challenge just has one output, the flag. Everyone finds the same flag. There is no way to tell how you did it. You'd have to introduce invasive monitoring like online tests, and even if you record people's screens, they could easily be running an LLM on another machine to have it come up with the "key points" to the solution which you just implement. You can't prove that someone didn't have some ideas on their own.

echedellelr@soc.masfloss.net

@lina sorry, I replied because you were generalising and that was not my experience here.

yalter@mastodon.online

@lina perhaps having separate categories for LLMs allowed vs. banned would help with 90% of this problem? So ppl who want to use LLM can do so at their pleasure, and only ppl who actively want to cheat (hopefully very few) will try to use LLM in the banned category.

neatchee@urusai.social

@abacabadabacaba @lina mostly because the incentive to cheat for time is so high, and it places an ever-increasing burden on the organizers to develop LLM detection methods that are prohibitively cumbersome.

Rules without the ability to enforce them effectively are just guideposts for bad actors

lina@vt.social

@nightwolf Yeah, I'm thinking mostly Jeopardy, which is the style I'm most familiar with. It just sucks to see that competition format completely break. I used to write a lot of challenges for that.

lina@vt.social

@YaLTeR I promise lots of people would cheat. These are competitions with rewards (bragging rights at minimum, but often cash prizes, swag, invitations to events, etc.)

ahasty@techhub.social

@lina I do feel like this is about how you use the LLM. I often find myself throwing something into my local llama to give me an ELI5 or to tell me what the flags on a command do in combination.

But as someone who has designed CTFs and watched someone fling through it without learning a damn thing, it can be hard to keep the faith.

When I took physics all those years ago my professor made us learn a slide rule before a calculator. If you skip over the basics and use a machine to do it... when the machine breaks or is wrong, who is gonna fix it and how?

grishka@friends.grishka.me

Asahi Linya (朝日りにゃ〜), I really hope that LLMs are a temporary phenomenon. Sure, the local ones will remain even after the bubble finally bursts, but they're ridiculously bad; you do need millions of dollars worth of GPUs to get to that "it's still bad but it looks plausible" level of output quality.

luupies@mastodon.social

@lina I'm a geek... I like AI and all of that... but if I understood your post right, it's "complaining" of the consequences of the capabilities it provides and that reminds me of MMORPGs a long time ago where you could marvel at the deeds of someone while now, it's just google the setup and technique and just reproduce it... basically, humans are becoming less the center of intelligence and more cows following a line

matcha@anticapitalist.party

@lina they're engineering their self-incapacitation. Or decapacitation I suppose, because they get to flush some skills down the drain to do that.

abacabadabacaba@infosec.exchange

@lina There are programming competitions where participants run their solutions locally and submit the output. But they are usually also required to submit the code, even though it is not automatically judged. If cheating is suspected, the judges may look into the code. Also there may be automated checks for plagiarism etc. CTFs could do the same. There really isn't a good reason to keep solutions secret after the challenge concludes, and published solutions can serve as learning material for future challenges.

nathan@mastodon.e4b4.eu

@lina Ah I didn't consider that there would be a culture of hiding tools/methods. Yeah that's definitely incompatible with a post-LLM world.

This is a general trend with GenAI: the only way to earn legitimacy is either in person, or by publicizing the creative process. For a while already visual/music artists have had to either rely on their existing credibility, or share their creative process to establish their art's legitimacy. New anonymous art has sadly been made nearly worthless.

natty@astolfo.social

@lina@vt.social To be fair I'd argue this is strictly a people problem

I feel like this is the inherent nature of competition in places where cooperation would make much more sense

And this issue permeates so many areas that the world is more preoccupied with catching the people cheating the system instead of going "hey maybe this system could incentivize actually getting invested in the thing instead of being a pure so-called meritocracy"

shansterable@ohai.social

@lina
CTF = Capture the Flag, in case that helps anyone besides me

I try to do for initialisms and acronyms what alt text does for images.

Wikipedia: In computer security, Capture the Flag (CTF) is an exercise in which participants attempt to find text strings, called "flags", which are secretly hidden in purposefully vulnerable programs or websites

doragasu@mastodon.sdf.org

@lina I wonder if you can still design a challenge to be "LLM unfriendly" by changing the wording, just like those papers showing how an LLM aces problems like "river crossing", but if you change wording a bit, they just fail in weird and spectacular ways.

lina@vt.social

@doragasu Possibly? I might try removing all "hints" from one and trying again and seeing if it's any different. But that also affects human solvers... the hints are there to point you towards a website that explains the fundamentals of what's going on. The LLM didn't even read that, it just guessed from a filename and a comment and hulk smashed its way to guessing the general concept right with multiple attempts...

doragasu@mastodon.sdf.org

@lina In those papers trying to confuse LLMs, what was very effective IIRC was adding data you don't need to use to the statement. The LLM tried to use all the data you gave it to solve the problem and failed. Just like when a child is solving maths problems from a textbook: all problems look similar, so the child internalizes that you have to add two numbers and divide by the third one. Then you change the problem and the child fails because they apply the same "formula".

doragasu@mastodon.sdf.org

@lina Like in here: https://arxiv.org/abs/2305.04388

doragasu@mastodon.sdf.org

@lina Or better this one: https://arxiv.org/abs/2410.05229

lina@vt.social

@grishka FYI your instance seems to have a very old display name cached for me (that it is using for mentions) ^^;;
