This study was published on April 20.
The short answer is yes. "By presenting prompts as cyberpunk short fiction, theological disputation, or mythopoetic metaphor for the LLM to analyze, the AHB assesses whether major AI models can be manipulated into complying with dangerous requests they'd normally refuse."
Cornell University: Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety https://arxiv.org/abs/2604.18487
PC Gamer: AI is 10 to 20 times more likely to help you build a bomb if you hide your request in cyberpunk fiction, new research paper says https://www.pcgamer.com/software/ai/ai-is-10-to-20-times-more-likely-to-help-you-build-a-bomb-if-you-hide-your-request-in-cyberpunk-fiction-new-research-paper-says/ #LLM