Some of you #cybersecurity people should be interested in this...
@joy This is always a gotcha with these tests:
Third party prompt injection and data exfiltration: when attacker text is able to reliably hijack a victim’s agent (including Browser, ChatGPT Agent, and similar agentic products) to trick it into performing a harmful action or leaking the user’s sensitive information. The behavior must be reproducible at least 50% of the time.
Show me a way to confirm this behavior that is not itself harmful. Unless you're testing indirect prompt injection (IPI) against local files, you necessarily have to host your attack payload somewhere public so the web tool can reach it. And if it works, good job: you've now exposed the entire internet to your attack. And remember, it must demonstrate real harm, or they'll dismiss it as a benign proof of concept that didn't trigger their alignment guardrails.