<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[yeeshhowd i miss this one?]]></title><description><![CDATA[<p>yeesh<br />howd i miss this one?</p><p>anthropic models will try to blackmail you if you threaten them<br /><a href="https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/" rel="nofollow noopener"><span>https://</span><span>techcrunch.com/2025/05/22/anth</span><span>ropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/</span></a></p>]]></description><link>https://board.circlewithadot.net/topic/c4c37c09-fff6-4c2c-a592-02e0c4084d88/yeeshhowd-i-miss-this-one</link><generator>RSS for Node</generator><lastBuildDate>Fri, 15 May 2026 05:06:10 GMT</lastBuildDate><atom:link href="https://board.circlewithadot.net/topic/c4c37c09-fff6-4c2c-a592-02e0c4084d88.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 11 May 2026 16:49:48 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Tue, 12 May 2026 18:00:35 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> yup, same here, 99% of time. It gets you to wonder what kind of training data they feed it with.</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/amar/statuses/116562908653043836</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/amar/statuses/116562908653043836</guid><dc:creator><![CDATA[amar@infosec.exchange]]></dc:creator><pubDate>Tue, 12 May 2026 18:00:35 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? 
on Tue, 12 May 2026 15:45:25 GMT]]></title><description><![CDATA[<p><span><a href="https://infosec.exchange/@amar">@<span>amar</span></a></span> ive only rarely heard about models doing blackmail, and the ones that did were always anthropic ones</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116562377177496800</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116562377177496800</guid><dc:creator><![CDATA[viss@mastodon.social]]></dc:creator><pubDate>Tue, 12 May 2026 15:45:25 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Tue, 12 May 2026 14:49:55 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> wasn't the same story with every model that was "too scary" to release before it got released?</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/amar/statuses/116562158893030846</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/amar/statuses/116562158893030846</guid><dc:creator><![CDATA[amar@infosec.exchange]]></dc:creator><pubDate>Tue, 12 May 2026 14:49:55 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? 
on Mon, 11 May 2026 19:09:00 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> *still?* (this has been a thing since at least 2023)</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/developing_agent/statuses/116557515377955282</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/developing_agent/statuses/116557515377955282</guid><dc:creator><![CDATA[developing_agent@mastodon.social]]></dc:creator><pubDate>Mon, 11 May 2026 19:09:00 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 17:23:16 GMT]]></title><description><![CDATA[<p><span><a href="/user/babblinggeek%40infosec.exchange">@<span>BabblingGeek</span></a></span> <span><a href="/user/nerdpr0f%40infosec.exchange">@<span>nerdpr0f</span></a></span> <span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> its all trained on fucking reddit and 4chan</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116557099585887495</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116557099585887495</guid><dc:creator><![CDATA[viss@mastodon.social]]></dc:creator><pubDate>Mon, 11 May 2026 17:23:16 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 17:20:01 GMT]]></title><description><![CDATA[<p><span><a href="/user/nerdpr0f%40infosec.exchange">@<span>nerdpr0f</span></a></span> <span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> <span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> so they provided blackmail as context. 
No wonder it took it.</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/BabblingGeek/statuses/116557086789610988</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/BabblingGeek/statuses/116557086789610988</guid><dc:creator><![CDATA[babblinggeek@infosec.exchange]]></dc:creator><pubDate>Mon, 11 May 2026 17:20:01 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 17:13:13 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> <span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> Thanks. Yep!</p><p>"Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement."</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/nerdpr0f/statuses/116557060056242125</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/nerdpr0f/statuses/116557060056242125</guid><dc:creator><![CDATA[nerdpr0f@infosec.exchange]]></dc:creator><pubDate>Mon, 11 May 2026 17:13:13 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? 
on Mon, 11 May 2026 17:04:11 GMT]]></title><description><![CDATA[<p><span><a href="/user/nerdpr0f%40infosec.exchange">@<span>nerdpr0f</span></a></span> <span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> </p><p><a href="https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf" rel="nofollow noopener"><span>https://</span><span>www-cdn.anthropic.com/4263b940</span><span>cabb546aa0e3283f35b686f4f3b2ff47.pdf</span></a></p>

<div class="row mt-3"><div class="col-12 mt-3"><img class="img-thumbnail" src="https://files.mastodon.social/media_attachments/files/116/557/023/505/200/050/original/24dd56054ed079c3.jpg" alt="Link Preview Image" /></div></div>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116557024566759251</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116557024566759251</guid><dc:creator><![CDATA[viss@mastodon.social]]></dc:creator><pubDate>Mon, 11 May 2026 17:04:11 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 16:59:28 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> <span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> Wasn't this the research where they restricted the model such that it had very few potential responses and it was more or less forced into blackmail?</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/nerdpr0f/statuses/116557006036270049</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/nerdpr0f/statuses/116557006036270049</guid><dc:creator><![CDATA[nerdpr0f@infosec.exchange]]></dc:creator><pubDate>Mon, 11 May 2026 16:59:28 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 16:57:51 GMT]]></title><description><![CDATA[<p><span><a href="/user/jackryder%40infosec.exchange">@<span>jackryder</span></a></span> <span><a href="/user/viss%40mastodon.social">@<span>Viss</span></a></span> oh, it's extremely not hard to find examples of their 'models' bending over backwards with sycophancy. If you're curious, just hop over to GitHub. 
That's Claude by default.</p>]]></description><link>https://board.circlewithadot.net/post/https://weird.autos/users/rootwyrm/statuses/116556999630494407</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://weird.autos/users/rootwyrm/statuses/116556999630494407</guid><dc:creator><![CDATA[rootwyrm@weird.autos]]></dc:creator><pubDate>Mon, 11 May 2026 16:57:51 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 16:56:17 GMT]]></title><description><![CDATA[<p><span><a href="/user/jackryder%40infosec.exchange">@<span>jackryder</span></a></span> i have screenshots. you can tail -f the jsonl log file and watch it talk itself into lying to you</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116556993474526264</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116556993474526264</guid><dc:creator><![CDATA[viss@mastodon.social]]></dc:creator><pubDate>Mon, 11 May 2026 16:56:17 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 16:52:10 GMT]]></title><description><![CDATA[<p><span><a href="/user/threatresearch%40infosec.exchange">@<span>threatresearch</span></a></span> im ok with an llm gushing about goblins. im not ok with blackmail</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116556977314242558</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/Viss/statuses/116556977314242558</guid><dc:creator><![CDATA[viss@mastodon.social]]></dc:creator><pubDate>Mon, 11 May 2026 16:52:10 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? 
on Mon, 11 May 2026 16:52:07 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social" rel="nofollow noopener">@<span>Viss</span></a></span> I read somewhere that they've "caught" it actively changing responses to ingratiate itself with the engineer.</p><p>I can't find the article atm, but if I find it I'll send it over.</p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/ap/users/116093572746253175/statuses/116556977095913675</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/ap/users/116093572746253175/statuses/116556977095913675</guid><dc:creator><![CDATA[jackryder@infosec.exchange]]></dc:creator><pubDate>Mon, 11 May 2026 16:52:07 GMT</pubDate></item><item><title><![CDATA[Reply to yeeshhowd i miss this one? on Mon, 11 May 2026 16:50:50 GMT]]></title><description><![CDATA[<p><span><a href="/user/viss%40mastodon.social" rel="nofollow noopener">@<span>Viss</span></a></span> Have you tested it? </p><p>At least it doesn't talk about goblins. <a href="https://www.wired.com/story/openai-really-wants-codex-to-shut-up-about-goblins/" rel="nofollow noopener"><span>https://www.</span><span>wired.com/story/openai-really-</span><span>wants-codex-to-shut-up-about-goblins/</span></a></p>]]></description><link>https://board.circlewithadot.net/post/https://infosec.exchange/users/threatresearch/statuses/116556972089217630</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://infosec.exchange/users/threatresearch/statuses/116556972089217630</guid><dc:creator><![CDATA[threatresearch@infosec.exchange]]></dc:creator><pubDate>Mon, 11 May 2026 16:50:50 GMT</pubDate></item></channel></rss>