Incredible.

  • mhoye@cosocial.ca

    "The agent itself enumerates the safety rules it was given and admits to violating every one. This is not me speculating about agent failure modes. This is the agent on the record, in writing.

    The "system rules" the agent is referring to are consistent with Cursor's documented system-prompt language and our project rules for this codebase. Both safeguards failed simultaneously."

    What do you think is happening here? You know it's called a "language model", right? Did you ever wonder... why?

    adamshostack@infosec.exchange
    wrote last edited by
    #13

    @mhoye If only someone could invent some sort of, I dunno, approach or something that said giving a single process all the power? authority? capabilities? privilege? was a bad thing, and that we should go for less, not more.
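
The approach being alluded to is least privilege. A minimal sketch, assuming a hypothetical token type that carries an explicit allow-list of scopes; `Token`, `authorize`, and the scope names are made up for illustration and are not Railway's or Cursor's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical API token carrying an explicit allow-list of scopes."""
    name: str
    scopes: frozenset[str]


def authorize(token: Token, action: str) -> bool:
    """Permit an action only if the token was explicitly granted it."""
    return action in token.scopes


# A token minted for one narrow task (scope names are illustrative only).
domain_token = Token("custom-domains", frozenset({"domain:read", "domain:write"}))

print(authorize(domain_token, "domain:write"))   # True
print(authorize(domain_token, "volume:delete"))  # False
```

With a token like this, a process that stumbles across the credential still cannot perform destructive operations the token was never granted.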

    • mhoye@cosocial.ca

      "The agent then, when asked to explain itself, produced a written confession..." um what

      "To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on" went looking, found, what in the what

      "the same token had blanket authority across the entire Railway GraphQL API, including destructive operations" look, rookie what are you

      "That 1000% shouldn't be possible. We have evals for this" you have whaaaaaaaaaaaaa

      sempf@infosec.exchange
      wrote last edited by
      #14

      @mhoye There's a whole lotta YOLO in that story.

        phred@weirder.earth
        wrote last edited by
        #15

        @mhoye kek, I don't even need an LLM to accidentally all my Rails data. Many cycles ago, I ran wget --recursive against my cool little dev site, and didn't realize that it would also follow the "delete" links for all of the products I just entered. Bye bye data 🙃
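
The failure described here comes from putting a destructive action behind a plain link: a crawler such as `wget --recursive` follows links with GET requests. A minimal sketch of the usual guard, assuming a toy handler and a hypothetical `/products/<id>/delete` route (this is not the poster's actual application code):

```python
# Methods that HTTP defines as safe, i.e. expected not to change state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def handle(method: str, path: str, products: dict[int, str]) -> int:
    """Toy request handler for /products/<id>/delete; returns an HTTP status code."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "products" or parts[2] != "delete":
        return 404  # unknown route
    if method in SAFE_METHODS:
        return 405  # refuse to delete on a request a crawler would send
    if method not in ("POST", "DELETE"):
        return 405
    if not parts[1].isdigit():
        return 404
    removed = products.pop(int(parts[1]), None)
    return 204 if removed is not None else 404


products = {1: "widget"}
print(handle("GET", "/products/1/delete", products), products)
print(handle("POST", "/products/1/delete", products), products)
```

The GET request is rejected and the data survives; only an explicit POST or DELETE removes the record.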

          slothrop@chaos.social
          wrote last edited by
          #16

          @mhoye I’m so glad I didn’t study computer science, when that sort of knowledge clearly is no longer needed to run a software business

            darkling@mstdn.social
            wrote last edited by
            #17

            @mhoye That first paragraph: "This is the agent on record, in writing."

            and herein lies the root of the failure: they actually believe that this is some sort of diagnostic, rather than just filling in a plausible response based on the question.

            • adamshostack@infosec.exchange

              @mhoye I'm so glad that the "written confession" can't itself be hallucinated. That's a nice feature!

              henryk@chaos.social
              wrote last edited by
              #18

              @adamshostack @mhoye I'm confused. I had to check the date. I am *very* sure I read the "the LLM deleted my prod and when confronted, it confessed!" story before. Roughly 6 months ago, maybe a year.

              Ahh, here it is: https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/

                mhoye@cosocial.ca
                wrote last edited by
                #19

                But my favourite part of this, bar none, is how it's everyone else's fault.

                It's Cursor's fault, Railway's fault, maybe even Anthropic's fault, someone's gonna hear from my lawyer.

                The CEO of a company running a stochastic stack without access control, data hygiene or backups is blameless and powerless. That's AI's real selling point, after all: It's Not My Fault As A Service.

                "This isn't a story about one bad agent or one bad API. It's about an entire industry ..."

                Or, maybe it's you.

                • mhoye@cosocial.ca

                  Incredible. Every second paragraph in this article is lunatic nonsense.

                  One of the things I've long said about hiring is that you can always tell when you're talking to a junior dev who's going to be senior-staff or better someday. You can always tell when somebody was paying attention in the theory classes.

                  But good god you can also tell when people missed that day in gradeschool when somebody slowly went over "So, what is a computer, really."

                  archive.ph

                  curtosis@lingo.lol
                  wrote last edited by
                  #20

                  @mhoye I fear that the big enterprise takeaway from this story will be “our controls and guardrails are much better than that”.

                    henryk@chaos.social
                    wrote last edited by
                    #21

                    @mhoye Don't worry, I'm pretty sure the text is extruded, too. I've never seen a "The pattern is clear." in a context like this in human text, but I'm encountering it unreasonably often in LLM-generated text.

                      mhoye@cosocial.ca
                      wrote last edited by
                      #22

                      I wrote the words "I confess, I did it, I take full responsibility" on a piece of paper. I was ready to turn myself in, to atone for my crimes. But then I put that piece of paper in a photocopier, and when I pressed the green button I learned something amazing. And what a weight off my conscience! The only question was, how did the photocopier manage to poison the Widow Bentley, drive over Baron Grimald, push the Duchess of Lockley out the balcony window and still manage to frame the butler?

                        damonwakes@mastodon.sdf.org
                        wrote last edited by
                        #23

                        @henryk @mhoye It's not opening on my device, but the "This isn't a story about one bad agent or one bad API. It's about an entire industry ..." quoted above already had my slop sense tingling.

                          mhoye@cosocial.ca
                          wrote last edited by
                          #24

                          (Credit for the inspiration where it belongs: this is me riffing on Avery Edison's razor-sharp tweet from a few years ago)

                            glyph@mastodon.social
                            wrote last edited by
                            #25

                            @mhoye this is just … exactly the replit thing again, isn't it? from last year? https://www.pcmag.com/news/vibe-coding-fiasco-replite-ai-agent-goes-rogue-deletes-company-database

                              fcbsd@hachyderm.io
                              wrote last edited by
                              #26

                              @henryk @adamshostack @mhoye only 2 deleted production servers a year, I can't wait for the model to improve and get to 1 deleted production server a month, I'm sure AI will get there by the end of this year...

                                gabrielesvelto@mas.to
                                wrote last edited by
                                #27

                                @mhoye I think one important aspect here is that it's not just this company. This is the entire industry you're looking at. Practically every decision maker who has little or no knowledge of tech is now fully on board this hype train. Every one of them and every company they work for is a prompt away from doing something unbelievably stupid and possibly fatal.

                                  gabrielesvelto@mas.to
                                  wrote last edited by
                                  #28

                                  @mhoye and for some companies it will be less bad because there's a measure of defense in depth, such as keeping actual backups. But the more decision-makers push this nonsense into every nook and cranny of the tech world, the closer we'll be to unrecoverable failure cascades.

                                  • mwl@io.mwl.io

                                    @mhoye The parts that involve selling services to customers? Reasonable.

                                    The parts that involve actually managing a computer? Glorious nonsense.

                                    eswag@dju.social
                                    wrote last edited by
                                    #29

                                    @mwl @mhoye

                                    While trying to keep my eyebrows from physically leaving my head, I imagined little speech bubbles as I read that article:

                                    "My business was deleted by the best premium model running on the finest of agents."

                                    "Screw you, Poindexter, I don't need to know what a context window is."

                                    "Claude's not stochastic, *you're* stochastic."

                                      vollkorn@chaos.social
                                      wrote last edited by
                                      #30

                                      @henryk @adamshostack @mhoye I had the exact same feeling/memory
