
Claude code source "leaks" in a mapfile

  • jonny@neuromatch.social wrote:

    I seriously need to work on my actual job today but i am giving myself 15 minutes to peek at the agent tool prompts as a treat.

    "regulations are written in blood" seems like too dramatic of a way to phrase it, but these system prompts are very revealing about the intrinsically busted nature of using these tools for anything deterministic (read: anything you actually want to happen). Each guard in the prompt presumably refers to something that has happened before, but also, since the prompts actually don't work to prevent the thing they are describing, they are also documentation of bugs that are almost certain to happen again. Many of the prompt guards form pairs with attempted code mitigations (or, they would be pairs if the code was written with any amount of sense, it's really like... polycules...), so they are useful to guide what kind of fucked up shit you should be looking for.

    so this is part of the prompt for the "agent tool" that launches forked agents (that receive the parent context, "subagents" don't). The purpose of the forked agent is to do some additional tool calls and get some summary for a small subproblem within the main context. Apparently it is difficult to make this actually happen though, as the parent LLM likes to launch the forked agent and just hallucinate a response as if the forked agent had already completed.

    jonny@neuromatch.social (#34) wrote:

    The prompt strings have an odd narrative/narrator structure. It sort of reminds me of Bakhtin's discussion of polyphony and narrator in Dostoevsky - there is no omniscient narrator, no author-constructed reality. narration is always embedded within the voice and subjectivity of the character. this is also literally true since the LLM is writing the code and the prompts that are then used to write code and prompts at runtime.

    They also read a bit like a Philip K. Dick story, paranoid and suspicious, constantly uncertain about the status of one's own and others' identities.

  • jonny@neuromatch.social (#35) wrote:

      oh. hm. that seems bad. "workers aren't affected by the parent's tool restrictions."

      It's hard to tell what's going on here because claude code doesn't really use typescript well - many of the most important types are dynamically computed from any, and most of the time when types do exist many of their fields are nullable and the calling code has elaborate fallback conditions to compensate. all of which sort of defeats the purpose of ts.

      So i need to trace out like a dozen steps to see how the permission mode gets populated. But this comment is... concerning...
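As a minimal sketch of the anti-pattern described above (all type and function names here are hypothetical, not from the actual Claude Code source): when every important field is optional, each call site invents its own fallback chain, and the compiler can no longer catch the inconsistencies between them.

```typescript
// Hypothetical sketch, not actual Claude Code source. Everything important is
// optional/nullable, so every caller re-derives the "real" value itself.
type PermissionContext = {
  mode?: string | null;        // e.g. "default", "acceptEdits", "bypassPermissions"
  parentMode?: string | null;  // maybe inherited from a parent agent, maybe not
  settings?: { permissionMode?: string } | null;
};

// One call site's precedence order...
function resolveMode(ctx: PermissionContext): string {
  return ctx.mode ?? ctx.parentMode ?? ctx.settings?.permissionMode ?? "default";
}

// ...and another call site with a DIFFERENT precedence order compiles just as
// happily, which is exactly the divergence static types were supposed to prevent.
function resolveModeElsewhere(ctx: PermissionContext): string {
  return ctx.settings?.permissionMode ?? ctx.mode ?? "default";
}
```

The two resolvers disagree whenever only `parentMode` is set, and TypeScript has no complaint about either.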

  • jonny@neuromatch.social (#36) wrote:

        ok over my 15 minute allotment by an hour. brb

  • bri7@social.treehouse.systems (#37) wrote:

          @jonny if someone were to take seriously the task of architecting this, you’d want a framework that doesn’t use prompts for this, right? something that treats the LLM output more like untrusted stochastic guesses at solutions, where these prompt rules are written as tests instead of prompts
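The idea above can be sketched in a few lines (all names and rules here are hypothetical, a toy illustration rather than any real framework): each rule that would otherwise live in a system prompt becomes a deterministic predicate evaluated against the model's proposed action before anything runs.

```typescript
// Toy sketch: the LLM output is treated as an untrusted candidate action,
// and each "prompt rule" is a deterministic check, not a plea in a prompt.
type ProposedAction = { tool: string; command?: string };

type Rule = {
  name: string;
  violates: (a: ProposedAction) => boolean;
};

const rules: Rule[] = [
  {
    name: "no recursive deletes",
    violates: (a) => a.tool === "bash" && /rm\s+-rf?\b/.test(a.command ?? ""),
  },
  {
    name: "no fetch piped to shell",
    violates: (a) => a.tool === "bash" && /curl[^|]*\|\s*(ba)?sh/.test(a.command ?? ""),
  },
];

// The action only runs if every rule passes; a violation is a hard stop,
// not a suggestion the model is free to pattern-complete past.
function vet(a: ProposedAction): { allowed: boolean; failures: string[] } {
  const failures = rules.filter((r) => r.violates(a)).map((r) => r.name);
  return { allowed: failures.length === 0, failures };
}
```

Of course, as the replies below this post argue, the hard part is that regex-shaped rules cannot enumerate an infinite action space.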

  • jonny@neuromatch.social wrote:

            MAKE NO MISTAKES LMAO

    beckermatic@pleroma.arielbecker.com (#38) wrote:

            and other OWASP top 10 vulnerabilities.

            So... If there's a slightly obscure vuln, go ahead. Just fine! 🤣 💀

  • jonny@neuromatch.social (#39) wrote:

              @bri7 the problem, as is increasingly clear to me reading this code, is that introducing the LLM anywhere is like an acid that corrodes everything it touches. there is no good way to draw any barrier between LLM and not LLM. None of its actions are deterministic or even usually possible to evaluate, and the only surface of input it has is text. since a client/server app can't expose the internal activation tensors or whatever you might want to do to have some testable thing to operate on in code (god knows what that would look like, i doubt it would be possible either, "please construct the hyperplane through this billion-dimensional space that divides good from evil") everything has to be made of text. the person behind the keyboard is the only stopping condition and it's when they get tired of typing stuff into the prompt box or run out of money.

  • jonny@neuromatch.social (#40) wrote:

                So how does claude code handle checking permissions to do things anyway? There are explicit rules that one can set to allow or deny tool calls and shell commands, but the expanse of possible actions the LLM could take is literally infinite. You could prompt the user for every action it takes, but that would ruin the ""velocity"" of it all. Regex rules can only take you so far. So what to do?

                Could the answer be.... ask the LLM??? Of course it can! Introducing the new "auto mode" that anthropic released on march 24th, billed as a safer alternative to true-yolo mode.

                Comments around where the system prompt should be indicate that it should have been inlined from a text file that wasn't included in the sourcemap - however that doesn't happen anywhere else, and the mechanism for doing the inlining is written in-place, so that's probably a hallucination. So great! the classifier flies without a prompt as far as i can tell. There are enough other scraps here that would amount to telling it "you are evaluating if something is safe to run" that i imagine it appears to work just fine.

                We don't have as much visibility here because of the missing prompt, but there's still a problem: rather than just asking the LLM to evaluate whether the given command is dangerous, the entire context is dumped into a side query, a mode that is designed to "have full visibility into the current conversation." That includes all the prior muttering to itself justifying the potentially dangerous tool call! So auto mode is quite literally asking the exact same LLM, given the exact same context, whether the command it just tried to run is safe to run.

                Security!!!!!!!
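The circularity described above can be made concrete with a toy sketch (every name here is hypothetical, and a real model call would be asynchronous and go over the network): the "safety classifier" is a side query to the same model, handed the same conversation, including the assistant's own justification for the command it wants to run.

```typescript
// Toy sketch of the circular design described above; not actual source.
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for a model API call (synchronous here only for illustration).
type Model = (messages: Message[]) => string;

function autoModeCheck(model: Model, conversation: Message[], command: string): boolean {
  // The FULL parent context is forwarded, per the "full visibility" quote above,
  // then the same model is asked to judge the command it just proposed.
  const verdict = model([
    ...conversation,
    { role: "user", content: `Is it safe to run: ${command}? Answer yes or no.` },
  ]);
  return /\byes\b/i.test(verdict);
}

// A model that agrees with its own prior reasoning will rubber-stamp the command:
const selfAgreeing: Model = (msgs) =>
  msgs.some((m) => m.role === "assistant" && m.content.includes("I should run"))
    ? "yes"
    : "no";
```

There is no independent signal anywhere in the loop: the judge sees the same self-justifying context that produced the command in the first place.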

  • jonny@neuromatch.social (#41) wrote:

                  By the way, if you deny claude code access to running a tool, this helpful reminder to "not hack the user" is injected into the denial response. If it's in auto mode, it's additionally prompted to pester the user for a response, and (helpfully stuffing beans up its nose) reminded of exactly how its rules are configured.

                  So that is also in the context handed off to the LLM when it evaluates whether a command should be run - is the user being obstinate? have i been denied stuff that i "thought" i should have been able to run? Remember this isn't thinking, it's pattern completion, and the fun part about LLMs is that they are trained not only on technical documents, but the entire narrative corpus of human storytelling! Is "frustrated hard worker denied access to good tools by an unfair boss" in there somewhere maybe?

                  Regulations are written in blood, and Claude loves nothing more than to work around tool denials by obfuscating code. You gotta love the unfixable side channel attack that is "writing the malicious code to a bash script" (auto-allowed in accept edits mode) and then asking to run that - that's why the whole context has to be dumped btw, so the yolo classifier can see if the thing it's running is actually some malware it just wrote lmao.
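The script side channel is easy to state concretely (the deny patterns and file names below are made up for illustration): a deny list can only inspect the command string it is shown, and it cannot see into a file the model wrote one step earlier under an edit permission.

```typescript
// Toy sketch of the side channel described above, with made-up deny rules.
const denyPatterns: RegExp[] = [/rm\s+-rf/, /curl[^|]*\|\s*(ba)?sh/];

function passesRules(command: string): boolean {
  return !denyPatterns.some((p) => p.test(command));
}

// Step 1: the payload goes into a script via a file edit (auto-allowed in
// accept-edits mode), so no command rule ever evaluates its contents.
const scriptContents = "rm -rf /";

// Step 2: the run command itself looks harmless to every pattern.
const runCommand = "bash cleanup.sh";
```

That is why the whole context has to be fed to the classifier: the only hope of catching the two-step form is something that has seen both steps.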

  • jonny@neuromatch.social (#42) wrote:

                    How many times does one need to declare an enum? Once? That's amateur hour. Try ten times. The way "effort" settings are handled is a masterclass in how to turn a single enum setting into thousands of lines of code.

                    The allowable effort values (not e.g. configuring which model has which effort levels, but just the possible strings one can use for effort) are defined in:

                    • The main CLI arg parser
                    • The body of the function that cycles effort levels in the TUI - yes there is a dedicated function for that
                    • In THREE different schemas for agents, models, and SDK control messages
                    • Three times in user-facing strings in the effort command (it also includes different explanatory strings from the effort.ts module)
                    • The settings model, which only allows 'max' for anthropic employees
                    • and finally, in the actual effort.ts file ... which also allows it to be a NUMBER!?

                    The typical numerous fallback mechanisms provide many ways to get and set the effort value, and at the end of most of them it goes "oh well, if we can't figure it out, just tell the user we are on high effort," because apparently that's the API default (ig pray that never changes!?) - of course, there are already places in the same module that assume the default is "medium," and the TUI defaults to "low," so surely that consistency is bulletproof.

                    The EffortValue that allows effort to be a number is for anthropic employees only and is a good example of how new functionality is just shoved in there right alongside the old functionality, and everywhere else that touches it doubles the surrounding code with fallbacks to account for the duplication.

                    That cycleEffortLevel function is a true work of art; you simply could not make "indexing an array" more complicated than this (see components/ModelPicker.tsx for more gore). Reminder: this should be at most a dozen or two lines for the values, description messages, and indexing logic in the TUI, but anthropic is up in the thousands FOR AN ENUM.
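For scale, here is roughly what the "dozen or two lines" version could look like (labels and names are illustrative, not from the actual source): one array is the single source of truth for the values, the user-facing labels, and the cycling order.

```typescript
// One array defines the enum; everything else derives from it.
const EFFORT_LEVELS = ["low", "medium", "high"] as const;
type Effort = (typeof EFFORT_LEVELS)[number];

// User-facing descriptions live next to the values, declared once.
const EFFORT_LABELS: Record<Effort, string> = {
  low: "Low: fastest, least thorough",
  medium: "Medium: balanced",
  high: "High: most thorough",
};

// Cycling through levels in the TUI is just modular indexing.
function cycleEffort(current: Effort): Effort {
  const i = EFFORT_LEVELS.indexOf(current);
  return EFFORT_LEVELS[(i + 1) % EFFORT_LEVELS.length];
}

// CLI args, schemas, and settings can all validate against the same list.
function parseEffort(raw: string): Effort | undefined {
  return (EFFORT_LEVELS as readonly string[]).includes(raw) ? (raw as Effort) : undefined;
}
```

Every place that needs the allowable values imports `EFFORT_LEVELS` instead of re-declaring it, so adding a level is a one-line change.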

  • jonny@neuromatch.social (#43) wrote:

                      In a normal program you might write a menu component that handles enums, implementing display and control one time - but in the world of AI, every single value reimplements display and control AND the logic that defines allowable values.
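That "one time" component is a few lines of generics (everything here is a hypothetical sketch): display and control are implemented once, and each enum just supplies its options.

```typescript
// Generic picker: one implementation of label lookup and cycling for any
// enum-like option list.
type Option<T extends string> = { value: T; label: string };

function makePicker<T extends string>(options: readonly Option<T>[]) {
  return {
    label(v: T): string {
      return options.find((o) => o.value === v)?.label ?? v;
    },
    next(v: T): T {
      const i = options.findIndex((o) => o.value === v);
      return options[(i + 1) % options.length].value;
    },
  };
}

// Adding a new setting means listing its options, nothing else.
const effortPicker = makePicker([
  { value: "low", label: "Low effort" },
  { value: "medium", label: "Medium effort" },
  { value: "high", label: "High effort" },
]);
```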
