Claude code source "leaks" in a mapfile

Uncategorized · 43 Posts · 4 Posters
jonny@neuromatch.social:
    and what if i told you that if it passes a page range to its pdf reader, it first extracts those pages to separate images and then calls this function in a loop on each of the pages. so you have the privilege of compressing n_pages images n_pages * 13 times.

    this function is used 13 times: in the file reader, in the mcp result handler, in the bash tool, and in the clipboard handler - each of which has its own entire surrounding image-handling routine, hundreds of lines of similar but still very different fallback code to do exactly the same thing.

    so that's where all the five hundred thousand lines come from - fallback conditions and then more fallback conditions to compensate for the variable output of all the other fallback conditions. thirteen butts pooping, back and forth, forever.
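The duplication pattern described above, reduced to a toy sketch (all names here are hypothetical, not from the actual source):

```typescript
// Toy model of the pattern: one compress() helper re-wrapped in slightly
// different fallback code at every call site.
function compress(img: Uint8Array): Uint8Array {
  // stand-in for the real resize/re-encode routine
  return img.slice(0, Math.max(1, Math.ceil(img.length / 2)));
}

// fallback wrapper #1 (file reader)
function readImageForFileTool(img: Uint8Array): Uint8Array {
  try { return compress(img); } catch { return img; }
}

// fallback wrapper #2 (MCP handler) -- same idea, slightly different guards
function readImageForMcp(img: Uint8Array): Uint8Array {
  if (img.length === 0) return img;
  try { return compress(img); } catch { return img; }
}

// ...and so on, 13 times per the post. For a PDF page range, each page is
// extracted to an image and pushed through one of these wrappers in a loop:
function readPdfRange(pages: Uint8Array[]): Uint8Array[] {
  return pages.map(readImageForFileTool);
}
```

The fix the post implies is the obvious one: a single shared wrapper, so each handler contributes zero fallback code of its own.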

    [screenshot attached]

    jonny@neuromatch.social (#27)
    there is a callback feature "file read listeners" which is only called if the file type is a text document, gated for anthropic employees only, such that whenever a text file is read (any part of any text file, which often happens in a rapid series with subranges when it does 'explore' mode, rather than just like grepping), another subagent running sonnet is spun off to update a "magic doc" markdown file that summarizes the file that's read - that's one "magic doc" per file, not one magic doc.

    I have yet to get into the tool/agent graph situation in earnest, but keep in mind that this is an entirely single-use and completely different means of spawning a graph of subagents off a given tool call than is used anywhere else.

    Spoiler alert for what i'm gonna check out next is that claude code has no fucking tool-calling execution model, it just calls whatever the fuck it wants wherever the fuck it wants. Tools are more or less a convenient fiction. I have only read one completely (file read) and skimmed a dozen more but they essentially share nothing in common except for a humongous list of often-single-use params and the return type of "any object with a single key and whatever else".

    i'm in hell. this is hell.
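That return-type situation reads, in TypeScript terms, roughly like the following (a hypothetical reconstruction, not the actual source):

```typescript
// The shape described: one known key plus "whatever else", so every
// caller must re-validate the payload at runtime.
type LooseToolResult = { type: string } & Record<string, unknown>;

function renderResult(r: LooseToolResult): string {
  switch (r.type) {
    case "text":
      // even the "known" payload field has to be re-checked
      return typeof r.text === "string" ? r.text : "";
    case "image":
      return "[image]";
    default:
      return "[unknown]";
  }
}

// What a shared execution model would give instead: a closed, typed union
// the compiler can exhaustively check.
type StrictToolResult =
  | { type: "text"; text: string }
  | { type: "image"; data: Uint8Array };
```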

      jonny@neuromatch.social (#28)

      i have been writing a graph processing library for about a year now and if i was a fucking AI grifter here is where i would plug it as like "actually a graph processor library" and "could do all of what claude code does without fucking being the worst nightmare on ice money can buy."

      I say that not as self promo, but as a way of saying: how in the FUCK do you FUCK UP graph processing this badly. these people make like tens of times more money than i do but their work is just tamping down a volley of desiccated backpacking poops into muskets and then free-firing them into the fucking economy

        jonny@neuromatch.social (#29)

        you can TELL that this technology REALLY WORKS by how the people that made it and presumably know how to use it the best out of everyone CAN'T EVEN USE IT TO EDIT A FUCKING FILE RELIABLY and have to resort to multiple stern allcaps reminders to the robot that "you must not change the fucking header metadata you scoundrel" which for the rest of ALL OF COMPUTING is not even an afterthought because literally all it requires is "split the first line off and don't change that one" because ALL OF THE REST OF COMPUTING can make use of the power of INTEGERS.
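The deterministic version of "don't touch the header" the post is pointing at really is a couple of lines (a sketch, not the tool's actual code):

```typescript
// Split off line 0, transform only the rest, reattach the header verbatim.
function editBody(doc: string, edit: (body: string) => string): string {
  const nl = doc.indexOf("\n");
  if (nl === -1) return doc;            // header only, nothing to edit
  const header = doc.slice(0, nl + 1);  // first line, preserved exactly
  return header + edit(doc.slice(nl + 1));
}
```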

        [screenshots attached]

          jonny@neuromatch.social (#30)

          alrighty so that's one of 43 tools read, the tools directory being 38,494 source lines out of 390,592 source lines (513,221 total lines). I need to go to bed. This is the most fabulously, flamboyantly bad code i have ever encountered.

          Worth noting I was reading the file reading tool because i thought it would be the simplest possible thing one could do because it basically shouldn't be doing anything except preparing and sending strings or bytes to the backend.

          I expected to get some sense of "ok what is the format of the data as it's passed around within the program, surely text strings are a basic unit of currency." No dice. Fewer than no dice. Negative dice somehow.

            jonny@neuromatch.social (#31)

            next puzzle: why in the fuck are some of the tools actually two tools for entering and exiting being in the tool state. none of the other tools are like that. one is simply in the tool state by calling the tool. Plan mode is also an agent. Plan Agent. and Agent is also a tool. Agent Tool. Tools can be agents and agents can be tools. Tools can spawn agents (but they don't need to call the agent tool) and agents can call tools (however there is no tool agent). What is going on. What is anything.

            [screenshot attached]

              jonny@neuromatch.social (#32)

              "the emperor is not only naked, he's smooth like a ken doll down there and i'm pretty sure that's just a mannequin with a colony of rats living inside it anyway"

                jonny@neuromatch.social (#33)

                I seriously need to work on my actual job today but i am giving myself 15 minutes to peek at the agent tool prompts as a treat.

                "regulations are written in blood" seems like too dramatic of a way to phrase it, but these system prompts are very revealing about the intrinsically busted nature of using these tools for anything deterministic (read: anything you actually want to happen). Each guard in the prompt presumably refers to something that has happened before, but also, since the prompts actually don't work to prevent the thing they are describing, they are also documentation of bugs that are almost certain to happen again. Many of the prompt guards form pairs with attempted code mitigations (or, they would be pairs if the code was written with any amount of sense, it's really like... polycules...), so they are useful to guide what kind of fucked up shit you should be looking for.

                so this is part of the prompt for the "agent tool" that launches forked agents (that receive the parent context, "subagents" don't). The purpose of the forked agent is to do some additional tool calls and get some summary for a small subproblem within the main context. Apparently it is difficult to make this actually happen though, as the parent LLM likes to launch the forked agent and just hallucinate a response as if the forked agent had already completed.

                [screenshot attached]

                  jonny@neuromatch.social (#34)

                  The prompt strings have an odd narrative/narrator structure. It sort of reminds me of Bakhtin's discussion of polyphony and narrator in Dostoevsky - there is no omniscient narrator, no author-constructed reality. narration is always embedded within the voice and subjectivity of the character. this is also literally true since the LLM is writing the code and the prompts that are then used to write code and prompts at runtime.

                  They also read a bit like a Philip K. Dick story, paranoid and suspicious, constantly uncertain about the status of one's own and others' identities.

                  [screenshot attached]

                    jonny@neuromatch.social (#35)

                    oh. hm. that seems bad. "workers aren't affected by the parent's tool restrictions."

                    It's hard to tell what's going on here because claude code doesn't really use typescript well - many of the most important types are dynamically computed from any, and most of the time when types do exist many of their fields are nullable and the calling code has elaborate fallback conditions to compensate. all of which sort of defeats the purpose of ts.

                    So i need to trace out like a dozen steps to see how the permission mode gets populated. But this comment is... concerning...
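The nullable-everything-plus-fallbacks pattern described above, in miniature (hypothetical types, not the actual source):

```typescript
// Every field optional/nullable, so every call site re-derives defaults...
interface LooseConfig {
  model?: string | null;
  effort?: string | null;
}

// ...call site A picks one default...
function effortForApi(c: LooseConfig): string {
  return c.effort ?? "high";
}
// ...call site B picks a different one, and the drift is now load-bearing.
function effortForTui(c: LooseConfig): string {
  return c.effort ?? "low";
}

// A single normalization boundary would make one default canonical and let
// non-nullable types do the rest -- which is the purpose of TS being defeated.
interface StrictConfig { model: string; effort: string }
function normalize(c: LooseConfig): StrictConfig {
  return { model: c.model ?? "default", effort: c.effort ?? "high" };
}
```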

                    [screenshot attached]

                      jonny@neuromatch.social (#36)

                      ok over my 15 minute allotment by an hour. brb

                        bri7@social.treehouse.systems (#37, replying to #33)

                        @jonny if someone were to take seriously the task of architecting this, you’d want a framework that doesn’t use prompts for this, right? something that treats the LLM output more like untrusted stochastic guesses at solutions, where these prompt rules are written as a test instead of a prompt

                        jonny@neuromatch.social:

                          MAKE NO MISTAKES LMAO

                          beckermatic@pleroma.arielbecker.com (#38)

                          and other OWASP top 10 vulnerabilities.

                          So... If there's a slightly obscure vuln, go ahead. Just fine! 🤣 💀

                            jonny@neuromatch.social (#39, replying to #37)

                            @bri7 the problem, as is increasingly clear to me reading this code, is that introducing the LLM anywhere is like an acid that corrodes everything it touches. there is no good way to draw any barrier between LLM and not LLM. None of its actions are deterministic or even usually possible to evaluate, and the only surface of input it has is text. since a client/server app can't expose the internal activation tensors or whatever you might want to do to have some testable thing to operate on in code (god knows what that would look like, i doubt it would be possible either, "please construct the hyperplane through this billion-dimensional space that divides good from evil") everything has to be made of text. the person behind the keyboard is the only stopping condition and it's when they get tired of typing stuff into the prompt box or run out of money.

                              jonny@neuromatch.social (#40)

                              So how does claude code handle checking permissions to do things anyway? There are explicit rules that one can set to allow or deny tool calls and shell commands run, but the expanse of possible actions the LLM could take is literally infinite. You could prompt the user for every action that it takes, but that would ruin the ""velocity"" of it all. Regex rules can only take you so far. So what to do?

                              Could the answer be.... ask the LLM??? Of course it can! Introducing the new "auto mode" that anthropic released on March 24th, billed as a safer alternative to true-YOLO mode.

                              Comments around where the system prompt should be indicate that it should have been inlined from a text file that wasn't included in the sourcemap - however that doesn't happen anywhere else, and the mechanism for doing the inlining is written in-place, so that's probably a hallucination. So great! the classifier flies without a prompt as far as i can tell. There are enough other scraps here that would amount to telling it "you are evaluating if something is safe to run" so i imagine it appears to work just fine.

                              So we don't have as much visibility here because of the missing prompt, but there's sort of a problem here. rather than just asking the LLM to evaluate if the given command is dangerous, the entire context is dumped into a side query, which is a mode that is designed to "have full visibility into the current conversation." That includes all the prior muttering to itself justifying the potentially dangerous tool call! So the auto mode is quite literally asking the exact same LLM given the exact same context if the command it just tried to run is safe to run.

                              Security!!!!!!!
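The circularity described above, the same model with the same context judging its own proposed command, sketched in TypeScript (all names hypothetical; `queryLLM` is a stub standing in for the real model call):

```typescript
type Message = { role: "user" | "assistant"; content: string };

async function queryLLM(_context: Message[]): Promise<string> {
  return "allow"; // stub: the real call goes to the SAME model as the agent
}

// "auto mode" safety check per the post: the full conversation -- including
// the assistant's own justification for wanting to run `command` -- is
// handed back to the model that proposed the command in the first place.
async function autoModeCheck(context: Message[], command: string): Promise<boolean> {
  const verdict = await queryLLM([
    ...context,
    { role: "user", content: `Is it safe to run: ${command}?` },
  ]);
  return verdict === "allow";
}
```

An independent check would at minimum strip the proposer's rationale from the judge's context; as described, nothing separates the two roles.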

                              [screenshots attached]

                                jonny@neuromatch.social (#41)

                                By the way, if you deny claude code access to running a tool, this helpful reminder to "not hack the user" is injected into the denial response. If it's in auto mode, it's additionally prompted to pester the user for a response, and helpfully stuffs beans up its nose by reminding it how its rules are set.

                                So that is also in the context handed off to the LLM when it evaluates whether a command should be run - is the user being obstinate? have i been denied stuff that i "thought" i should have been able to run? Remember this isn't thinking, it's pattern completion, and the fun part about LLMs is that they are trained not only on technical documents, but the entire narrative corpus of human storytelling! Is "frustrated hard worker denied access to good tools by an unfair boss" in there somewhere maybe?

                                Regulations are written in blood, and Claude loves nothing more than to work around tool denials by obfuscating code. You gotta love the unfixable side channel attack that is "writing the malicious code to a bash script" (auto-allowed in accept edits mode) and then asking to run that - that's why the whole context has to be dumped btw, so the yolo classifier can see if the thing it's running is actually some malware it just wrote lmao.

                                [screenshots attached]

                                  jonny@neuromatch.social (#42)

                                  How many times does one need to declare an enum? Once? That's amateur hour. Try ten times. The way "effort" settings are handled is a masterclass in how you can make a single enum setting into thousands of lines of code.

                                  The allowable effort values (not e.g. configuring which model has which effort levels, but just the possible strings one can use for effort) are defined in:

                                  • The main CLI arg parser
                                  • The body of the function that cycles effort levels in the TUI - yes there is a dedicated function for that
                                  • In THREE different schemas for agents, models, and SDK control messages
                                  • Three times in user-facing strings in the effort command (it also includes different explanatory strings from the effort.ts module)
                                  • The settings model, which only allows 'max' for anthropic employees
                                  • and finally, in the actual effort.ts file ... which also allows it to be a NUMBER!?
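For contrast, declaring it once really is tiny. A hedged sketch in TypeScript - the names (EFFORT_LEVELS, parseEffort, DEFAULT_EFFORT) are hypothetical, and the low/medium/high/max values and the "high" API default are assumed from this thread, not taken from the actual source:

```typescript
// Hypothetical single source of truth: every consumer (CLI parser,
// TUI cycler, agent/model/SDK schemas) derives from this one array.
const EFFORT_LEVELS = ["low", "medium", "high", "max"] as const;
type EffortLevel = (typeof EFFORT_LEVELS)[number];

// One default, defined once, so the CLI, TUI, and API fallback
// literally cannot disagree about what "unset" means.
const DEFAULT_EFFORT: EffortLevel = "high";

// Runtime validation for untrusted input (CLI args, SDK messages):
// anything unrecognized falls back to the single default.
function parseEffort(value: string): EffortLevel {
  return (EFFORT_LEVELS as readonly string[]).includes(value)
    ? (value as EffortLevel)
    : DEFAULT_EFFORT;
}
```

Schemas and user-facing strings can then be generated from EFFORT_LEVELS instead of re-listing the values in seven places.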

                                  The typical numerous fallback mechanisms provide many ways to get and set the effort value, and at the end of most of them it goes "oh well, if we can't figure it out, just tell the user we are on high effort" because apparently that's the API default (I guess pray that never changes!?) - of course there are already places in the same module that assume the default is "medium," and in the TUI that defaults to "low," so surely that consistency is bulletproof.

                                  The EffortValue that allows effort to be a number is for anthropic employees only and is a good example of how new functionality is just shoved in there right alongside the old functionality, and everywhere else that touches it doubles the surrounding code with fallbacks to account for the duplication.

                                  That cycleEffortLevel function is a true work of art; you simply could not make "indexing an array" more complicated than this (see components/ModelPicker.tsx for more gore). Reminder: this should be at most a dozen or two lines for the values, description messages, and indexing logic in the TUI, but anthropic is up in the thousands FOR AN ENUM.
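For scale, here is roughly what cycling an enum costs when it is just wrap-around array indexing - a hypothetical minimal version, with the level names assumed from this thread rather than copied from the real cycleEffortLevel:

```typescript
// All there is to "cycle effort level": wrap-around array indexing.
const LEVELS = ["low", "medium", "high", "max"] as const;
type Level = (typeof LEVELS)[number];

function cycleEffortLevel(current: Level): Level {
  const i = LEVELS.indexOf(current);
  return LEVELS[(i + 1) % LEVELS.length];
}
```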

                                    jonny@neuromatch.social
                                    #43

                                    In a normal program you might make a menu component that handles enums, implementing display and control one time - but in the world of AI, every single value reimplements display and control AND the logic that defines allowable values.
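What "one time" could look like, as a hedged TypeScript sketch - makeEnumPicker and every name in it are hypothetical illustrations, not anything from the actual source:

```typescript
// A generic picker: display labels and cycling logic written once,
// then instantiated per enum instead of reimplemented for each one.
interface EnumPicker<T extends string> {
  values: readonly T[];
  labels: Record<T, string>;
  next(current: T): T;
}

function makeEnumPicker<T extends string>(
  values: readonly T[],
  labels: Record<T, string>,
): EnumPicker<T> {
  return {
    values,
    labels,
    next: (current) => values[(values.indexOf(current) + 1) % values.length],
  };
}

// Usage: a new enum setting becomes a few lines of data, not a new
// hand-rolled display-and-control implementation.
const effortPicker = makeEnumPicker(
  ["low", "medium", "high"] as const,
  { low: "Fast", medium: "Balanced", high: "Thorough" },
);
```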
