Psst, want to see a funny GitHub issue?

Uncategorized · 15 Posts · 10 Posters · 0 Views
This topic has been deleted. Only users with topic management privileges can see it.
  • astraleureka@social.treehouse.systems wrote:

    @wren6991 this absurdity is on-par with the "claude re-emits vaguely json-shaped output repeatedly until the linter says it's valid" discovery from the recent source leak

    wren6991@types.pl · #5

    @astraleureka I don't know what Anthropic do but llama-cpp (open-source inference) apparently does masked decoding for tool calls. It recognises a magic token indicating the start of a tool call and from that point it forces probability to 0 for tokens that don't match an FSM for JSON syntax + tool call schema. This is done at inference level and might not be visible in the Claude leak, which afaik was just the harness.

    So it's not quite as dumb as I made it sound because the LLM is constrained to only produce syntactically and schematically correct JSON during tool calls. It's still funny that it just... types the JSON though
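    The masked-decoding idea described above can be sketched in a few lines. Everything here (the vocabulary, the hard-coded FSM, and the "model") is a made-up toy for illustration, not llama.cpp's actual implementation:

    ```python
    import math

    # Toy grammar-constrained (masked) decoding: at each step, any token the
    # FSM would reject has its score forced to -inf before the greedy pick.
    VOCAB = ['{', '}', '"', ':', 'name', 'ls', 'hello', ' ']

    def allowed_next(prefix: str) -> set:
        """A hard-coded FSM for the tiny invented schema {"name":"<word>"}."""
        states = {
            '':                {'{'},
            '{':               {'"'},
            '{"':              {'name'},
            '{"name':          {'"'},
            '{"name"':         {':'},
            '{"name":':        {'"'},
            '{"name":"':       {'ls', 'hello'},
            '{"name":"ls':     {'"'},
            '{"name":"hello':  {'"'},
            '{"name":"ls"':    {'}'},
            '{"name":"hello"': {'}'},
        }
        return states.get(prefix, set())

    def constrained_greedy(logits_fn) -> str:
        """Greedy decode, masking out any token the FSM rejects."""
        out = ''
        while allowed_next(out):
            logits = logits_fn(out)  # the model's raw scores for each token
            masked = {tok: (score if tok in allowed_next(out) else -math.inf)
                      for tok, score in zip(VOCAB, logits)}
            out += max(masked, key=masked.get)  # best token that survived
        return out

    # A fake "model" that loves emitting ' ': the mask saves us from it.
    fake_logits = lambda prefix: [0.1, 0.1, 0.2, 0.1, 0.3, 0.5, 0.9, 1.0]
    print(constrained_greedy(fake_logits))  # valid JSON despite the silly model
    ```

    Despite the fake model assigning its highest score to a bare space at every step, the decode can only ever walk the FSM, so the output is guaranteed to parse.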

    • wren6991@types.pl · #6

      @astraleureka There's a little bit of info here: https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md

      There is some plumbing to make this match whatever the model is post-trained to emit for tool calls. No idea where that is. The whole file format situation is absolutely fucked in general
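      For a flavour of what those grammars look like: the README linked above uses GBNF notation. A minimal sketch pinning output to one tool-call shape (the schema itself is invented for illustration, not taken from the README) looks roughly like:

      ```
      # Hypothetical GBNF grammar: output must be {"tool": "<name>"}
      root ::= "{" ws "\"tool\"" ws ":" ws name ws "}"
      name ::= "\"" [a-z_]+ "\""
      ws   ::= [ \t\n]*
      ```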

      • astraleureka@social.treehouse.systems · #7

        @wren6991 frankly, this is quite as dumb as you made it sound. "in-band signalling is bad" was a lesson learned by telecom developers more than 40 years ago before the advent of Modern Development as we know it. developers of all walks have understood this very basic concept after much blood, sweat and tears, and yet here we are with ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86 and second-person prompts begging the model to Please Don't Do The Dangerous Thing as if it was self-aware

        • wren6991@types.pl wrote:

          If you have heard the buzzword "agentic AI" but avoided finding out what it meant until now:

          1. Someone figured out an LLM can do JSON RPCs by typing out the JSON token by token.

          2. The LLM is run in a harness that regexes out the JSON from its output and executes the RPC.

          3. The response is catted into the LLM's context window, also in the form of JSON that the LLM just reads.

          4. People connect these harnesses to system shells on their dev machines.

          5. Fast forward, this is a trillion-dollar industry held together by markdown files asking the LLM to please not curlbash from the internet.
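          The five steps above can be sketched as a loop. The tool table, tag format, and fake model here are all invented stand-ins for a real LLM API and real tools; the loop itself is the whole trick:

          ```python
          import json
          import re

          # Bare-bones agent harness: regex the JSON tool call out of the
          # model's text, execute it, and append the result to the context.
          TOOL_CALL = re.compile(r'<tool_call>(.*?)</tool_call>', re.DOTALL)

          def run_tool(call: dict) -> str:
              tools = {'add': lambda a: str(a['x'] + a['y'])}  # stand-in RPC table
              return tools[call['name']](call['args'])

          def agent_loop(model_step, prompt: str, max_turns: int = 5) -> str:
              context = prompt
              for _ in range(max_turns):
                  output = model_step(context)      # model types out text + JSON
                  m = TOOL_CALL.search(output)
                  if not m:
                      return output                 # no tool call: final answer
                  result = run_tool(json.loads(m.group(1)))  # regex + execute
                  # the response is appended to the context as plain text
                  context += output + f'\n<tool_result>{result}</tool_result>\n'
              return context

          # Fake two-turn "model": calls the add tool, then reads the result.
          def fake_model(ctx):
              if '<tool_result>' not in ctx:
                  return '<tool_call>{"name": "add", "args": {"x": 2, "y": 2}}</tool_call>'
              return 'The answer is ' + ctx.rsplit('<tool_result>', 1)[1].split('<')[0]

          print(agent_loop(fake_model, 'What is 2+2?'))  # -> The answer is 4
          ```

          Point 4 in the list is exactly what happens when `run_tool` is replaced with `subprocess.run` on a dev machine.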

          viss@mastodon.social · #8

          @wren6991 and they want it to fly fighterjets and guide bombs too. wheeee

          • wren6991@types.pl · #9

            @astraleureka The analogy is messy because the tokens the model emits aren't the same thing as the character strings they're converted to/from. Like the <|think|> tag for initiating chain-of-thought is a single token that exists for that purpose, and is not the same as the multiple tokens that would spell it out character-by-character. That <|think|> token is out-of-band in the same way as the comma and control symbols in 8b10b are out-of-band. Tool calls are the same.

            The other problem you pointed out is probably the bigger one which is we took Turing machines, made the tape append-only, added associative lookups on the tape and poured the entire internet into them until they have anxiety, and the fact they appear to follow natural-language instructions most of the time is a coincidence. Having out-of-band control symbols is nice but there's no way to actually know or control when they're emitted.
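            The in-band/out-of-band distinction above can be made concrete with a toy tokenizer. The special-token names and vocab here are invented for the demo, not any real model's:

            ```python
            import re

            # A special token like <|think|> is ONE vocabulary entry, distinct
            # from the many character tokens that would spell the same string.
            SPECIALS = ['<|think|>', '<|tool_call|>']

            def tokenize(text: str) -> list:
                # Split out special markers first; everything else falls back
                # to character-level tokens in this toy.
                pattern = '(' + '|'.join(map(re.escape, SPECIALS)) + ')'
                tokens = []
                for part in re.split(pattern, text):
                    tokens.extend([part] if part in SPECIALS else list(part))
                return tokens

            print(len(tokenize('<|think|>')))  # 1: a single out-of-band symbol
            print(len(tokenize('<think>')))    # 7: just characters, in-band
            ```

            User text that merely spells out the marker never produces the control symbol, which is exactly the out-of-band property.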

            • wren6991@types.pl wrote:

              Psst, want to see a funny GitHub issue? https://github.com/anomalyco/opencode/issues/18100

              f4grx@chaos.social · #10

              @wren6991 I am realizing that this issue was ALSO written by an llm and I feel dirty for reading it entirely.

              • lritter@mastodon.gamedev.place · #11

                @wren6991 yup that is precisely how it works. been examining this with gemma 4 the past week. i put the control target in a docker image where it has root; it's useful for user testing, but i feel silly trying to make it do anything else.

                • floe@hci.social · #12

                  @wren6991

                  • rich@mastodon.gamedev.place · #13

                    @wren6991 "turd rules all the way down..."

                  • mspcommentary@mastodon.online · #14

                      @wren6991 it has just occurred to me that, unlike JSON, natural language isn't completely whitespace insensitive. A carriage return is semantically different to a space. It must be tokenized with multiple tokens. So, while it would seem that JSON was very easy for an LLM to understand, it's trying to understand it with the whitespace in. Wow. Mind you, that is how it would be able to make sense of python code. And minified javascript code would contain fewer tokens.
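                    The whitespace observation above, made concrete: JSON parsing ignores the whitespace the model nonetheless has to spend tokens emitting, while the raw strings (and hence token sequences) differ. The tool-call payload here is invented for the demo:

                    ```python
                    import json

                    # Same JSON value, two textual forms.
                    pretty = '{\n  "name": "ls",\n  "args": []\n}'
                    minified = '{"name":"ls","args":[]}'

                    assert pretty != minified                          # different text
                    assert json.loads(pretty) == json.loads(minified)  # same meaning

                    # Minified form spends fewer characters (and so fewer tokens).
                    print(len(pretty), len(minified))
                    ```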

                    • toxomat@social.tchncs.de · #15

                        @wren6991
                        Thanks, that helps. "Harness" is the cool word for "wrapper script", right?
