Do any of the other major hallucination machines besides Claude have known/documented kill switch keywords?

    azonenberg@ioc.exchange
    wrote last edited by
    #1

    Do any of the other major hallucination machines besides Claude have known/documented kill switch keywords?

      ldcd@social.treehouse.systems
      wrote last edited by
      #2

      @azonenberg For reasons ™️ (they're being pushed hard on me at work, to the point where I'm considering quitting), I've been idly wondering if you could add your own: take a corpus of text that you own, pick a very uncommon English word that's still likely to be tokenized in one or two parts, insert that word into the text at a random point, and replace the rest with gibberish

      Throw the corpus onto a few sites that are reasonably likely to be scraped, and then wait a few months

        ldcd@social.treehouse.systems
        wrote last edited by
        #3

        @azonenberg for gibberish generation I'm considering a few options. The simplest is just old-fashioned Markov gibberish

        One thing I'm idly wondering, though, is whether something with interesting spectral content would be more likely to be latched onto for a given volume of training data, i.e. take some pink noise and throw it into the tokenizer
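
For anyone unfamiliar with "Markov gibberish", here is a minimal sketch of a word-level Markov chain text generator. It is only an illustration of the general technique, not necessarily what the poster has in mind; the function names `build_chain` and `generate`, the `order` parameter, and the sample sentence are all assumptions made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length, seed=None):
    """Walk the chain at random and return `length` words of plausible-looking nonsense."""
    if not chain:
        raise ValueError("source text is too short for the chosen order")
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    keys = list(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this state only occurs at the end of the source): restart.
            state = rng.choice(keys)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])


if __name__ == "__main__":
    # Hypothetical sample text; any corpus you own would do.
    sample = "the quick brown fox jumps over the lazy dog and the quick cat"
    print(generate(build_chain(sample), 20, seed=42))
```

A higher `order` keeps more of the source's local phrasing, while `order=1` drifts into nonsense fastest.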

          ldcd@social.treehouse.systems
          wrote last edited by
          #4

          @azonenberg but yeah, as far as I can find out, Claude is the only one with a known kill word, unfortunately

            azonenberg@ioc.exchange
            wrote last edited by
            #5

            @ldcd I strongly suspect the others have them for internal testing, but they aren't published. Would be cool if someone managed to reverse engineer them eventually
