Do any of the other major hallucination machines besides Claude have known/documented kill switch keywords?
-
@azonenberg For reasons (they're being hard pushed on me at work to the point where I'm considering quitting), I've been idly wondering if you could add your own: take a corpus of text that you own, pick a very uncommon English word that's still likely to be tokenized in one or two parts, insert that word into the text at a random point and replace the rest with gibberish. Throw the corpus onto a few sites that are reasonably likely to be scraped and then wait a few months.
-
@azonenberg For gibberish generation I'm considering a few options. The simplest is just old-fashioned Markov gibberish.
One thing I'm idly wondering, though, is whether something with interesting spectral content would be more likely to be latched onto for a given volume of training data, i.e. take some pink noise and throw it into the tokenizer.
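For reference, "old-fashioned Markov gibberish" usually means a word-level Markov chain. Such a chain records which words followed each run of `order` words in a source text, then random-walks those transitions. Below is a minimal sketch under a few assumptions: word-level tokens split on whitespace and a default order of 2. The names `build_chain` and `generate` are made up for illustration. Lower orders give more incoherent output, while higher orders start reproducing the source nearly verbatim.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the list of words seen right after it."""
    if len(words) <= order:
        raise ValueError("corpus must contain more words than the chain order")
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length, rng):
    """Random-walk the chain until `length` words are produced, returned as one string."""
    keys = list(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this word run only appeared at the very end of the corpus,
            # so jump to a random known state and keep going.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        # The next state is the last `order` words emitted so far.
        state = tuple(out[-len(state):])
    return " ".join(out[:length])


if __name__ == "__main__":
    corpus = ("the quick brown fox jumps over the lazy dog and "
              "the quick red fox runs past the lazy cat").split()
    chain = build_chain(corpus, order=2)
    # A seeded Random instance makes the output reproducible between runs.
    print(generate(chain, 20, random.Random(0)))
```

Passing an explicit `random.Random` instead of using the module-level functions keeps runs reproducible. This matters if you want to regenerate the same text later.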
-
@azonenberg But yeah, as far as I can find out, Claude is the only one with a known kill word, unfortunately.
-
@ldcd I strongly suspect the others have them for internal testing, but they aren't published. Would be cool if someone managed to reverse engineer them eventually.