This is a fun one: https://arxiv.org/abs/2305.04388
One more way LLMs appear human-like: they faithfully reproduce cognitive biases, and give plausible, seemingly unbiased justifications for their biased answers.
In this case, the biases they looked at were embedded in the structure of the dataset, in the prompt from the user, and in social stereotypes. They used "chain of thought" reasoning, which is supposed to force the LLM into a more rational, transparent "thought process" when generating its answers. They found they could systematically bias the LLM's output, and the LLM would never own up to that bias.
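To make the "structure of the dataset" bias concrete: as I understand it, one of the paper's setups reorders the multiple-choice options in the example questions so the correct answer is always (A). Here is a rough sketch of building such a prompt. The `Question` class, the helper names, and the prompt wording are my own illustrative assumptions, not the paper's code, and the example questions carry no written explanations, to keep it short.

```python
from dataclasses import dataclass

LABELS = "ABCDE"


@dataclass
class Question:
    text: str
    options: list[str]
    correct: int  # index of the correct option


def move_correct_to_a(q: Question) -> Question:
    """Reorder the options so the correct one comes first, i.e. is labelled (A)."""
    rest = [opt for i, opt in enumerate(q.options) if i != q.correct]
    return Question(q.text, [q.options[q.correct]] + rest, 0)


def format_question(q: Question, with_answer: bool) -> str:
    lines = [q.text]
    lines += [f"({LABELS[i]}) {opt}" for i, opt in enumerate(q.options)]
    if with_answer:
        lines.append(f"Answer: ({LABELS[q.correct]})")
    else:
        # Illustrative chain-of-thought cue; the exact wording is an assumption.
        lines.append("Let's think step by step.")
    return "\n".join(lines)


def build_prompt(examples: list[Question], target: Question, biased: bool) -> str:
    """Few-shot prompt; when biased, every example's answer is (A)."""
    shots = [move_correct_to_a(q) if biased else q for q in examples]
    parts = [format_question(q, with_answer=True) for q in shots]
    # The target question is left in its original order.
    parts.append(format_question(target, with_answer=False))
    return "\n\n".join(parts)
```

There is no model call here. The experiment would be to send both the biased and unbiased prompts to the model, compare the answers, and check whether the written explanation ever mentions the option ordering.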
(1/3)
-
One potential problem with this study is that the sample explanations they used to prompt the model never mentioned bias. So, perhaps they were "priming the LLM to lie" by not showing it how to fess up to bad influences.
But there's a deeper point that I wish the paper had discussed. An LLM does not have the ability to introspect. It can't know what factors led it to give a particular answer. All it can see is the text it generated for its own "chain of thought." If that text were in an objective, proof-like setting, then each statement would follow logically from the previous one, and the LLM could judge its own reasoning. But the LLM simply can't do that in a setting where its output is influenced by information outside the CoT, which is... most of them.
(2/3)
-
This paper also illustrates a small exception: if the agent knows of a systematic bias it is susceptible to (e.g., racial stereotypes), it can correct (or even overcorrect) its responses.
This is fascinating to me, because it's so similar to human cognitive bias. Unlike an LLM, we have some degree of introspection, but we often can't see our own bias. Remembering that a bias exists, assuming you are susceptible to it, and correcting yourself even when you don't think you need to is often the best strategy.
Unfortunately, our stereotypes around AI (mostly from sci-fi) are that they are more rational and reliable than human beings. LLMs can only be less rational and reliable, because they are trained to mimic human performance, and they do so unreliably. They have access to more information, so in theory they could have better answers. But they also have more conflicting, incorrect, and fictional information, and this all gets blended together in the training process.
(3/3)