"Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents."

asweetgentleman@mstdn.social

"Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini, Claude, GPT) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely."

https://arxiv.org/abs/2604.15597

#ai #llm #tech #science #gpt #claude #gemini
