"Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents.""Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini, Claude, GPT) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely."https://arxiv.org/abs/2604.15597#ai #llm #tech #science #gpt #claude #gemini