When you use an AI to check on an AI, assuming they operate independently, their combined success rate is multiplicative.
E.g. an 80% success rate, applied twice, gives a 64% success rate.
(They might not be independent; training for sycophancy might make it worse.)
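The arithmetic in the post can be sketched in a couple of lines (the function name and figures are illustrative, and it assumes the checks are truly independent):

```python
def combined_success(p_each: float, n_checks: int) -> float:
    """Success rate when ALL n independent checks must succeed."""
    return p_each ** n_checks

# 80% success, applied twice -> 0.8 * 0.8 = 0.64
print(round(combined_success(0.80, 2), 2))
```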
-
@icing If they are truly configured to operate as redundant “checks”, wouldn’t it be the *failure* rate (1 - P_success) that multiplies?
-
@marshray When each check can ruin the outcome, the success rates multiply. When any single positive check counts as overall success, the failure rates multiply, no?
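The two compositions described above can be written out side by side (a minimal sketch, assuming independence; names are illustrative):

```python
def series_success(p1: float, p2: float) -> float:
    """Both checks must succeed: success rates multiply."""
    return p1 * p2

def parallel_success(p1: float, p2: float) -> float:
    """Any one success suffices: failure rates multiply."""
    return 1 - (1 - p1) * (1 - p2)

# With 80% each: series gives 0.64, parallel (redundant) gives 0.96.
```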
-
@marshray Back to AI.
When AI 1 writes a vulnerability report and you use AI 2 to check those reports, the overall assessment is only good when both do the right thing.
-
@icing I see the critical assumptions as:
1. AI 1 is operated to not create a bad report in the first place
2. AI 2 is operated to reject a bad report
3. The AIs' probabilities of failure at (1) and (2) are uncorrelated
If these assumptions were somehow validated, then they would constitute a “belt and suspenders” type of redundant system.
But such assumptions are rarely justified in practice.
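Assumption 3 can be probed with a toy Monte Carlo (all names, rates, and the "shared hard case" coupling are illustrative assumptions, not a model of any real system): when some inputs are hard for both AIs, the joint failure rate exceeds the independent product (1 - p1) * (1 - p2).

```python
import random

def joint_failure(p_good_report: float, p_catch: float,
                  hard_case_rate: float, trials: int = 200_000) -> float:
    """Rate at which AI 1 emits a bad report AND AI 2 accepts it."""
    random.seed(0)
    bad_outcomes = 0
    for _ in range(trials):
        # On a "hard case", both AIs fail together (correlated failure).
        hard = random.random() < hard_case_rate
        ai1_bad = hard or random.random() > p_good_report
        ai2_misses = hard or random.random() > p_catch
        if ai1_bad and ai2_misses:
            bad_outcomes += 1
    return bad_outcomes / trials

# Independent baseline with p1 = p2 = 0.8 would be 0.2 * 0.2 = 0.04;
# even a 5% shared hard-case rate pushes the joint failure near 0.09.
```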