**Patches, variants, LLMs — a short timeline**
Aug 2019 · Halvar Flake: a patch is functionally equivalent to a vulnerability disclosure. Patch-diffing was already a working trade. What was about to change was the cost. *(Halvar Flake, "Rashomon of disclosure")*
Jun 2022 · Maddie Stone (P0): half of in-the-wild 0-days are variants of patched bugs. CVE-2022-22620 fixed in WebKit 2013, regressed Dec 2016, vulnerable 5 years. *(Stone, "0-day In-the-Wild Exploitation in 2022…so far" + "An Autopsy on a Zombie In-the-Wild 0-day")*
2024 · P0 × DeepMind: Naptime, then Big Sleep. "Variant analysis is a better fit for current LLMs." *(Glazunov & Brand, "Project Naptime"; Big Sleep team, "From Naptime to Big Sleep")*
May 2025 · Sean Heelan: o3 finds CVE-2025-37899 in ksmbd, adjacent to a bug he'd already found by hand. *(Heelan, "How I used o3 to find CVE-2025-37899")*
Feb 2026 · K-Repro: patch commit → working exploit, beats fuzzing baseline on cost. *("Patch-to-PoC: A Systematic Study of Agentic LLM Systems for Linux Kernel N-Day Reproduction," arXiv:2602.07287)*
Feb 2026 · Spaceraccoon: "negative-days" — LLM monitors commits, finds patched bugs before disclosure. *(Lim, "Discovering Negative-Days with LLM Workflows")*
Feb 2026 · Django Security Team: "Almost every report now is a variation on a prior vulnerability." One 2022 advisory → 4 releases in late 2025. *(Walls, "Recent trends in the work of the Django Security Team")*

From Dullien's observation that patches leak, to LLMs industrialising the leak, to the maintainer's diary entry. 6½ years.
Taken from the *When buffers overflow into policy* project.
Geer's line — "the absence of unmitigatable surprise" is the canonical version, but "no silent failures" captures the same thing — was always operationally about **observability**. A failure you can see is a failure you can plan for. A silent failure is one that has already happened by the time you find out, and probably to people you can't help retroactively.
The standard he was articulating was a defender's epistemic floor: you must be able to know that something has gone wrong, and to know it before the consequences are irreversible. He used it to argue against complexity, against monoculture, against systems whose failure modes were either unobservable (you couldn't tell) or untreatable (you could tell but couldn't act). The whole framework presupposed that *the gap between failure and observation* was the variable defenders could and should compress.
LLMs change what counts as a silent failure in at least four ways, and they do it on both sides of the disclosure pipeline.
**1. Patches are now silent disclosures.** This is the Dullien-to-Spaceraccoon arc the timeline tracks. A patch shipping to a public repository was always, in principle, an information leak — but the cost of converting that leak into an exploit was high enough that the disclosure norm (90 days, coordinate, push) approximated the moment of public knowledge. With commodity LLM commit-monitoring, the patch *is* the public knowledge, and the gap between patch landing and exploit availability is now negligible for any project with a public commit history. The "silent" failure here isn't that the patch failed; it's that the disclosure window the defender thought existed has collapsed without the defender being told. Geer would call this an unmitigatable surprise: the failure mode (compressed time-to-exploit) is now structural, and the defender has no observable signal that it has changed for any specific patch.
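The commit-monitoring workflow behind "negative-days" can be sketched in a few lines. This is a hypothetical illustration, not Spaceraccoon's actual pipeline: it keyword-filters recent commit messages for security-relevant language and queues the matches for deeper LLM diff analysis. The keyword list and the commit data are invented for the example; in practice the input would come from `git log` against the upstream repository.

```python
import re

# Heuristic pre-filter: commit-message phrases that often mark silent
# security fixes. The list is illustrative, not exhaustive.
SECURITY_HINTS = re.compile(
    r"\b(use[- ]after[- ]free|overflow|out[- ]of[- ]bounds|double[- ]free|"
    r"bounds check|sanitize|CVE-\d{4}-\d+|race condition)\b",
    re.IGNORECASE,
)

def flag_commits(commits):
    """Return commits whose messages look security-relevant.

    `commits` is a list of (sha, message) pairs; in practice you would
    populate it from `git log` on the upstream repository and hand the
    flagged diffs to an LLM for variant and impact analysis.
    """
    return [(sha, msg) for sha, msg in commits if SECURITY_HINTS.search(msg)]

# Invented example data standing in for a real commit feed.
commits = [
    ("a1b2c3", "ksmbd: fix use-after-free in session teardown"),
    ("d4e5f6", "docs: update changelog for 6.9"),
    ("0aa9b8", "net: add missing bounds check in parser"),
]
flagged = flag_commits(commits)
# flagged holds the two security-relevant commits queued for LLM triage
```

The point of the sketch is the cost asymmetry: the pre-filter is nearly free, which is what makes running it against every public repository economical.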
**2. Variants are silent regressions.** Stone's H1 2022 finding — half of in-the-wild zero-days were variants of previously patched bugs — was already a silent-failure story. The patch shipped, the CVE closed, the dashboards turned green, and the underlying weakness was still there. CVE-2022-22620 is the cleanest example: WebKit fixed it in 2013, and for five years nobody — defender, vendor, or coordinated researcher — observed that the fix had been silently regressed by a 2016 refactoring. The Walls/Django data is the same story at industrial scale: a single 2022 advisory produces four security releases in late 2025 because every closed CVE is now systematically probed for variants by LLM-equipped reporters. Each of those variants was, before the report landed, a silent failure — a vulnerability whose existence was undetectable to the defender even though the precedent had been published.
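One way to make this class of failure observable, in Geer's sense, is to pin every security fix with a regression test named after the advisory, so a later refactor cannot drop the check without turning a dashboard red. A minimal sketch, with an invented function and an invented check (this is not WebKit's code, and the CVE name is a placeholder):

```python
def parse_length(field: bytes) -> int:
    """Parse a big-endian length field; the explicit bound is the fix.

    A refactor that "simplifies" this function by dropping the check
    reintroduces the original bug silently -- unless a test pins it.
    """
    value = int.from_bytes(field, "big")
    if value > 0xFFFF:  # the patched bounds check
        raise ValueError("length out of range")
    return value

def test_cve_regression():
    # Named for the advisory so its purpose survives future refactors.
    try:
        parse_length((0x10000).to_bytes(4, "big"))  # just past the bound
    except ValueError:
        return True
    raise AssertionError("bounds check silently regressed")
```

The test encodes *why* the check exists; without it, the 2016-style refactoring scenario leaves no observable trace until an exploit does.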
**3. AI-generated code is silent vulnerability injection.** This is the inverse of the discovery story and arguably the more consequential one. When developers commit LLM-suggested code without understanding it, the failure mode isn't that the code is wrong (much of it is fine) but that the *reasoning trail* is absent. Iozzo's confabulation finding has the same structural shape on the discovery side: when an LLM "finds" a bug, its self-reported methodology may be a post-hoc story rather than a description of how it actually got there. Either way, the code or the report exists and looks reasonable, and the chain of justification that defenders previously relied on — *why* is this code shaped this way, *how* was this bug actually found — is gone. The output is observable; the reasoning is silent. Geer's standard fails on the second half: you can see the failure but you can't tell how it happened, which means you can't know whether the next one will look like it.
**4. The triage layer is now a silent-failure surface in its own right.** When 95% of inbound reports are LLM noise (Stenberg's curl numbers), the failure mode isn't only that genuine reports get lost — it's that maintainers who reach triage capacity start refusing or downgrading reports they would previously have processed. Stenberg shut the curl bounty. Nextcloud shut HackerOne. The vulnerability that *would have been reported* but wasn't, because the channel collapsed, is the purest silent failure Geer's framework can describe: a failure that can't be observed because the institutional sensor was unplugged in the noise.
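The bandwidth collapse can be made concrete with back-of-the-envelope arithmetic. The 95% noise figure is from Stenberg; the report volumes and triage capacity below are hypothetical. Assuming triagers sample reports uniformly (they cannot distinguish noise from signal before reading), the number of genuine reports actually reviewed falls off sharply:

```python
def genuine_reviewed(total_reports, noise_rate, triage_capacity):
    """Expected genuine reports reviewed under uniform sampling.

    Triagers read up to `triage_capacity` reports; each report they
    read is genuine with probability (1 - noise_rate).
    """
    genuine = total_reports * (1 - noise_rate)
    reviewed = min(triage_capacity, total_reports)
    return genuine * reviewed / total_reports

# Hypothetical project: capacity to triage 40 reports per month.
before = genuine_reviewed(total_reports=50, noise_rate=0.10, triage_capacity=40)
after = genuine_reviewed(total_reports=400, noise_rate=0.95, triage_capacity=40)
# before: ~36 of 45 genuine reports get reviewed
# after:  ~2 of 20 genuine reports get reviewed
```

Even though the absolute number of genuine reports in this toy model falls only from 45 to 20, the number reviewed falls from roughly 36 to roughly 2: the channel fails long before the maintainer formally closes it.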
Putting these together: Geer's standard implicitly assumed defenders had **time** (between patch and exploit), **provenance** (you could trace why a fix was correct), and **bandwidth** (the disclosure pipeline could absorb what was going through it). LLMs strip all three. The patch leak compresses time. The confabulated reasoning erodes provenance. The volume crisis exhausts bandwidth. None of these are individually new — patch-diffing existed in 2003, false positives existed before LLMs, maintainers were always overworked — but each of them used to be *bounded by human cost*, and that bound is now gone.
The policy implication, in Geer's idiom, is that the work of "making failures visible" can no longer rely on the institutional infrastructure that has historically done it. CVE/NVD/CNAs were calibrated for a world where the rate of incoming honest signal was bounded; they are not calibrated for a world where the rate of plausible signal is unbounded. The new defender capability — assuming Geer's standard still holds — has to include some way to **observe the silent failures of the disclosure system itself**: the patches that leaked before disclosure, the variants that didn't get reported, the reports that were filtered out by overwhelmed triage, the LLM-suggested code that committed without review. None of those have institutional sensors today.
That's the version of "no silent failures" that the timeline implicitly argues for. Dullien named the leak, Stone measured the regressions, P0 named the use case, Heelan demonstrated it, K-Repro formalised it, Spaceraccoon commoditised it, and Walls confirmed the maintainer-side cost. What the sequence shows is that the failure-observability gap that Geer treated as the defender's central problem has *widened* in every dimension at once — and that the institutions defenders relied on to close it were never designed for the failure modes the gap now contains.