I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages.
-
@dangoodin that is EXACTLY what Anthropic said. LITERALLY it is the FIRST "vulnerability" they bogusly claim to have found.
> Neither of these methods yielded any significant findings. Eventually, however, Claude took a different approach: reading the Git commit history. Claude quickly found a security-relevant commit, and commented:
@dangoodin to which I said "hang the fuck on" and read a bit more. And hey look, it's in fonts... bounds checking...
https://security.snyk.io/vuln/SNYK-CENTOS10-GHOSTSCRIPTTOOLSFONTS-10299121
-
@dangoodin to which I said "hang the fuck on" and read a bit more. And hey look, it's in fonts... bounds checking...
https://security.snyk.io/vuln/SNYK-CENTOS10-GHOSTSCRIPTTOOLSFONTS-10299121
CVSS is 7.8, which is high, no? That would seem to support Anthropic's claim. What's the significance of the vulns being in fonts . . . bounds checking?
-
I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages. Has anyone confirmed that these vulns were indeed high-severity and hadn't been discovered before? Is this development as big a deal as Anthropic says? Any other critiques?
(on the flip side, curl ending their bug bounty program because of the flood of slop reports)
-
(on the flip side, curl ending their bug bounty program because of the flood of slop reports)
@cerement @dangoodin Exactly what I was going to point out.
-
CVSS is 7.8, which is high, no? That would seem to support Anthropic's claim. What's the significance of the vulns being in fonts . . . bounds checking?
@dangoodin the significance is that by their own words, they didn't discover shit. Check the date on that CVE. But they're trying to claim dishonestly that their magical almost-to-AGI stochastic parrot totally discovered it.
It did not. Period.
-
@dangoodin the significance is that by their own words, they didn't discover shit. Check the date on that CVE. But they're trying to claim dishonestly that their magical almost-to-AGI stochastic parrot totally discovered it.
It did not. Period.
I'm not arguing with you. Sorry if it sounds like I am. I don't have the same technical background you do and am asking why the 7.8-severity vuln shouldn't be considered high severity because it involves fonts . . . bounds checking? I'm asking you to explain the reasoning behind your assessment as if I was a student in a security 101 class.
-
I'm not arguing with you. Sorry if it sounds like I am. I don't have the same technical background you do and am asking why the 7.8-severity vuln shouldn't be considered high severity because it involves fonts . . . bounds checking? I'm asking you to explain the reasoning behind your assessment as if I was a student in a security 101 class.
@dangoodin the tl;dr is basically that they are making the completely bogus claim that they 'discovered' a vulnerability, because they found the commit, which was specifically to fix the already disclosed vulnerability.
This is as insane as claiming to have shockingly discovered someone has a dog after they texted you pictures of them holding a puppy, asked you for name suggestions, set up IG and YT accounts for the puppy you subscribe to, and you hosted a puppy party at your house.
-
I'm not arguing with you. Sorry if it sounds like I am. I don't have the same technical background you do and am asking why the 7.8-severity vuln shouldn't be considered high severity because it involves fonts . . . bounds checking? I'm asking you to explain the reasoning behind your assessment as if I was a student in a security 101 class.
@dangoodin @rootwyrm It seems like their AI is discovering flaws that have already been patched - the exact mechanisms may not have been disclosed previously, and Claude now knows there may be unpatched code out there, and how to exploit them, because it's done some kind of analysis of the applied patch. If you don't patch your systems regularly, you are still vulnerable to older exploits.
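A minimal sketch of what that kind of patch analysis could look like, in Python: treat the lines a security fix removed as the vulnerable 'before' shape and search the rest of the repository for the same shape. The diff file name and the length heuristic are invented for illustration; this is not Anthropic's actual tooling.

```python
# Hypothetical sketch: derive a vulnerable 'before' signature from a
# security fix and look for the same shape elsewhere in the codebase.
# Not anyone's real tooling; file names and heuristics are invented.
import subprocess

def before_lines(diff_text: str) -> list[str]:
    """Return the lines the patch removed, i.e. the vulnerable code."""
    return [
        line[1:].strip()
        for line in diff_text.splitlines()
        if line.startswith("-") and not line.startswith("---")
    ]

def find_unpatched(repo: str, signature: str) -> str:
    """Use git grep to find other literal occurrences of the removed code."""
    result = subprocess.run(
        ["git", "-C", repo, "grep", "-n", "-F", signature],
        capture_output=True, text=True,
    )
    return result.stdout

if __name__ == "__main__":
    with open("fix-bounds-check.diff") as f:  # made-up diff file name
        diff = f.read()
    for sig in before_lines(diff):
        if len(sig) > 20:  # skip short lines that would match everywhere
            hits = find_unpatched(".", sig)
            if hits:
                print("possible unpatched copy:\n" + hits)
```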
-
Thanks for all the responses. So far, projects I understand to have received reports include: Ghostscript, OpenSC, lzw, and CGIF. Are others known? Links to commits that fix the vulns also appreciated.
@dangoodin the OpenSC commit that contains the highlighted code in the post: https://github.com/OpenSC/OpenSC/commit/9ab1daf21029dd18f8828d684ee6151d9238edab . No details about the fix and no security disclosure on the GitHub repository.
-
I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages. Has anyone confirmed that these vulns were indeed high-severity and hadn't been discovered before? Is this development as big a deal as Anthropic says? Any other critiques?
@dangoodin hearsay, but I heard the model used had reduced safeguards, which allowed it to be more aggressive
-
I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages. Has anyone confirmed that these vulns were indeed high-severity and hadn't been discovered before? Is this development as big a deal as Anthropic says? Any other critiques?
@dangoodin How popular/big were these OSS projects? There’s a big difference between finding a vuln in something like curl or Apache and my janky crap I pushed up to GitHub.
CVSS of 10/10 in my thing will impact one person, but in curl it’ll impact a few million more people. Including, still, me.
-
@dangoodin zero question it's pure fantasy bullshit. They refuse to show their work, as usual. All they've got is a middling CGIF vulnerability that isn't, and claiming credit for "finding" a vulnerability in GhostScript because "hey this commit did a thing so they must have had a vulnerability!"
@rootwyrm according to their blog they didn't claim that it found the vulnerability in the commit; it checked the rest of the codebase to see whether the same vulnerability might be unpatched in other places, and it seems it was.
My questions are more in line with some others here: how many false positives did the human experts need to wade through to get to the real vulnerabilities?
-
I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages. Has anyone confirmed that these vulns were indeed high-severity and hadn't been discovered before? Is this development as big a deal as Anthropic says? Any other critiques?
@dangoodin
Anthropic have a lot of resources for PR and issue a lot of dubious and misleading statements?
-
I'm curious to know what people think about Anthropic's claim that Claude found 500 high-severity vulnerabilities in open-source packages. Has anyone confirmed that these vulns were indeed high-severity and hadn't been discovered before? Is this development as big a deal as Anthropic says? Any other critiques?
There's a long history of doing fuzzy matching on patterns of known bugs to find more of the same kind. Coccinelle is the most well-known example of this. It was not actually written for vulnerability discovery, but it turns out that you could write patterns to patch a vulnerability and then it would find a load of similar ones.
Few projects actually use it.
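To make the pattern-matching idea concrete, here is a toy Python version of that kind of scan. Coccinelle's real semantic patches match on program structure rather than raw text, so treat this purely as a sketch of the concept; the bug pattern and the directory name are invented.

```python
import re
from pathlib import Path

# Invented 'before' shape of a known bug: a memcpy whose length argument
# comes straight from a struct field with no visible bounds check.
# Real tools like Coccinelle match on the AST, not on raw text.
BEFORE_PATTERN = re.compile(
    r"memcpy\s*\(\s*\w+\s*,\s*\w+\s*,\s*(\w+->len|\w+\.len)\s*\)"
)

def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every match of the bug shape."""
    hits = []
    for path in Path(root).rglob("*.c"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if BEFORE_PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

if __name__ == "__main__":
    for f, n, line in scan_tree("src"):  # invented directory name
        print(f"{f}:{n}: possible unchecked copy: {line}")
```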
OpenBSD has a policy that people who find security bugs should search for similar things in the code and fix them all. It turns out that humans who write a bug in one place are very likely to write the same bug elsewhere, and this is no less true for bugs that lead to security vulnerabilities.
It sounds like this is a pretty good use case for an LLM, because it is a tool for doing fuzzy matching on a token stream. Finding patches that fixed vulnerabilities and then looking for the 'before' shape in other places will find a load of things.
With a bit of automation (sorry, 'agentic' use), you can do the following flow (a rough sketch follows the list):
- Find things that look like the 'before' state.
- Apply a patch to make it look like the 'after' state.
- Use guided fuzzing techniques to try to produce a test case that triggers the new checks introduced in the 'after' version.
- If you find an example, flag it to the user as a potential security issue.
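A skeletal Python sketch of that flow, assuming a libFuzzer-style harness binary and a purely textual 'before' pattern; every name here is invented, and the interesting parts (the matching and the harness) are exactly where an LLM or a Coccinelle-style tool would plug in.

```python
# Skeletal sketch of the find / patch / fuzz / flag flow above.
# All names, patterns, and the harness binary are invented.
import subprocess
from dataclasses import dataclass
from pathlib import Path

BEFORE = "memcpy(dst, src, hdr->len);"               # invented 'before' shape
AFTER = "if (hdr->len > DST_MAX) abort(); " + BEFORE  # invented 'after' fix

@dataclass
class Candidate:
    original: str   # file containing the 'before' shape
    patched: str    # copy rewritten into the 'after' shape

def find_and_patch(repo: str) -> list[Candidate]:
    """Steps 1-2: find the 'before' shape and write out an 'after' version.
    A literal string match stands in for real (LLM or semantic) matching."""
    out = []
    for path in Path(repo).rglob("*.c"):
        if path.name.endswith(".patched.c"):
            continue  # skip our own output on re-runs
        src = path.read_text(errors="ignore")
        if BEFORE in src:
            patched = path.with_suffix(".patched.c")
            patched.write_text(src.replace(BEFORE, AFTER))
            out.append(Candidate(str(path), str(patched)))
    return out

def triggers_new_check(cand: Candidate, seconds: int = 300) -> bool:
    """Step 3: fuzz an (invented) harness pointed at the patched file.
    A crash means some input reaches the new check's abort(), so the
    unpatched original is plausibly exploitable."""
    result = subprocess.run(
        ["./fuzz_harness", cand.patched, f"-max_total_time={seconds}"],
        capture_output=True,
    )
    return result.returncode != 0  # libFuzzer-style: nonzero on crash

if __name__ == "__main__":
    for cand in find_and_patch("."):  # Step 4: flag hits to the user
        if triggers_new_check(cand):
            print(f"potential security issue: {cand.original}")
```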
It's probably very computationally expensive, but cheaper than having a human do the same thing (which is so expensive almost no one does it).