#Mythos finds a #curl vulnerability

david_chisnall@infosec.exchange

The original Coverity paper claimed, as I recall, 300 CVEs. I'm not sure what the severity distribution was, but that seems a lot more than Mythos, and they probably used less compute than a single Mythos query.

The problem with any static analyser, whether it's based on formal reasoning or pattern recognition, is that it will be unsound (i.e. it will have false positives, in contrast with dynamic analyses that are incomplete and have false negatives). The LLM-based tools are no different in this respect. From a Claude 'comprehensive code review' of one of my projects, the only serious bug in the top ten that it found was one that already had an open PR to fix, and two were not only not bugs, they were intentional design choices and doing it the other way would have caused serious performance regressions (and not fixed bugs).

The thing that does make Mythos different is that it tries to build a PoC exploit. This will reduce the false positive rate, at the expense of creating false negatives (if it can't produce a PoC, you ignore it).

When I've used Coverity on a large project, it's found tens of thousands of bugs, and most of them are false positives, so it requires a lot of effort to find the ones that are actually important bugs. Something that produces PoCs automatically would help this a lot.

The baseline data point I'd really like to see is something that integrates the clang analyser with libFuzzer. For each report the analyser finds, insert profiling points at the branches on the control flow chain that it recommends, then automatically drive the fuzzer to try to trigger the code paths that the analyser reported as potential issues.

The default settings for the clang analyser are compilation-unit-at-a-time and with reduced bounds on loop iteration counts to avoid using enormous amounts of memory. If you're willing to spend as much money as it costs to operate the LLM-based tools, you can use the cross-compilation-unit approaches and bump the state up a lot. Running it configured to use a comparable amount of RAM to the GPUs that the Anthropic models run on would let you do a lot of symbolic execution.

doragasu@mastodon.sdf.org

@bagder In line with what this blog post stated shortly after it was announced: the model is nothing special and much cheaper models can find the same bugs. Marketing BS turned to 11. https://www.flyingpenguin.com/the-boy-that-cried-mythos-verification-is-collapsing-trust-in-anthropic/

gnirre@mastodon.social

@bagder Did Anthropic know that you finally had gotten access to Mythos?

bagder@mastodon.social

@gnirre no idea, probably not

spitfire@mastodon.de

@bagder one? wow, that really was worth burning the planet's resources.

kleisli@mastodon.social

@quinn my current opinion: for security scans and reviews, AI tools are and will be useful, but not to generate code. @bagder

gnirre@mastodon.social

@bagder Maybe my question should have been if Alpha Omega knew? Your access was "inofficial"?

bagder@mastodon.social

@gnirre I don't know how much they asked or told A about when this was done. It's not "my" access, someone else has the access and ran the analysis

quinn@social.circl.lu

@kleisli @bagder
if it's something like 10,000 euros a pop, it might not be worth security scans and reviews, except for governmental clients.

frankgevaerts@mastodon.social

@synlogic4242 @bagder Yes, someone really needs to get on to that rewriting thing. Just a pity there hasn't been a weekend in *years* so nobody had the chance!

0x0@hachyderm.io

@quinn

Especially if it's subscription-based, as these models seem to be good at finding only specific sets of problems and then dry out, but even 10k per use is really gov or big corpo territory.

@kleisli @bagder

redsakana@infosec.exchange

@bagder This suggests a fun exercise for someone interested in messing around with LLMs:

1. Put back all the curl security issues previously found by LLM tools by dropping the fix commits from history or otherwise obfuscating the revert.

2. Feed the re-vulnerabilized repo to a selection of models and see what are the cheapest ones (by memory, time and/or monetary cost) that can find, say, 50%/75%/100% of the issues found by the warehouse-scale "foundation models".

Feels like a large part of the current results should be doable with significantly smaller resources, because being trained on every tweet and reddit post and libgen book ever is not obviously related to the task.

eobet@oldbytes.space

@bagder great, so even the Linux Foundation are naming things after the ultimate evil of a famous franchise? (Final Fantasy in this instance.)

phl@mastodon.social

@bagder “On average, every single production source code line of curl has been written (and then rewritten) 4.14 times.”

curl is the ship of Theseus not once, not twice, but four times

4censord@unfug.social

@gnirre @bagder with the most glancing of looks, looking at the 150 version of firefox (and some rounding),
curl: 200k lines of c
firefox:

5M lines of rust
9M lines of C and C++
200k lines of assembly
2M lines of python

so like, without looking at anything else, firefox is significantly bigger

paco@infosec.exchange

@km As far as I can tell:

No one who has worked with raw Mythos output has ever written about it.
No one who has written about it has ever used it.

They would much rather have @bagder writing about it because his opinion carries weight. That means he can’t have direct access. To give him access, they’d demand to gag him with an NDA, like everyone else who has access.

This technique of making readers mentally fill in the gaps between what is verifiable and what is claimed is genius marketing and really dishonest. But we have come to expect systematic and casual dishonesty from these companies.

km@mastodon.babb.no

@paco @bagder yeah, let me clarify: i talked with people who not themselves used mythos, but whose org was given access, so yeah, they just told something which they were told

natanox@chaos.social

@4censord @gnirre @bagder Also, didn't they intentionally disable all mitigations, sandboxing etc. in Firefox *and* include every teeny tiny bug it found (without mentioning the false-positives, which were probably a metric shit ton) to bolster those numbers?

There were lots of shenanigans afaik.

paco@infosec.exchange

@km Yeah. I didn’t mean it personally. I wasn’t criticising what you said, I’m sorry if I sounded that way.

I was just pointing out this constant theme. The only thing that ever is made public is the fully-polished, human-vetted final result. They carefully hide all other details and the press don’t care.

@bagder

quinn@social.circl.lu

@0x0 @kleisli @bagder to be clear i picked that number out of my butt, but it is clear to me that it's going to be very hard to make up their investment in it, much less than the min 10x (which would probably be a couple trillion dollars)

CIRCLE WITH A DOT

#Mythos finds a #curl vulnerability

Mythos finds a curl vulnerability

Mythos finds a curl vulnerability

Mythos finds a curl vulnerability

Mythos finds a curl vulnerability