"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension.

mergesort@macaw.social

@joe More of an FYI for this repost in case you’re curious. (It’s mentioned in the abstract.) https://macaw.social/@mergesort/116444049426350678

joe@f.duriansoftware.com

@mergesort sounds like a good opportunity for a one-up paper to try it again with the newer models. would be interesting to see what difference the "reasoning" really makes

codinghorror@infosec.exchange

@bms48 turns out far too many humans are pretty goddamned lazy and will ship the prototype. How do we change this?

mergesort@macaw.social

@joe Agreed! I’m genuinely always in favor of repeating research like this given how fast the models are moving. Even the non-reasoning models are dramatically better today, so I’d love to run an experiment on them too. It’s just concerning to me when 1-2 year old outdated material comes to be considered a source of truth.

rjohnston@techhub.social

@codinghorror I have yet to have an LLM tell me to RTFM and then end the conversation.

chris@social.lane-jayasinha.com

@codinghorror @bms48 change incentives to be for long term not quarterly. Give people doing work more autonomy to set their own standards. Possibly UBI will enable this shift in perspective from eking out a paycheck to professional/citizen/human responsibility/opportunity.

codinghorror@infosec.exchange

@rjohnston I've never had that happen to me, personally, but I have pretty good resting bitch face to be fair.

jesstheunstill@infosec.exchange

@dalias @brianowen @codinghorror The number of billion dollar valuation security industry products that amount to a shiny web UI over a few FOSS tools ...

doragasu@mastodon.sdf.org

@codinghorror 0 surprise there.

codinghorror@infosec.exchange

@slyecho feel free to evaluate yourself using whatever tools you prefer
