"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension.
-
"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension. The researchers concluded that high performance often "results from verbatim reproduction of Stack Overflow answers rather than genuine reasoning." " https://www.researchgate.net/publication/403262523_Beyond_Code_Snippets_Benchmarking_LLMs_on_Repository-Level_Question_Answering
@codinghorror theft en masse as a business model
-
"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension. The researchers concluded that high performance often "results from verbatim reproduction of Stack Overflow answers rather than genuine reasoning." " https://www.researchgate.net/publication/403262523_Beyond_Code_Snippets_Benchmarking_LLMs_on_Repository-Level_Question_Answering
@joe More of an FYI for this repost in case you’re curious. (It’s mentioned in the abstract.) https://macaw.social/@mergesort/116444049426350678
-
@joe More of an FYI for this repost in case you’re curious. (It’s mentioned in the abstract.) https://macaw.social/@mergesort/116444049426350678
@mergesort sounds like a good opportunity for a one-up paper to try it again with the newer models. would be interesting to see what difference the "reasoning" really makes
-
@codinghorror I gots no problem with da one-shotting da boilerplate! But the actual useful application is a far cry from what Jensen, who pretends to be everyone's friend, wants you to do the "tokenmaxxing" for.
@bms48 turns out far too many humans are pretty goddamned lazy and will ship the prototype. How do we change this?
-
@mergesort sounds like a good opportunity for a one-up paper to try it again with the newer models. would be interesting to see what difference the "reasoning" really makes
@joe Agreed! I’m genuinely always in favor of repeating research like this given how fast the models are moving. Even the non-reasoning models are dramatically better today so I’d love to run an experiment on them too, it’s just concerning to me when 1-2 year old outdated material becomes considered a source of truth.
-
"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension. The researchers concluded that high performance often "results from verbatim reproduction of Stack Overflow answers rather than genuine reasoning." " https://www.researchgate.net/publication/403262523_Beyond_Code_Snippets_Benchmarking_LLMs_on_Repository-Level_Question_Answering
@codinghorror I have yet to have an LLM tell me to RTFM and then end the conversation.
-
@bms48 turns out far too many humans are pretty goddamned lazy and will ship the prototype. How do we change this?
@codinghorror @bms48 change incentives to be for long term not quarterly. Give people doing work more autonomy to set their own standards. Possibly UBI will enable this shift in perspective from eking out a paycheck to professional/citizen/human responsibility/opportunity.
-
@codinghorror I have yet to have an LLM tell me to RTFM and then end the conversation.
@rjohnston I've never had that happen to me, personally, but I have pretty good resting bitch face to be fair.
-
@brianowen @codinghorror This is exactly what it is. This is exactly what the web dev industry has been for decades. Millions of LoC of garbage to justify prices for what should be an easy in-house job using an existing CMS with minimal or no code and should be as easy as using Excel.
@dalias @brianowen @codinghorror The number of billion dollar valuation security industry products that amount to a shiny web UI over a few FOSS tools ...
-
"A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension. The researchers concluded that high performance often "results from verbatim reproduction of Stack Overflow answers rather than genuine reasoning." " https://www.researchgate.net/publication/403262523_Beyond_Code_Snippets_Benchmarking_LLMs_on_Repository-Level_Question_Answering
@codinghorror 0 surprise there.
-
@slyecho feel free to evaluate yourself using whatever tools you prefer