A recent 2026 empirical study titled "Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering" (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension.

25 Posts 17 Posters
  • codinghorror@infosec.exchange wrote:

    "A recent 2026 empirical study titled 'Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering' (published on arXiv/ResearchGate) explicitly tested LLMs on codebase comprehension. The researchers concluded that high performance often 'results from verbatim reproduction of Stack Overflow answers rather than genuine reasoning.'" https://www.researchgate.net/publication/403262523_Beyond_Code_Snippets_Benchmarking_LLMs_on_Repository-Level_Question_Answering

    dogiedog64@app.wafrn.net wrote (#14):

    @codinghorror@infosec.exchange

    [Link preview: "I'm Shocked! - Futurama" animated GIF, via Tenor (tenor.com)]


      overtondoors@infosec.exchange wrote (#15):

      @codinghorror theft en masse as a business model


        mergesort@macaw.social wrote (#16):

        @joe More of an FYI for this repost in case you’re curious. (It’s mentioned in the abstract.) https://macaw.social/@mergesort/116444049426350678


          joe@f.duriansoftware.com wrote (#17):

          @mergesort sounds like a good opportunity for a one-up paper to try it again with the newer models. would be interesting to see what difference the "reasoning" really makes

          • bms48@mastodon.social wrote:

            @codinghorror I gots no problem with da one-shotting da boilerplate! But the actual useful application is a far cry from what Jensen, who pretends to be everyone's friend, wants you to do the "tokenmaxxing" for.

            codinghorror@infosec.exchange wrote (#18):

            @bms48 turns out far too many humans are pretty goddamned lazy and will ship the prototype. How do we change this?


              mergesort@macaw.social wrote (#19):

              @joe Agreed! I’m genuinely always in favor of repeating research like this, given how fast the models are moving. Even the non-reasoning models are dramatically better today, so I’d love to run an experiment on them too; it just concerns me when outdated, 1-2-year-old material comes to be treated as a source of truth.


                rjohnston@techhub.social wrote (#20):

                @codinghorror I have yet to have an LLM tell me to RTFM and then end the conversation.


                  chris@social.lane-jayasinha.com wrote (#21):

                  @codinghorror @bms48 Change incentives to reward the long term, not the quarter. Give the people doing the work more autonomy to set their own standards. Possibly UBI will enable this shift in perspective, from eking out a paycheck to professional/citizen/human responsibility and opportunity.


                    codinghorror@infosec.exchange wrote (#22):

                    @rjohnston I've never had that happen to me, personally, but I have pretty good resting bitch face to be fair.

                    • dalias@hachyderm.io wrote:

                      @brianowen @codinghorror This is exactly what it is. This is exactly what the web dev industry has been for decades: millions of LoC of garbage to justify prices for what should be an easy in-house job using an existing CMS, with minimal or no code, as easy as using Excel.

                      jesstheunstill@infosec.exchange wrote (#23):

                      @dalias @brianowen @codinghorror The number of billion dollar valuation security industry products that amount to a shiny web UI over a few FOSS tools ...


                        doragasu@mastodon.sdf.org wrote (#24):

                        @codinghorror Zero surprise there.

                        • codinghorror@infosec.exchange wrote (#25):

                          @slyecho feel free to evaluate yourself using whatever tools you prefer
