
If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

Category: Uncategorized | Tags: llm | 50 posts, 34 posters
ahimsa_pdx@disabled.social (#38):

    @JizzelEtBass
    Thanks ❤️

pseudonym@mastodon.online (#39):

      @JizzelEtBass @ahimsa_pdx

      Yeah. Pretty sure I read that earlier and it influenced my thinking about this, leading to my post.

      Thanks for the reference.

wendynather@infosec.exchange:

        @pseudonym Yes. Very well put. I’m gonna use this …

pseudonym@mastodon.online (#40):

        @wendynather

        Please do.

        Glad it had some value.

        Just my late night noodling about things.

ferricoxide@blahaj.zone:

          @pseudonym@mastodon.online

Yesterday, I was working on some PowerShell-based automation. I'm a UNIX/Linux guy. I'm used to Bash. I'm used to Python and pythonic DSLs. I'm… you get the drift. I'm not a Windows guy and I'm not a PowerShell guy.

A few days ago, I got an email from Google telling me that, because I have a storage plan (mostly for photo storage), use of Gemini was now included. So I opted to try using Gemini to bridge my PowerShell knowledge gaps. I came to a couple of conclusions:

• If you're a truly junior "coder" (you haven't mastered at least one "language" and regularly applied that mastery to "the real world"), relying on LLMs is likely to lead you to create smoking holes.
• Those "smoking holes" are the result of the LLM sometimes providing partially or wholly incorrect answers: I've had to correct Gemini several times.
• Even where "smoking holes" aren't a risk, LLMs are not adequately speculative. To illustrate: I was trying to solve a problem, and Gemini suggested a given path to take. The suggested path looked more generalizable, so I asked, "I feel like there's a good chance I can do something similar within this other, very analogous component. I'm going to run a test to validate." Gemini's response was effectively, "Don't bother: the documentation doesn't indicate that that will work." With a couple of decades' experience under my belt, I know that documentation is sometimes incomplete or wrong (out of date). So I proceeded to test my suspicion and, lo and behold, it worked. If you're lacking a "feel" for things, you'd likely take the LLM's "don't bother" guidance and go down a different path, one that might be a lot more byzantine.

pseudonym@mastodon.online (#41):

          @ferricoxide

          Same background (Unix grey beard) with current focus on security, and your experience matched my own.

I was soaking in a lot more AI tools at my last job, and experience and insight are key.

          Recently I had a system suggest multiple times to do it "the easy way" which emphatically was not how I wanted it to work. I was able to gently guide it back to what I wanted.

Letting a senior dev do the work of a senior guiding a junior is about right. But it still can't replace either.

toldtheworld@mastodon.social:

@pseudonym I have posed this conundrum before, and the answer I received is that there is also an opportunity cost to not moving faster: the risk of a catastrophic bug may not outweigh the risk of being overtaken by competitors, especially since that was already happening before LLMs anyway.

            Also, it *seems* models are improving at detecting these bugs, so they are being used to review changes, which, for the reasons you point out, they might be better at than people.

pseudonym@mastodon.online (#42):

            @toldtheworld

The models may indeed get better at finding and fixing their own mistakes, and they wouldn't be subject to human fatigue, that's true. But they are never perfect, so you still need a human in the loop. You've just pushed back the time a bit before you miss a harder-to-detect error. Which is inevitable, because hallucinations / confabulations are a feature, not a bug, of how LLMs fundamentally operate.

So you make more, faster, harder-to-spot errors. Better LLM checkers increase the risk.

deborahh@cosocial.ca:

              @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨

pseudonym@mastodon.online (#43):

              @deborahh @mayintoronto

Yup. This is my biggest structural concern, really. But I only had 500 characters to respond to the previous post, and I wanted to focus on the review cost of any "gains" one might have.

              There are more related topics to discuss, but the breaking of the funnel to train the next generation of skilled people is huge.

max@mas.lab4.app:

@pseudonym This, 100%. The Glass Cage by Nicholas Carr dives into this in depth, with examples from aviation showing how full automation of flight makes it harder for pilots to recover from a disaster situation.

pseudonym@mastodon.online (#44):

                @max

                Thanks for the reference. Didn't know that one.

wronglang@bayes.club:

                  @xrisk @malstrom @pseudonym just for clarity, LLMs don't learn concepts

pseudonym@mastodon.online (#45):

                  @wronglang @xrisk @malstrom

                  Correct. They don't learn concepts. That's the key confusion in so much of the discussion and use around them.

                  They have no world model, and don't reason at all. But they perform a very good facsimile of reasoning, because reasoning is embedded in and has shaped the patterns of speech, text, and code.

                  They pattern match. That's all. Full stop. But they do it so well it looks like speech, or code, or understanding.

moutmout@framapiaf.org:

                    @pseudonym This.

                    I do a lot of "computer science labs", where students learn to write code, and they wave me down when they have questions. When their code doesn't do what they expect, it's often easy to figure out what went wrong because you can spot a bit of code that looks funky. And usually, the problem is in those few lines.

                    LLM code is meant to look like good code, so you don't get these little shortcuts.
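A hypothetical sketch of the contrast described above (the averaging task and both function names are illustrative, not from the thread): a novice's bug tends to show up as a line that simply looks wrong, while generated code reads cleanly and hides its problem in plausible-looking detail.

```python
# Novice version: the mistake is visually "funky". A teacher scanning the code
# spots the accumulator being reset inside the loop almost immediately.
def average_novice(values):
    total = 0
    for v in values:
        total = 0          # <-- obviously odd line; the bug has a "shape"
        total += v
    return total / len(values)


# LLM-style version: tidy naming, docstring, type hints. It looks like good
# code, but it silently returns 0.0 for an empty list, masking a likely
# upstream error instead of surfacing it.
def average_generated(values: list[float]) -> float:
    """Return the arithmetic mean of `values`."""
    if not values:
        return 0.0         # plausible-looking guard that quietly hides a problem
    return sum(values) / len(values)
```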

pseudonym@mastodon.online (#46):

                    @Moutmout

                    Good example I hadn't thought of.

Yes, human novice code mistakes have a "shape" to them that a teacher can recognize quickly, or suspect from how the error manifests.

These are different classes of "good-looking" failures.

pseudonym@mastodon.online:

                      If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

                      That's a cognitively brutal task.

                      Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

                      I propose any productivity gains will be consumed by false negative review failures.
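As a rough, hypothetical back-of-the-envelope sketch of that claim (every number below is an assumption for illustration, not data from the thread): escaped defects scale with the volume under review times the reviewer's miss rate, so multiplying output volume multiplies escapes even before vigilance fatigue makes the miss rate worse.

```python
def escaped_defects(loc_reviewed: int, defects_per_kloc: float, miss_rate: float) -> float:
    """Expected number of defects that slip past review in this toy model."""
    return loc_reviewed / 1000 * defects_per_kloc * miss_rate

# Junior writes, senior reviews: modest volume, attentive reviewer (assumed rates).
baseline = escaped_defects(loc_reviewed=2_000, defects_per_kloc=5, miss_rate=0.10)

# LLM writes 3x the volume; the vigilance literature suggests miss rates for rare
# events rise with sustained high-volume scanning (0.25 is purely an assumption).
with_llm = escaped_defects(loc_reviewed=6_000, defects_per_kloc=5, miss_rate=0.25)

print(baseline, with_llm)  # 1.0 vs 7.5 expected escaped defects
```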

leftpaddotpy@hachyderm.io (#47):

                      @pseudonym i think it depends on the domain. like, code review is not seriously expected to catch all bugs; it's merely a step in a process. if you need absolute correctness (most don't!) then formal methods, a shockingly rare practice in the most critical industries, might be the right choice.

                      a stronger argument would be "the bugs are less obvious" though i think that too can be fought with observability. but that strategy only works well in application code, i.e. code which "makes money" (a notion which should be challenged, but that's another issue), rather than infra layer stuff with higher correctness needs and worse observability. and you know how the old saying goes: "if the code is good it's probably not making money". idk, people write slop where they already wrote slop due to the same pressures as before.

ainmosni@social.ainmosni.eu:

@pseudonym This was my experience from the start, and is what made me give up on LLM-assisted coding. Of course, that was before I was aware of the abhorrent externalities that came with using the slop machine...

pseudonym@mastodon.online (#48):

                        @ainmosni

                        Yup.

                        My thoughts aren't new.

Just felt the need to pack them up into something bite-sized.

To explain where I see one of the fundamental design failures, one that scales with even any potential "good stuff" that may arise.

a_goodall_spaceship@norden.social (#49):

                          @adrianmorales @pseudonym Stop that, I love dark star!

avuko@infosec.exchange:

@pseudonym and because the high volume consists of what I’ve dubbed “plausible bullshit”, reviewers will have to battle a plethora of their own biases as well.

                            There are fields (I’ve heard stories about protein and material design, and vulnerability discovery) where filtering the BS for real discoveries can be worth it. I’m guessing it works because there is a reality to test against.

                            But for the love of humanity, don’t use it for anything descriptive or abstract.

michael@westergaard.social (#50):

I like to say that LLMs are a great way to reduce junior development time at the cost of senior review time.