If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

llm
50 Posts 34 Posters 1 Views
  • hopeless@mas.to

    @pseudonym It's certainly like that.

    FWIW though LLMs don't have any shame or feeling they need to manage their reputation.

    If you tell the same LLM that produced the report that it is now the QA manager and it must review the report from the standpoints of checking for missing or inaccurate citations, dubious claims or non-concise text, it will rat itself out and can be told to fix what it found.

    This is the same LLM entirely...

    nor4@chaos.social
    #22

    @hopeless @pseudonym you are suggesting that you can just layer more shit onto the shit and after enough layers of shit it becomes not shit.

    • pseudonym@mastodon.online

      If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

      That's a cognitively brutal task.

      Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

      I propose any productivity gains will be consumed by false negative review failures.
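A back-of-the-envelope sketch of the argument above, not a measurement. Every number is hypothetical, and the function name `escaped_defects` is made up for illustration. It assumes the number of bad changes that slip past review is roughly volume times defect rate times the reviewer's miss rate, and that the miss rate rises once the reviewer is skimming a much larger stream.

```python
def escaped_defects(units_reviewed, defect_rate, miss_rate):
    """Expected number of defects that get past the reviewer.

    units_reviewed: changes the senior has to look at (hypothetical)
    defect_rate:    fraction of changes with a serious error (hypothetical)
    miss_rate:      fraction of those errors the reviewer fails to catch
    """
    return units_reviewed * defect_rate * miss_rate


# Hypothetical baseline: a junior produces 100 changes and a reviewer
# who is still paying attention misses 10% of the serious errors.
junior = escaped_defects(units_reviewed=100, defect_rate=0.02, miss_rate=0.10)

# Hypothetical LLM case: 10x the volume at the same defect rate, with the
# miss rate assumed to climb to 30% from vigilance fatigue.
llm = escaped_defects(units_reviewed=1000, defect_rate=0.02, miss_rate=0.30)

print(f"junior: {junior:.1f} escaped defects")
print(f"llm:    {llm:.1f} escaped defects")
print(f"ratio:  {llm / junior:.0f}x")
```

With these made-up inputs the escaped defects grow faster than the output does, because volume and miss rate multiply. Whether real miss rates move that much is exactly the open question.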

      dtwx@mastodon.social
      #23

      @pseudonym also, when the senior retires, who replaces them?

        max@mas.lab4.app
        #24

        @pseudonym This, 100%. The Glass Cage by Nicholas Carr dives into this in depth with examples from aviation, and how full automation of flight makes it harder for pilots to recover from a disaster situation.

          deborahh@cosocial.ca
          #25

          @pseudonym @mayintoronto … and: there will be no juniors to grow into seniors. 😨

            nuintari@mastodon.bsd.cafe
            #26

            @pseudonym We are using AI in exactly the worst ways possible.

            Caveat: I am a never AI-er, due to the ethical issues surrounding how training data is gathered, the severe ecological and economic impacts, and the fact that deepfakes are objectively making the world a shittier place.

            But pretend for a second, none of those are a problem anymore. We are still using AI wrong. You don't have it produce a mountain of code and have a human review it. You still use humans to produce the code, and have AI help other humans to review it. AI isn't terribly good at writing code, but it has been shown to be effective at finding a few classes of bugs humans are typically very bad at finding.

            But that won't allow you to fire people and replace them with monkeys on typewriters, so it'll never happen.

            • robinadams@mathstodon.xyz

              @pseudonym Especially since the sort of mistake that LLMs make is the sort of mistake that's hardest for humans to spot. They produce bad code that looks like good code, because they were trained on a lot of good code and told "Write code that looks like this".

              iwein@mas.to
              #27

              @robinadams yes

              I'm not sure if this is a but or an and...

              The recent @squads blogpost by @EmmaDelescolle and @Tiziano notes that LLMs are good at reviews.

              In an LLM-friendly context, seniors will of course delegate shit work to an LLM. So now we have the horrid situation where young coders don't learn coding, and seniors' teaching skills atrophy. I'm sure retrospectives on this are being delegated to an LLM somewhere as we speak 🤪

              Isn't this just the absolutely perfect shitstorm?

              @pseudonym

                jwcph@helvede.net
                #28

                @pseudonym - and by costs of false positives.

                  iwein@mas.to
                  #29

                  @nor4 @hopeless @pseudonym if hidden well enough, it's ok to step in it, right 🤪

                    • toldtheworld@mastodon.social

                    @pseudonym I have posed this conundrum before and the answer I received is that there is also an opportunity cost to not moving faster and the risk of a catastrophic bug may not outweigh the risk of being overtaken by competitors, especially since that was already happening before LLMs anyway.

                    Also, it *seems* models are improving at detecting these bugs, so they are being used to review changes, which, for the reasons you point out, they might be better at than people.

                    robotistry@mstdn.ca
                    #30

                    @toldtheworld @pseudonym I didn't think I'd see the day when I'd want to ask CEOs "If all your friends jumped off a cliff, would you do it too?"

                    Overtaken by competitors how? How is it "overtaken by" when what is actually happening is "my competitors are introducing fundamental flaws into their business model that will completely vitiate it as a workable product so all I have to do is wait for them to fail"?

                    Apparently the free market doesn't turn people into money-making machines that build products other people want, it turns CEOs into lemmings. Who knew?

                      iwein@mas.to
                      #31

                      @nuintari what is AI?

                      Reason I ask is that for everything containing the least bit of software I can find a techbro willing to confabulate an 'ai' themed pitch deck. I'm not even kidding.

                      I surely hope to keep my dishwasher, if I promise not to call it 'ai' (but I'm sure someone else will) 😅

                        nuintari@mastodon.bsd.cafe
                        #32

                        @iwein Sorry, I've taken to just using the term AI when I mean LLM, even though I actually mean "Almost Incompetent," in my own head.

                          ferricoxide@blahaj.zone
                          #33

                          @pseudonym@mastodon.online

                          Yesterday, I was working on some PowerShell-based automation. I'm a UNIX/Linux guy. I'm used to Bash. I'm used to Python and pythonic DSLs. I'm… You get the drift. I'm not a Windows guy and I'm not a PowerShell guy.

                          A few days ago, I got an email from Google telling me that, because I have a storage plan (mostly for photo storage), use of Gemini was now included. So, I opted to try to use Gemini to bridge my PowerShell knowledge gaps. I came to a couple of conclusions:

                          • If you're a truly junior "coder" (haven't mastered at least one "language" and regularly applied that mastery to "the real world"), relying on LLMs is likely to lead you to create smoking holes
                          • Those "smoking holes" are the result of the LLM sometimes providing partially or wholly incorrect answers: I've had to correct Gemini several times
                          • Even where "smoking holes" aren't a risk, LLMs are not adequately speculative. To illustrate, I was trying to solve a problem. Gemini suggested a given path to take. The suggested path looked more generalizable, so I asked, "I feel like there's a good chance I can do similar within this other, very analogous component. I'm going to run a test to validate." Gemini's response was effectively, "don't bother: the documentation doesn't indicate that that will work." With a couple of decades' experience under my belt, I know that documentation is sometimes incomplete or wrong (out of date). So, I proceeded to test my suspicion and, lo and behold, it worked. If you're lacking a "feel" for things, you'd likely take the LLM's "don't bother" guidance and go down a different path, a path that might be a lot more byzantine.

                            wendynather@infosec.exchange
                            #34

                            @pseudonym Yes. Very well put. I’m gonna use this …

                              iwein@mas.to
                              #35

                              @nuintari thanks for that 😁

                                ahimsa_pdx@disabled.social
                                #36

                                @pseudonym
                                Looks like Harvard Business Review agrees with you

                                AI Doesn’t Reduce Work—It Intensifies It

                                One of the promises of AI is that it can reduce workloads so employees can focus more on higher-value and more engaging tasks. But according to new research, AI tools don’t reduce work, they consistently intensify it: In the study, employees worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day, often without being asked to do so. That may sound like a win, but it’s not quite so simple. These changes can be unsustainable, leading to workload creep, cognitive fatigue, burnout, and weakened decision-making. The productivity surge enjoyed at the beginning can give way to lower quality work, turnover, and other problems. To correct for this, companies need to adopt an “AI practice,” or a set of norms and standards around AI use that can include intentional pauses, sequencing work, and adding more human grounding.

                                Harvard Business Review (hbr.org)

                                I did not read the whole thing but summary says

                                "One of the promises of AI is that it can reduce workloads so employees can focus more on higher-value and more engaging tasks. But according to new research, AI tools don’t reduce work, they consistently intensify it ..."

                                  toscalix@mastodon.social
                                  #37

                                  @pseudonym

                                    ahimsa_pdx@disabled.social
                                    #38

                                    @JizzelEtBass
                                    Thanks ❤️

                                      pseudonym@mastodon.online
                                      #39

                                      @JizzelEtBass @ahimsa_pdx

                                      Yeah. Pretty sure I read that earlier and it influenced my thinking about this, leading to my post.

                                      Thanks for the reference.

                                        pseudonym@mastodon.online
                                        #40

                                        @wendynather

                                        Please do.

                                        Glad it had some value.

                                        Just my late night noodling about things.

                                          pseudonym@mastodon.online
                                          #41

                                          @ferricoxide

                                          Same background (Unix grey beard) with current focus on security, and your experience matched my own.

                                          I was soaking in a lot more AI tools at my last job, and experience and insight are key.

                                          Recently I had a system suggest multiple times to do it "the easy way" which emphatically was not how I wanted it to work. I was able to gently guide it back to what I wanted.

                                          Letting a senior dev do the work of a senior guiding a junior is about right. But it still can't replace either.
