I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools.

59 Posts 28 Posters 49 Views
  • csepp@merveilles.town

    @gabrielesvelto Or when sed fails you can often write a quick script in Python (or your language of choice).
    For real tho I would love to have a dependable refactoring tool that understands syntax, probably something based on Tree-sitter, but I haven't been able to get any working.

    keithpjolley@discuss.systems wrote (#21):

    @csepp @gabrielesvelto tbf, in all likelihood it wouldn't be `sed` that fails. It would be the inputs to `sed` that fail: garbage in, garbage out.

    • gabrielesvelto@mas.to

      I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools. Well, sed was developed in 1974 and - according to Wikipedia - first shipped in UNIX version 7 in 1979. On modern machines it can process files at speeds of several GB/s and will not randomly introduce errors while processing them. It doesn't cost billions, a subscription or internet access. It's there on your machine, fully documented. What are we even talking about?

      patricus@gts.posix.live wrote (#22):

      @gabrielesvelto not really, it is not on my computer.

      • fourlastor@androiddev.social

        @gabrielesvelto I don't think the comparison is entirely fair tho. Both sed and syntax-tree-based editing are really powerful (and I use both when it makes sense), but for a one-off migration you might spend hours figuring out how to make them work right. An LLM will usually do a good-enough job on the first try, without you having to actively spend time on it; you just need to review the changes and fix a few mistakes.

        crazyeddie@mastodon.social wrote (#23):

        @fourlastor @gabrielesvelto It's not a "use sed or use LLM" scenario here.

        Sed isn't a refactoring tool. There are plenty of actual refactoring tools that don't use LLMs. I was using them before LLMs were invented and no, fucking sed isn't the same thing. I'm rather hoping that wasn't actually a serious comparison 😛

        Mechanical refactors are deterministic algorithms. If the conversation is about sticking AI into that, it's probably nonsense and you can leave without fearing you'll miss anything.

          christopherkunz@chaos.social wrote (#24):

          @gabrielesvelto It's also Turing complete.

          • gabrielesvelto@mas.to

            @csepp several fancy IDEs already have extremely sophisticated refactoring tools that understand the language syntax, e.g.: https://www.jetbrains.com/help/idea/refactoring-source-code.html

            crazyeddie@mastodon.social wrote (#25):

            @gabrielesvelto @csepp I bet if you look at the C++ part of the tools there's not many refactors they can do 😛

            • csepp@merveilles.town

              @gabrielesvelto Yup, those are also pretty great.
              Personally, I needed to refactor some C++ code that didn't fit any simple regex, so I ended up writing a Lua script to do it and did the rest of it by hand.
              The only way I could find to reliably automate it would have been to write a custom clang-tidy pass, which didn't seem worth the effort.
              I still wouldn't use an LLM for it, but I do wish there was an easier way to load the code model in a scripting language. To automate the refactor I did I would have needed to track arguments that are passed through variables or that come from function parameters, access non-C++ files (move strings to YAML), rewrite various forms of string concatenation to format strings, etc.
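One of the steps mentioned above, rewriting string concatenation into format strings, can be sketched with a syntax-aware approach. This is a minimal illustration in Python rather than C++, using the standard `ast` module (Python 3.9+ for `ast.unparse`). It is an analogy, not the Lua script or the clang-tidy pass discussed in the post, and the names `ConcatToFString` and `rewrite` are hypothetical. It only handles chains that start with a string literal and does none of the data-flow tracking described above.

```python
import ast


def _operands(node):
    """Operands of a left-nested `a + b + c` chain, in source order."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return _operands(node.left) + [node.right]
    return [node]


def _is_str_literal(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


class ConcatToFString(ast.NodeTransformer):
    """Hypothetical helper: turn `"lit" + expr + ...` into an f-string."""

    def visit_BinOp(self, node):
        parts = _operands(node)
        # Only rewrite chains whose first operand is a string literal; in
        # ordinary code the later operands must then be strings as well.
        if len(parts) < 2 or not _is_str_literal(parts[0]):
            return self.generic_visit(node)
        values = []
        for part in parts:
            if _is_str_literal(part):
                values.append(part)
            else:
                # Non-literal operands become `{expr}` replacement fields.
                values.append(
                    ast.FormattedValue(
                        value=self.visit(part), conversion=-1, format_spec=None
                    )
                )
        return ast.copy_location(ast.JoinedStr(values=values), node)


def rewrite(source):
    """Parse, transform, and print the source back out."""
    tree = ConcatToFString().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rewrite('greeting = "Hello, " + name + "!"'))
```

Note that `ast.unparse` drops comments and original formatting, and that an f-string will silently stringify a non-string operand where `+` would have raised, so even this small rewrite is not strictly behavior-preserving without type information.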

              crazyeddie@mastodon.social wrote (#26):

              @csepp @gabrielesvelto Doesn't look like Lua really has a good binding to libclang, but if you used Python you could use the same libraries that clang-format/tidy do. They're using the actual LLVM parser and give you an API to manipulate the AST.

                pepperthevixen@meow.social wrote (#27):

                @gabrielesvelto "Yeah but Sed is old and shitty and you gotta get with the times" -some techbro somewhere

                  pepperthevixen@meow.social wrote (#28):

                  @gabrielesvelto NGL when I read "mechanical refactoring", I first imagined a bunch of robot arms on an Aperture-esque assembly line rearranging letters on printing press-style blocks

                  • adingbatponder@fosstodon.org

                    @gabrielesvelto For fun I tried writing Rust code with Claude Code. The code took an age to compile when it worked (do we call it build?). The project took months, so the code got large and was slow to build. Claude was able to refactor it (after it worked) to build 10 times faster. That is not mechanical, as you mention... but it was really challenging. Mechanical refactors it does 100 times better still, of course: it seds too, yes, but it can also check the new syntax and test-build each change.

                    gabrielesvelto@mas.to wrote (#29):

                    @adingbatponder why did the project take so long to build?

                    • fourlastor@androiddev.social

                      @gabrielesvelto prompt-injections

                      The project is closed source, and we don't have places where we randomly include text files. If someone IN THE COMPANY manages to introduce malicious code, imho they'd just infect Gradle instead of hoping for someone to run an LLM that triggers something (beyond that, devs only have access to what they need). State-sponsored hackers specifically are really not on my list of things I can defend against, whether the attack comes via LLMs or anything else.

                      gabrielesvelto@mas.to wrote (#30):

                      @fourlastor you don't need to do anything special to be a target of state-sponsored actors if you rely on an LLM for your coding tasks. State-sponsored actors have almost certainly poisoned the training data of major commercial LLMs, you don't need to add anything yourself. Remember, these things are trained on anything that's dredged from the internet. *Anything*. Do you really trust what happens within the model? Remember the xz compromise? It can now be done automatically *at scale*.

                        gabrielesvelto@mas.to wrote (#31):

                        I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.

                          adingbatponder@fosstodon.org wrote (#32):

                          @gabrielesvelto Well that is what Rust seems to be like. I used a lot of packages incl. browser and screen grabbing tools which took ages to build. Like 20 mins. (It was inside a NixOS flake though.)

                            buermann@mastodon.social wrote (#33):

                            @gabrielesvelto

                            Any blogger can poison the LLMs.

                            "I hacked ChatGPT and Google's AI - and it only took 20 minutes" (www.bbc.com): I found a way to make AI tell you lies – and I'm not the only one.

                              gabrielesvelto@mas.to wrote (#34):

                              And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data, I'm sorry, but you're being extremely naive.

                                gabrielesvelto@mas.to wrote (#35):

                                @adingbatponder yes, but why? Which packages were taking so long? Firefox has almost 4 million lines of Rust and it takes only a few minutes to build them.

                                  a@852260996.91268476.xyz wrote (#36):

                                  @gabrielesvelto@mas.to it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one

                                    piegames@flausch.social wrote (#37):

                                    @gabrielesvelto "people are using this inadequate and problematic tool for a job, so let me suggest they use this different completely inadequate tool instead."
                                    Speaking of unfortunate painful experience, using grep and sed at scale for mechanical refactoring very much randomly introduces mistakes into a codebase. I beg developers to use *at least* syntax-aware tools for mechanical refactoring jobs
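A small illustration of the difference between text-level and syntax-aware renaming, sketched in Python with only the standard library. The sample source and the function names `rename_with_regex` and `rename_with_tokens` are hypothetical. The token-based version is only lexically aware: it does not understand scopes or attributes, so a real refactoring tool would go further.

```python
import io
import re
import tokenize

# Hypothetical sample: "total" appears as a function name, inside a
# comment, inside a string literal, and as part of "subtotal".
SAMPLE = '''\
def total(items):
    # total is computed eagerly
    subtotal = sum(items)
    print("total:", subtotal)
    return subtotal

result = total([1, 2, 3])
'''


def rename_with_regex(source, old, new):
    """sed-style rename: every whole-word match, wherever it appears."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def rename_with_tokens(source, old, new):
    """Rename only identifier tokens; strings and comments stay untouched."""
    lines = source.splitlines(keepends=True)
    hits = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Patch from the last hit backwards so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, col = tok.start
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


print(rename_with_regex(SAMPLE, "total", "compute_total"))
print(rename_with_tokens(SAMPLE, "total", "compute_total"))
```

Even with word boundaries, the regex version also rewrites the comment and the user-visible string `"total:"`, while the token-based version changes only the definition and the call site.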

                                      gabrielesvelto@mas.to wrote (#38):

                                      @a how so? Now you don't need a person to run that particular exploit for years, you can just poison an LLM so that whenever someone generates a sufficiently large sequence of commits the exploit can be injected in them directly. No user intervention and it can be done at scale. And it can be done in closed-source codebases too, it's just a matter of someone using a bot on them.

                                        a@852260996.91268476.xyz wrote (#39):

                                        @gabrielesvelto@mas.to you didn't need an LLM for xz, that is how

                                          fourlastor@androiddev.social wrote (#40):

                                          @gabrielesvelto and ok, but what is the *actual* scenario you're imagining? because my coding tasks go as such when I use LLMs:
                                          1. I have 10-15 classes that need to change the way we do X from Y to Z
                                          2. I prompt the LLM, telling it "change A,B,C so that they use Z instead of Y"
                                          3. I review the code, fixing mistakes as I see them
                                          1/x because post length limits
