i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

  • whitequark@social.treehouse.systems wrote:

    i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

    the "ideal" (their choice of words) case is 64.2%

    dakangaroo@mastodon.social
    wrote last edited by
    #67

    @whitequark But... why? Why not just use a linter?

    • ireneista@adhd.irenes.space wrote:

      @whitequark because "the thing we're promoting is incredibly dangerous, and not in fun ways" is not really the thing anyone wants to be cited for

      geoffwozniak@masto.hackers.town
      wrote last edited by
      #68

      @ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.
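For readers following along, here is a minimal sketch of how a quantity like A_c could be approximated for Python code. It assumes that "parses to the same AST" is an acceptable stand-in for "behaves the same"; that proxy and the names `same_ast` and `a_c` are assumptions made for illustration, not the paper's definition.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to the same AST (positions ignored)."""
    try:
        # ast.dump omits line/column attributes by default, so pure
        # whitespace changes do not affect the comparison.
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))
    except SyntaxError:
        # A reformatting that no longer parses certainly changed behavior.
        return False


def a_c(pairs) -> float:
    """Percentage of (original, reformatted) pairs whose ASTs match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    return 100.0 * sum(same_ast(o, r) for o, r in pairs) / len(pairs)


pairs = [
    # Whitespace-only change: same AST.
    ("x = 1+2\n", "x = 1 + 2\n"),
    # Indentation change that moves c() into the if body: different AST.
    ("if a:\n    b()\nc()\n", "if a:\n    b()\n    c()\n"),
]
print(a_c(pairs))  # 50.0
```

A formatter that only ever rewrites whitespace and is verified this way after every run scores 100.0 by construction, which is the point being made about rule-based tools.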

        whitequark@social.treehouse.systems
        wrote last edited by
        #69

        @DaKangaroo see edit

          mirabilos@toot.mirbsd.org
          wrote last edited by
          #70

          @whitequark I cannot even

          • whitequark@social.treehouse.systems wrote:

            @porglezomp you'll love Fig. 6

            mntmn@mastodon.social
            wrote last edited by
            #71

            @whitequark @porglezomp long live the new flesh

              whitequark@social.treehouse.systems
              wrote last edited by
              #72

              @GeoffWozniak @ireneista so the problem i'm solving is that while for C++, you have tools like clang-format which are nice and flexible, for Rust you have rustfmt which is rigid and makes your code look like ass. I do not like my code looking like ass but I am also receptive to the idea that introducing as many knobs as clang-format has into rustfmt would make it unmaintainable

                tunafishtiger@mastodon.online
                wrote last edited by
                #73

                @whitequark this technology is going to be amazing for the competitive advantage of the few software firms that refuse to use it

                  dalias@hachyderm.io
                  wrote last edited by
                  #74

                  @whitequark Saw your edit with the motivation for reading research. I doubt there's anything out there doing this well, but I think the smart approach to doing it well would be to evaluate and score a bunch of candidate standard-class rules across the codebase, solve for a set that maximally approximates what's already there, then apply some sort of pattern learning for the remaining instances that "break the rules", hopefully identifying correlations between them.

                  Basically, going as far as you can with simple comprehensible deterministic rules before you start throwing magical statistics at it.
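The rule-first approach described above could be sketched roughly as follows. Everything here is hypothetical and only illustrates the shape of the idea: the candidate rules, the 0.6 threshold, and the names `score` and `fit` are made up for the example, and the rules work on raw lines rather than on a real parse.

```python
import re
from typing import Callable, Optional

# A rule looks at one line: None = not applicable, True = followed, False = broken.
Rule = Callable[[str], Optional[bool]]


def space_after_comma(line: str) -> Optional[bool]:
    followers = re.findall(r",(.)", line)  # characters right after each comma
    if not followers:
        return None
    return all(ch == " " for ch in followers)


def uses_quote(quote: str) -> Rule:
    other = "'" if quote == '"' else '"'

    def rule(line: str) -> Optional[bool]:
        if quote not in line and other not in line:
            return None
        return quote in line and other not in line

    return rule


def score(rule: Rule, lines) -> Optional[float]:
    """Fraction of applicable lines on which the rule holds."""
    verdicts = [v for v in map(rule, lines) if v is not None]
    return sum(verdicts) / len(verdicts) if verdicts else None


def fit(candidates, lines, threshold=0.6):
    """Adopt rules the existing code mostly follows; collect the exceptions."""
    adopted = {}
    for name, rule in candidates.items():
        s = score(rule, lines)
        if s is not None and s >= threshold:
            adopted[name] = s
    # Lines that break an adopted rule are what a learned model would handle.
    exceptions = [
        (name, line)
        for name in adopted
        for line in lines
        if candidates[name](line) is False
    ]
    return adopted, exceptions


SAMPLE = '''\
names = ["a", "b","c"]
label = 'x'
title = "t"
print(names, label)
'''

CANDIDATES = {
    "space after comma": space_after_comma,
    "double quotes": uses_quote('"'),
    "single quotes": uses_quote("'"),
}

adopted, exceptions = fit(CANDIDATES, SAMPLE.splitlines())
print(adopted)
print(exceptions)
```

On this sample only the double-quote rule clears the threshold, and the one single-quoted line is reported as an exception rather than rewritten.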

                    geoffwozniak@masto.hackers.town
                    wrote last edited by
                    #75

                    @whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.

                    At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.

                      whitequark@social.treehouse.systems
                      wrote last edited by
                      #76

                      @dalias i specifically do not want this because of two reasons:

                      1. it requires software that doesn't exist (e.g. there are no Rust formatters that expose enough deterministic knobs for me)
                      2. it doesn't resolve the rigidity of the underlying formatter

                      there is existing research doing the thing you're talking about here, which you could probably use as-is to achieve what you want (it even has an explainer tool for the rules it generates—note I haven't tried it, just read the abstract); I want the formatter to be somewhat liberal about the code it accepts. whether I think the code should be formatted a certain way (as a maintainer) is non-deterministic, so I see no real issue with the statistical model having chaotic-but-deterministic behavior in some cases as long as overall the behavior is reasonable

                        whitequark@social.treehouse.systems
                        wrote last edited by
                        #77

                        @GeoffWozniak @ireneista I view code as art so I find strongly canonicalizing formatters like black to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code and I think there's gotta be a better way to approach this that isn't destructive

                          whitequark@social.treehouse.systems
                          wrote last edited by
                          #78

                          @dalias the problem this is solving is that some contributors have an allergic reaction to getting "please format this in <X way>" review comments, so having a tool that gets a patch 95% to the way it 'ought' to be should lower friction in much the same way that adopting a strongly canonicalizing formatter like black would, without downsides of the latter

                            dalias@hachyderm.io
                            wrote last edited by
                            #79

                            @whitequark I didn't mean rejecting code that's not formatted "right" according to a deterministic formatter. I meant evaluating how closely each of a set of candidate deterministic rules is followed by the code whose style you want to mimic, in order to determine a set of deterministic rules that get you close, then build a model for the exceptions to those rules.

                            It's not just that I think this would have the biggest chance of success, but also that it mimics the thought process I'd go through for formatting code by hand where there are general principles I have in mind but I'm happy to break the rules whenever doing something different would make it more readable, easier to work with, or whatever.

                            Indeed, however, I doubt there is research on this or sufficient prerequisite tooling to make it easy.

                              ireneista@adhd.irenes.space
                              wrote last edited by
                              #80

                              @whitequark @GeoffWozniak that's our view as well

                                whitequark@social.treehouse.systems
                                wrote last edited by
                                #81

                                @dalias I feel like building a difference model is a much more difficult approach to pursue while still exhibiting the undesirable chaotic behavior in some edge cases. anyway, time will tell if this works the way we want to build it or not

                                  whitequark@social.treehouse.systems
                                  wrote last edited by
                                  #82

                                  @ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before we want to try building a diffusion model that captures the whitespace between code tokens and is then able to inject it into a given parsetree, which appears to be a fairly efficient and unproblematic way to do this
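The "whitespace between tokens" framing can be illustrated without any model at all. The sketch below is only an assumption about what that split might look like: it uses a toy regex tokenizer (not a real Rust lexer or parse tree), and it borrows the gaps from a hand-styled string where a trained model would predict them. The names `split_gaps` and `inject_gaps` are hypothetical.

```python
import re

# Toy tokenizer: whitespace runs, words/numbers, a few multi-character
# operators, then any other single character. Not a real Rust lexer.
TOKEN = re.compile(r"\s+|\w+|==|!=|<=|>=|->|[^\w\s]")


def split_gaps(source: str):
    """Split source into (tokens, gaps).

    gaps[i] is the whitespace in front of tokens[i]; the final entry is
    whatever trails the last token, so len(gaps) == len(tokens) + 1.
    """
    tokens, gaps, pending = [], [], ""
    for piece in TOKEN.findall(source):
        if piece.isspace():
            pending += piece
        else:
            tokens.append(piece)
            gaps.append(pending)
            pending = ""
    gaps.append(pending)
    return tokens, gaps


def inject_gaps(tokens, gaps) -> str:
    """Rebuild source text from tokens plus one gap per slot."""
    if len(gaps) != len(tokens) + 1:
        raise ValueError("need exactly one more gap than tokens")
    return "".join(g + t for g, t in zip(gaps, tokens)) + gaps[-1]


messy = "fn add(a:i32,b:i32)->i32{a+b}"
styled = "fn add(a: i32, b: i32) -> i32 { a + b }\n"

tokens, _ = split_gaps(messy)
style_tokens, style_gaps = split_gaps(styled)  # stand-in for model output

assert tokens == style_tokens           # same token sequence in both
print(inject_gaps(tokens, style_gaps))  # the messy tokens with the styled gaps
```

Because only the gaps change, the token sequence is preserved by construction. A real implementation would still have to forbid gaps that alter lexing, such as an empty gap between two adjacent words.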

                                    whitequark@social.treehouse.systems
                                    wrote last edited by
                                    #83

                                    @ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken
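As an example of the kind of transformation that is comfortable to do on a parse tree, here is a small sketch of import ordering using Python's standard `ast` module. It makes several assumptions for brevity: only the run of imports at the very top of the file is handled, each import sits on its own lines, and blank lines or comments between those imports are dropped. The function name `sort_leading_imports` is made up for this example.

```python
import ast


def sort_leading_imports(source: str) -> str:
    """Reorder the leading block of import statements alphabetically."""
    tree = ast.parse(source)

    # Collect the run of import statements at the top of the module.
    imports = []
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append(node)
        else:
            break
    if not imports:
        return source

    def key(node):
        # Sort "import x" by x and "from y import z" by y (dots kept for
        # relative imports).
        if isinstance(node, ast.Import):
            return node.names[0].name
        return "." * node.level + (node.module or "")

    lines = source.splitlines(keepends=True)
    first = imports[0].lineno - 1   # index of the first import line
    last = imports[-1].end_lineno   # index just past the last import line

    # Re-emit each statement using its original text, so the formatting
    # inside every import is left untouched.
    block = "".join(
        "".join(lines[n.lineno - 1 : n.end_lineno])
        for n in sorted(imports, key=key)
    )
    return "".join(lines[:first]) + block + "".join(lines[last:])


src = "import sys\nfrom os import path\nimport ast\n\nprint(sys.argv)\n"
print(sort_leading_imports(src))
```

The tree is used only to find statement boundaries and sort keys; the text of each statement is copied verbatim, which keeps the change purely structural.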

                                      ireneista@adhd.irenes.space
                                      wrote last edited by
                                      #84

                                      @whitequark @GeoffWozniak yeah this is a recurring research topic for us, we've talked with several of our friends about it over the years. just making a parser/generator that properly round-trips whitespace and comments is already a ton of work, alas...

                                        geoffwozniak@masto.hackers.town
                                        wrote last edited by
                                        #85

                                        @whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).

                                          whitequark@social.treehouse.systems
                                          wrote last edited by
                                          #86

                                          @ireneista @GeoffWozniak there's tree-sitter nowadays which I believe should do that (and I think it should be failure-tolerant considering its fairly wide use in editors: nvim, zed, etc)
