CIRCLE WITH A DOT

i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

140 Posts 61 Posters 0 Views
  • whitequark@social.treehouse.systems

    @dalias i specifically do not want this because of two reasons:

    1. it requires software that doesn't exist (e.g. there are no Rust formatters that expose enough deterministic knobs for me)
    2. it doesn't resolve the rigidity of the underlying formatter

    there is existing research doing the thing you're talking about here, which you could probably use as-is to achieve what you want (it even has an explainer tool for the rules it generates—note I haven't tried it, just read the abstract); I want the formatter to be somewhat liberal about the code it accepts. whether I think the code should be formatted a certain way (as a maintainer) is non-deterministic, so I see no real issue with the statistical model having chaotic-but-deterministic behavior in some cases as long as overall the behavior is reasonable

    dalias@hachyderm.io
    #79

    @whitequark I didn't mean rejecting code that's not formatted "right" according to a deterministic formatter. I meant evaluating how closely each of a set of candidate deterministic rules is followed by the code whose style you want to mimic, in order to determine a set of deterministic rules that get you close, then building a model for the exceptions to those rules.

    It's not just that I think this would have the biggest chance of success, but also that it mimics the thought process I'd go through for formatting code by hand where there are general principles I have in mind but I'm happy to break the rules whenever doing something different would make it more readable, easier to work with, or whatever.

    However, I doubt there is research on this, or sufficient prerequisite tooling to make it easy.
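A minimal sketch of the rule-scoring step described above, under heavy assumptions: a "rule" here is a hypothetical line-level check (`no_trailing_ws`, `indent_multiple_of_4` and `space_after_comma` are made-up examples, and the comma check naively ignores string literals), and the adoption threshold is an arbitrary parameter. Rules the corpus follows often enough are adopted; the places where an adopted rule is broken are collected as training data for a separate exception model, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical rule interface: look at one source line and return None if the
# rule does not apply there, True if the line follows it, False if it breaks it.
# A real tool would inspect tokens or a syntax tree instead of raw lines.
RuleCheck = Callable[[str], Optional[bool]]


def space_after_comma(line: str) -> Optional[bool]:
    if "," not in line:
        return None
    # Naive: also counts commas inside string literals.
    return all(line[i + 1 : i + 2] in ("", " ") for i, ch in enumerate(line) if ch == ",")


def indent_multiple_of_4(line: str) -> Optional[bool]:
    if not line.strip():
        return None  # blank lines have no meaningful indentation
    return (len(line) - len(line.lstrip(" "))) % 4 == 0


RULES: dict[str, RuleCheck] = {
    "no_trailing_ws": lambda line: line == line.rstrip(),
    "indent_multiple_of_4": indent_multiple_of_4,
    "space_after_comma": space_after_comma,
}


@dataclass
class RuleScore:
    name: str
    followed: int
    applicable: int

    @property
    def adherence(self) -> float:
        return self.followed / self.applicable if self.applicable else 0.0


def score_rules(lines: list[str], rules: dict[str, RuleCheck]) -> list[RuleScore]:
    """How often the corpus follows each candidate rule where it applies."""
    scores = []
    for name, check in rules.items():
        results = [r for r in map(check, lines) if r is not None]
        scores.append(RuleScore(name, sum(results), len(results)))
    return scores


def select_rules(scores: list[RuleScore], threshold: float) -> list[str]:
    """Adopt the rules the corpus follows at least `threshold` of the time."""
    return [s.name for s in scores if s.applicable and s.adherence >= threshold]


def exceptions(lines: list[str], rules: dict[str, RuleCheck],
               adopted: list[str]) -> list[tuple[int, str]]:
    """(line index, rule name) pairs where an adopted rule is broken."""
    return [(i, name) for i, line in enumerate(lines)
            for name in adopted if rules[name](line) is False]


corpus = "def f(a, b):\n    return g(a,b)\n\nx = [1, 2, 3]\ny = h(x, 1)\n".splitlines()
for s in score_rules(corpus, RULES):
    print(f"{s.name}: {s.followed}/{s.applicable}")
```

On this toy corpus `space_after_comma` scores 3/4; with a threshold of 0.7 it is adopted, and the one line that breaks it (`g(a,b)`) becomes the single recorded exception.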

    • whitequark@social.treehouse.systems

      @GeoffWozniak @ireneista I view code as art, so I find strongly canonicalizing formatters like Black to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code, and I think there's gotta be a better way to approach this that isn't destructive

      ireneista@adhd.irenes.space
      #80

      @whitequark @GeoffWozniak that's our view as well

        whitequark@social.treehouse.systems
        #81

        @dalias I feel like building a difference model is a much more difficult approach to pursue while still exhibiting the undesirable chaotic behavior in some edge cases. anyway, time will tell if this works the way we want to build it or not

          whitequark@social.treehouse.systems
          #82

          @ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before, we want to try building a diffusion model that captures the whitespace between code tokens and can then inject it into a given parsetree, which appears to be a fairly efficient and unproblematic way to do this
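A minimal sketch of the kind of representation such a model could work with, using Python's standard `tokenize` module: source text is split into the token sequence plus the layout gap in front of each token, and any token sequence can be recombined with a layout of the same length. This is an assumption-laden stand-in, not the planned design: it uses a flat token stream rather than a parse tree, treats comments as tokens, assumes `\n` line endings, and contains no model at all; `split_layout` and `inject_layout` are hypothetical names.

```python
import io
import tokenize


def split_layout(src: str) -> tuple[list[str], list[str]]:
    """Split Python source into token strings plus the layout (spaces, line
    breaks, continuation backslashes) that precedes each token."""
    # Offset of each line start, using the same "\n" splitting that
    # io.StringIO.readline (and therefore tokenize) uses.
    line_starts = [0]
    for line in io.StringIO(src):
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos  # tokenize rows are 1-based, columns 0-based
        return line_starts[row - 1] + col

    tokens, gaps, prev_end = [], [], 0
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        start = offset(tok.start)
        gaps.append(src[prev_end:start])  # layout before this token
        tokens.append(tok.string)         # the token text itself
        prev_end = offset(tok.end)
    return tokens, gaps


def inject_layout(tokens: list[str], gaps: list[str]) -> str:
    """Interleave a layout with a token sequence to produce source text."""
    return "".join(gap + tok for gap, tok in zip(gaps, tokens))


tokens_a, gaps_a = split_layout("x = f(1,2)  # call\n")
tokens_b, gaps_b = split_layout("x=f( 1, 2 ) # call\n")
print(tokens_a == tokens_b)             # same tokens, different layout
print(inject_layout(tokens_a, gaps_b))  # a's tokens rendered with b's layout
```

In this framing a layout model would supply the gap list for a given token sequence. Because Python indentation partly lives in those gaps, a cheap safety check is to re-tokenize the result and compare it with the original token sequence.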

            whitequark@social.treehouse.systems
            #83

            @ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken
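A sketch of the sort of tree-level operation meant here, using Python's `ast` module (Python 3.8+ for `end_lineno`). The tree decides which statements form the leading import block and what order they go in, while each statement's original text is copied verbatim, so its internal layout is never regenerated. The sort key (module name, with relative imports prefixed by dots) is a simplification of real import-sorting conventions, and `sort_leading_imports` is a hypothetical helper. It refuses to touch a block containing comments, blank lines, or several statements per line rather than risk dropping text.

```python
import ast
import io


def sort_leading_imports(src: str) -> str:
    """Sort the top-level import block at the start of a module by module name,
    reusing each import statement's original source text."""
    module = ast.parse(src)
    body = module.body
    if ast.get_docstring(module) is not None:
        body = body[1:]  # skip a module docstring

    imports = []
    for node in body:
        if not isinstance(node, (ast.Import, ast.ImportFrom)):
            break
        imports.append(node)
    if not imports:
        return src

    lines = io.StringIO(src).readlines()  # same "\n" line numbering as ast
    first, last = imports[0].lineno - 1, imports[-1].end_lineno
    chunks = ["".join(lines[n.lineno - 1 : n.end_lineno]) for n in imports]
    # Bail out if the block contains text the nodes don't cover (comments,
    # blank lines) or shares lines between statements: never drop source.
    if "".join(chunks) != "".join(lines[first:last]):
        return src

    def key(pair):
        node = pair[0]
        if isinstance(node, ast.Import):
            return node.names[0].name
        return "." * (node.level or 0) + (node.module or "")

    ordered = [c if c.endswith("\n") else c + "\n"  # last line may lack "\n"
               for _, c in sorted(zip(imports, chunks), key=key)]
    return "".join(lines[:first]) + "".join(ordered) + "".join(lines[last:])


print(sort_leading_imports("import sys\nimport os\nfrom collections import deque\n\nprint(os, sys, deque)\n"))
```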

              ireneista@adhd.irenes.space
              #84

              @whitequark @GeoffWozniak yeah this is a recurring research topic for us; we've talked with several of our friends about it over the years. just making a parser/generator that properly round-trips whitespace and comments is already a ton of work, alas...
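A small illustration of why the round trip is the hard part, using only Python's standard library: `ast.parse` followed by `ast.unparse` (Python 3.9+) keeps the program's meaning, but the tree has nowhere to store comments or spacing, so both are lost.

```python
import ast

src = "x = ( 1+2 )  # why three?\n"
tree = ast.parse(src)

# The AST has no nodes for comments or whitespace, so regenerating source
# from it produces canonical text without them.
regenerated = ast.unparse(tree)
print(regenerated)  # x = 1 + 2

# The meaning survives: both versions parse to the same tree
# (ast.dump omits line/column attributes by default).
print(ast.dump(ast.parse(regenerated)) == ast.dump(tree))  # True
```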

                geoffwozniak@masto.hackers.town
                #85

                @whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).

                  whitequark@social.treehouse.systems
                  #86

                  @ireneista @GeoffWozniak there's tree-sitter nowadays, which I believe should do that (and I think it should be failure-tolerant, considering its fairly wide use in editors: nvim, zed, etc.)

                    whitequark@social.treehouse.systems
                    #87

                    @ireneista @GeoffWozniak my literal first Python project was making a Python parser that fully captures source spans (which wasn't upstream at the time--in 2014 or so), so i'm quite familiar with the topic by now 😛
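For comparison with the 2014 situation: CPython's own `ast` module gained end positions (`end_lineno`, `end_col_offset`) and `ast.get_source_segment` in Python 3.8, so every node now carries a full source span. The snippet only shows that built-in support; it is not the parser mentioned in the post.

```python
import ast

src = "total = price * qty  # gross\n"
tree = ast.parse(src)
expr = tree.body[0].value  # the BinOp on the right-hand side

# Each node carries lineno/col_offset and end_lineno/end_col_offset
# (columns are UTF-8 byte offsets), and ast.get_source_segment uses them
# to slice the exact original text back out.
print(expr.lineno, expr.col_offset, expr.end_lineno, expr.end_col_offset)  # 1 8 1 19
print(ast.get_source_segment(src, expr))  # price * qty
```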

                      geoffwozniak@masto.hackers.town
                      #88

                      @whitequark @ireneista I very much respect that.

                      I view code like writing and I will tweak structure and form for far too long sometimes. Layout ends up getting less of my attention.

                        whitequark@social.treehouse.systems
                        #89

                        @GeoffWozniak @ireneista I see layout as part of the form, I guess? I write source code files in much the same way as one would write chapters in a book: somewhat self-contained, and intended to make sense when read top-to-bottom linearly and with roughly one full-displayful of context. so if rustfmt decides to blow up a function call into 20 lines out of nowhere it very much messes with that, for example

                          geoffwozniak@masto.hackers.town
                          #90

                          @whitequark @ireneista Well, I do have limits.

                          In my case I spend my time in Binutils and GCC. Do I love the GNU style? No. But does consistency help? Yes. So I defer to it. But I will restructure things so the single-line curly braces don't take over.

                            whitequark@social.treehouse.systems
                            #91

                            @GeoffWozniak @ireneista the awful code style is probably #2 in the list of top 5 reasons I contribute to LLVM instead of GNU tools. I should use it as a testcase for the tool I'm working on, actually

                              whitequark@social.treehouse.systems
                              #92

                              @GeoffWozniak @ireneista awful memories of chasing down a bug in or1k binutils where the .got section somehow got slightly misaligned relative to _GLOBAL_OFFSET_TABLE_. I never figured it out; I have since quit the company and I will mercifully never have to think about or1k again

                                geoffwozniak@masto.hackers.town
                                #93

                                @whitequark @ireneista I've grown used to it. That may say something bad about me, but it keeps me employed.

                                I never use it as a style in anything else, though.

                                  geoffwozniak@masto.hackers.town
                                  #94

                                  @whitequark @ireneista I was in this wondrousness today, used in one of those functions that is a few hundred lines long with nested case statements and no attempt at functional abstraction.

                                  So perhaps I have lost any hope of making art.

                                  sourceware.org Git - binutils-gdb.git/blob - bfd/elf-bfd.h (sourceware.org)

                                    whitequark@social.treehouse.systems
                                    #95

                                    @GeoffWozniak @ireneista yeah I mean I've submitted binutils patches while I was employed there, and for all the dislike I have for that code style it was so far down the list of bad things about that job that it didn't even register

                                      whitequark@social.treehouse.systems
                                      #96

                                      @GeoffWozniak @ireneista yeah I have regretfully seen libbfd

                                      • whitequark@social.treehouse.systems

                                        i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

                                        the "ideal" (their choice of words) case is 64.2%
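The thread does not say how the paper measures A_c, so this is only a hedged illustration of what an equivalence rate over (original, reformatted) pairs looks like. If A_c is a plain fraction, then at the quoted 64.2% roughly 36 of every 100 reformatted programs behave differently from their originals. For Python, a layout-only reformatter can instead check each output exactly: identical ASTs (ignoring positions) are, for ordinary code, a sufficient and conservative condition for identical behavior (code that inspects its own line numbers or source text is the exception). AST comparison is a substitute chosen for this sketch, not the paper's method, and `same_ast` and `equivalence_rate` are hypothetical names.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Conservative check: identical ASTs (line/column info ignored) mean the
    reformat cannot have changed behavior; unparsable output counts as a miss."""
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))
    except SyntaxError:
        return False


def equivalence_rate(pairs) -> float:
    """Fraction of (original, reformatted) pairs that pass same_ast."""
    pairs = list(pairs)
    return sum(same_ast(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


pairs = [
    ("x=f(1,2)\n", "x = f(1, 2)\n"),                            # layout only
    ("y = a*b+c\n", "y = a * (b + c)\n"),                       # regrouped
    ("if x:\n    y()\nz()\n", "if x:\n    y()\n    z()\n"),  # call moved into the if
]
print(f"{equivalence_rate(pairs):.1%}")  # 33.3%
```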

                                        netraven@hear-me.social
                                        #97

                                        @whitequark do the thing. Science the shit out of it.

                                          lizardbill@hachyderm.io
                                          #98

                                          @whitequark 64.2% of the time, it works every time!
