i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
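For a sense of what a quantity like A_c could look like in practice, here is a minimal sketch. It assumes Python sources and uses "parses to the same AST" as a stand-in for "behaves the same"; the paper's actual definition is not given in this thread, and the names `same_ast` and `estimate_a_c` are made up for illustration.

```python
import ast

def same_ast(original: str, reformatted: str) -> bool:
    """True if both sources parse to the same AST (a proxy for same behavior)."""
    try:
        # ast.dump omits line/column attributes by default, so pure layout
        # changes do not affect the comparison.
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))
    except SyntaxError:
        # If either side fails to parse, count the pair as a mismatch.
        return False

def estimate_a_c(pairs):
    """Fraction of (original, reformatted) pairs that keep the same AST."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(same_ast(a, b) for a, b in pairs) / len(pairs)

pairs = [
    # Whitespace-only change: same AST.
    ("x = 1+2\n", "x = 1 + 2\n"),
    # Indentation change that moves c() into the if body: different AST.
    ("if a:\n    b()\nc()\n", "if a:\n    b()\n    c()\n"),
]
print(estimate_a_c(pairs))  # 0.5
```

A formatter that only ever rewrites whitespace between tokens can still fail this check in an indentation-sensitive language, which is what the second pair shows.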

whitequark@social.treehouse.systems

@dalias the problem this is solving is that some contributors have an allergic reaction to getting "please format this in <X way>" review comments, so having a tool that gets a patch 95% of the way to how it 'ought' to be should lower friction in much the same way that adopting a strongly canonicalizing formatter like black would, without the downsides of the latter

dalias@hachyderm.io

@whitequark I didn't mean rejecting code that's not formatted "right" according to a deterministic formatter. I meant evaluating how closely each of a set of candidate deterministic rules is followed by the code whose style you want to mimic, in order to determine a set of deterministic rules that get you close, then build a model for the exceptions to those rules.

It's not just that I think this would have the biggest chance of success, but also that it mimics the thought process I'd go through for formatting code by hand where there are general principles I have in mind but I'm happy to break the rules whenever doing something different would make it more readable, easier to work with, or whatever.

Indeed. However, I doubt there is research on this, or sufficient prerequisite tooling to make it easy.
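The rule-scoring step described above could be sketched roughly like this. Everything here is hypothetical: the two candidate rules, the 0.75 adoption threshold, and the line-based granularity are illustrative choices, not something proposed in the thread. Each rule reports how many sites it applies to on a line and how many of those follow it; lines that break an adopted rule are the "exceptions" a second model would have to learn.

```python
import re

def space_after_comma(line):
    """(sites, followed): commas with a following character, and how many have a space."""
    sites = re.findall(r",(.)", line)
    return len(sites), sum(ch == " " for ch in sites)

def indent_multiple_of_four(line):
    """(sites, followed): applies only to indented, non-blank lines."""
    indent = line[: len(line) - len(line.lstrip())]
    if not indent or not line.strip():
        return 0, 0
    return 1, int(set(indent) == {" "} and len(indent) % 4 == 0)

def adherence(rule, lines):
    """Fraction of applicable sites where the rule is followed, or None if none apply."""
    totals = [rule(line) for line in lines]
    sites = sum(s for s, _ in totals)
    followed = sum(f for _, f in totals)
    return followed / sites if sites else None

def exceptions(rule, lines):
    """Lines that violate the rule somewhere: training data for an exception model."""
    return [line for line in lines if rule(line)[1] < rule(line)[0]]

sample = """\
def f(a, b,c):
    return g(a, b)
  # oddly indented comment
x = f(1, 2, 3)
"""
lines = sample.splitlines()
candidates = [space_after_comma, indent_multiple_of_four]
scores = {rule.__name__: adherence(rule, lines) for rule in candidates}
adopted = [rule for rule in candidates if (scores[rule.__name__] or 0) >= 0.75]

print(scores)                                 # comma rule 0.8, indent rule 0.5
print(exceptions(space_after_comma, lines))   # ['def f(a, b,c):']
```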

ireneista@adhd.irenes.space

@whitequark @GeoffWozniak that's our view as well

whitequark@social.treehouse.systems

@dalias I feel like building a difference model is a much more difficult approach to pursue while still exhibiting the undesirable chaotic behavior in some edge cases. anyway, time will tell if this works the way we want to build it or not

whitequark@social.treehouse.systems

@ireneista @GeoffWozniak based on a discussion with someone who has worked on this problem before we want to try building a diffusion model that captures the whitespace between code tokens and is then able to inject it into a given parsetree, which appears to be a fairly efficient and unproblematic way to do this
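As a rough illustration of treating whitespace as a separate layer over the tokens, the sketch below splits source text into a token list and a list of "gaps" (the whitespace before each token, plus whatever trails the last one), then re-injects a different set of gaps. The regex tokenizer is a toy assumption (no string literals, comments, or multi-character operators), and copying gaps from an exemplar stands in for whatever a learned model would predict; none of this is the actual tool being discussed.

```python
import re

# Toy tokenizer: optional whitespace, then a word or a single punctuation char.
TOKEN = re.compile(r"(\s*)(\w+|[^\w\s])")

def split_layout(source):
    """Return (tokens, gaps); gaps[i] precedes tokens[i], gaps[-1] trails the last token."""
    tokens, gaps, pos = [], [], 0
    for match in TOKEN.finditer(source):
        gaps.append(match.group(1))
        tokens.append(match.group(2))
        pos = match.end()
    gaps.append(source[pos:])
    return tokens, gaps

def inject_layout(tokens, gaps):
    """Rebuild source text from tokens and a matching list of gaps."""
    if len(gaps) != len(tokens) + 1:
        raise ValueError("need exactly one gap per token plus a trailing gap")
    return "".join(gap + tok for gap, tok in zip(gaps, tokens)) + gaps[-1]

messy = "foo( a,b )\n"
tokens, gaps = split_layout(messy)

# Splitting and re-injecting the original gaps reproduces the input exactly.
assert inject_layout(tokens, gaps) == messy

# Borrow the gaps from an exemplar with the same token shape.
_, styled_gaps = split_layout("bar(x, y)\n")
print(repr(inject_layout(tokens, styled_gaps)))  # 'foo(a, b)\n'
```

Because only the gaps change, the token sequence is preserved by construction. In a language where whitespace carries meaning, that alone does not guarantee the same parse.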

whitequark@social.treehouse.systems

@ireneista @GeoffWozniak and everything that is best done on a parsetree (import ordering for example) will be done in the parsetree because it ain't broken

ireneista@adhd.irenes.space

@whitequark @GeoffWozniak yeah this is a recurring research topic for us, we've talked with several of our friends about it over the years. just making a parser/generator that properly round-trips whitespace and comments is already a ton of work, alas...

geoffwozniak@masto.hackers.town

@whitequark @ireneista This sounds a lot like XSLT (or XSLT-adjacent).

whitequark@social.treehouse.systems

@ireneista @GeoffWozniak there's tree-sitter nowadays which I believe should do that (and I think it should be failure-tolerant considering its fairly wide use in editors: nvim, zed, etc)

whitequark@social.treehouse.systems

@ireneista @GeoffWozniak my literal first Python project was making a Python parser that fully captures source spans (which wasn't upstream at the time--in 2014 or so), so i'm quite familiar with the topic by now

geoffwozniak@masto.hackers.town

@whitequark @ireneista I very much respect that.

I view code like writing and I will tweak structure and form for far too long sometimes. Layout ends up getting less of my attention.

whitequark@social.treehouse.systems

@GeoffWozniak @ireneista I see layout as part of the form, I guess? I write source code files in much the same way as one would write chapters in a book: somewhat self-contained, and intended to make sense when read top-to-bottom linearly and with roughly one full-displayful of context. so if rustfmt decides to blow up a function call into 20 lines out of nowhere it very much messes with that, for example

geoffwozniak@masto.hackers.town

@whitequark @ireneista Well, I do have limits.

In my case I spend my time in Binutils and GCC. Do I love the GNU style? No. But does consistency help? Yes. So I demur. But I will restructure things so the single line curly braces don't take over.

whitequark@social.treehouse.systems

@GeoffWozniak @ireneista the awful code style is probably #2 in the list of top 5 reasons I contribute to LLVM instead of GNU tools. I should use it as a testcase for the tool I'm working on, actually

whitequark@social.treehouse.systems

@GeoffWozniak @ireneista awful memories of chasing down a bug in or1k binutils where .got section got somehow slightly unaligned from _GLOBAL_OFFSET_TABLE_. I never figured it out; I have since quit the company and I will mercifully never have to think about or1k again

geoffwozniak@masto.hackers.town

@whitequark @ireneista I've grown used to it. That may say something bad about me, but it keeps me employed.

However, I never use it as a style in anything else.

geoffwozniak@masto.hackers.town

@whitequark @ireneista I was in this wondrousness today, used in one of those functions that is a few hundred lines long with nested case statements and no attempt at functional abstraction.

So perhaps I have lost any hope of making art.

sourceware.org Git - binutils-gdb.git/blob - bfd/elf-bfd.h

whitequark@social.treehouse.systems

@GeoffWozniak @ireneista yeah I mean I've submitted binutils patches while I was employed there, and for all the dislike I have for that code style it was so far down the list of bad things about that job that it didn't even register

whitequark@social.treehouse.systems

@GeoffWozniak @ireneista yeah I have regretfully seen libbfd

netraven@hear-me.social

@whitequark do the thing. Science the shit out of it.
