i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
-
@budududuroiu yes yes i know you're here because you look at trends and start arguments, now move on to something else and stop wasting my time
@whitequark lmao, have fun "clowning" on stuff you don't understand
-
@whitequark lmao, have fun "clowning" on stuff you don't understand
@budududuroiu go take a short walk off a long pier
-
@porglezomp you'll love Fig. 6
@whitequark "If" right next to "if"
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark if I have understood you correctly, they're saying 64% functional is a satisfactory result?
-
@whitequark if I have understood you correctly, they're saying 64% functional is a satisfactory result?
@FibroJedi that's my read of it yeah
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark Whenever I hear about these benchmarks I can't help but wonder how people can say these things with a straight face.
-
@FibroJedi that's my read of it yeah
@whitequark Maybe they'd like their phone and car 64% functional as a real world test
Some of those logic misses/switches are disturbing. I don't know how it's allowable.
If the code works 100%, and "reformatting" it reduces that % then it's wrong by definition.
-
@whitequark @porglezomp I'm spitting out my drink at j++ → j--. Holy shit.
@xgranade @whitequark @porglezomp
I think reversing the `j` for loop is actually wanted by them? It's labelled "ground truth", and it is a potential valid optimisation
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark But... why? Why not just use a linter?
-
@whitequark because "the thing we're promoting is incredibly dangerous, and not in fun ways" is not really the thing anyone wants to be cited for
@ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.

-
@whitequark But... why? Why not just use a linter?
@DaKangaroo see edit
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark I cannot even
-
@porglezomp you'll love Fig. 6
@whitequark @porglezomp long live the new flesh
-
@ireneista @whitequark Now, show me the numbers on the effort to make a rule-based style file compared to this. Because I'm sure that A_c is 100.0 in that case.

@GeoffWozniak @ireneista so the problem i'm solving is that while for C++, you have tools like clang-format which are nice and flexible, for Rust you have rustfmt which is rigid and makes your code look like ass. I do not like my code looking like ass but I am also receptive to the idea that introducing as many knobs as clang-format has into rustfmt would make it unmaintainable
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark this technology is going to be amazing for the competitive advantage of the few software firms that refuse to use it
-
i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original
the "ideal" (their choice of words) case is 64.2%
@whitequark Saw your edit with the motivation for reading research. I doubt there's anything out there doing this well, but I think the smart approach to doing it well would be to evaluate and score a bunch of candidate standard-class rules across the codebase, solve for a set that maximally approximates what's already there, then apply some sort of pattern learning for the remaining instances that "break the rules", hopefully identifying correlations between them.
Basically, going as far as you can with simple comprehensible deterministic rules before you start throwing magical statistics at it.
-
@GeoffWozniak @ireneista so the problem i'm solving is that while for C++, you have tools like clang-format which are nice and flexible, for Rust you have rustfmt which is rigid and makes your code look like ass. I do not like my code looking like ass but I am also receptive to the idea that introducing as many knobs as clang-format has into rustfmt would make it unmaintainable
@whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.
At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.
-
@whitequark Saw your edit with the motivation for reading research. I doubt there's anything out there doing this well, but I think the smart approach to doing it well would be to evaluate and score a bunch of candidate standard-class rules across the codebase, solve for a set that maximally approximates what's already there, then apply some sort of pattern learning for the remaining instances that "break the rules", hopefully identifying correlations between them.
Basically, going as far as you can with simple comprehensible deterministic rules before you start throwing magical statistics at it.
@dalias i specifically do not want this because of two reasons:
- it requires software that doesn't exist (e.g. there are no Rust formatters that expose enough deterministic knobs for me)
- it doesn't resolve the rigidity of the underlying formatter
there is existing research doing the thing you're talking about here, which you could probably use as-is to achieve what you want (it even has an explainer tool for the rules it generates—note I haven't tried it, just read the abstract); I want the formatter to be somewhat liberal about the code it accepts. whether I think the code should be formatted a certain way (as a maintainer) is non-deterministic, so I see no real issue with the statistical model having chaotic-but-deterministic behavior in some cases as long as overall the behavior is reasonable
-
@whitequark @ireneista I have not had to deal with rustfmt yet. For clang-format, I work in existing projects and use (very) mildly tweaked variants of the base style for the project.
At the risk of instigating the canonical bikeshed discussion, I am a conformist formatter and have not concerned myself with modifying style all that much. But I agree that clang-format has some bizarre knobs to tweak.
@GeoffWozniak @ireneista I view code as art so I find strongly canonicalizing formatters like `black` to be actively destructive. right now I use Ruff with a 300-line configuration for some of the Python code and I think there's gotta be a better way to approach this that isn't destructive
-
@dalias i specifically do not want this because of two reasons:
- it requires software that doesn't exist (e.g. there are no Rust formatters that expose enough deterministic knobs for me)
- it doesn't resolve the rigidity of the underlying formatter
there is existing research doing the thing you're talking about here, which you could probably use as-is to achieve what you want (it even has an explainer tool for the rules it generates—note I haven't tried it, just read the abstract); I want the formatter to be somewhat liberal about the code it accepts. whether I think the code should be formatted a certain way (as a maintainer) is non-deterministic, so I see no real issue with the statistical model having chaotic-but-deterministic behavior in some cases as long as overall the behavior is reasonable
@dalias the problem this is solving is that some contributors have an allergic reaction to getting "please format this in <X way>" review comments, so having a tool that gets a patch 95% to the way it 'ought' to be should lower friction in much the same way that adopting a strongly canonicalizing formatter like `black` would, without downsides of the latter