i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c, which says how often the reformatted code behaves the same as the original

whitequark@social.treehouse.systems

@urixturing @disorderlyf yeah. there are other issues with their models but this isn't one

mc@mathstodon.xyz

@whitequark well the paper speaks of *code style* which is more than just formatting but also, shouldn't we welcome negative results in science?

whitequark@social.treehouse.systems

@mc I feel like if the negative result is obvious given the hypothesis it has a lot less value

sop@unstable.systems

@whitequark I do think that asking for 100.0% equivalency is something that's both necessary to ask of something you'd want to put in a CI _and_ unreasonable to ask of something that tries to solve this problem

having accidentally gone through this specific kind of exercise a few times in the last couple of weeks (turning the kotlin code that intellij spits out from java into kotlin code I'd be happy to put my name on), I usually reach maybe 98% compatibility, then settle for that because I identify the remaining 2% of behaviours as "hard to replicate in the new shape of the code," "minor enough not to matter" and "not desirable, actually"

once you're happy to aim somewhere south of 100.0% I guess it's interesting to figure out how close you can get, and then yeah this approach only gets you to 64%, which is only good as a milestone for future efforts to compare against

maybe all this ends up being good for is dropping comments on PRs (and, if you recognize me, we both know how we feel about that)

whitequark@social.treehouse.systems

@sop but I'm not doing language translation, input and output are in the same language and should have essentially identical (machine-checkably equivalent) ASTs
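
For Python, one of the languages mentioned later in the thread, a machine check of this kind can be sketched with the standard `ast` module. This is only an illustration under stated assumptions, not the check used by the paper or by any project discussed here; the function name `same_ast` is made up.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs."""
    try:
        reformatted_tree = ast.parse(reformatted)
    except SyntaxError:
        # A "reformatted" file that no longer parses is never equivalent.
        return False
    # ast.dump() leaves out line and column attributes by default,
    # so pure layout changes do not affect the comparison.
    return ast.dump(ast.parse(original)) == ast.dump(reformatted_tree)
```

For example, `same_ast("x=1+2", "x = 1 + 2")` holds, while `same_ast("x = 1 + 2", "x = 2 + 1")` does not. Comments are not part of the Python AST, so this check alone would not notice a dropped or edited comment.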

sop@unstable.systems

@whitequark (reads https://social.treehouse.systems/@whitequark/116283070331505039) oh no did I just explain something you thought obvious back to you

whitequark@social.treehouse.systems

@sop i guess? basically, you can set up a system around an ML model in two ways: #1, where the model only gets to alter (lexer) whitespace, and #2, where the model gets to alter arbitrary (lexer) tokens

the paper goes for #2
i am collaborating on a project that does #1, which gives 100.0% (with the caveat above) by design, because a formatting tool that sometimes breaks code is a net negative
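
The whitespace-only constraint can be sketched as a guard around whatever proposes the new layout: accept a candidate only if its non-whitespace tokens match the original exactly. The sketch below uses Python's standard `tokenize` module. It is an assumption about how such a guard might look, not the implementation of the project mentioned above, and the names `significant_tokens` and `only_whitespace_changed` are hypothetical.

```python
import io
import tokenize

# Token kinds whose text is pure layout; only their presence is compared.
LAYOUT_KINDS = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def significant_tokens(source: str):
    """List (kind, text) pairs for every token that is not just layout."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NL:
            continue  # blank line, or a line break inside brackets
        text = "" if tok.type in LAYOUT_KINDS else tok.string
        tokens.append((tok.type, text))
    return tokens


def only_whitespace_changed(original: str, candidate: str) -> bool:
    """Return True if candidate differs from original only in whitespace."""
    try:
        return significant_tokens(original) == significant_tokens(candidate)
    except (tokenize.TokenError, SyntaxError):
        # Anything that cannot be tokenized is rejected.
        return False
```

The guard is deliberately conservative: it rejects some harmless edits (for example, splitting `a = 1; b = 2` onto two lines), but it should never accept a candidate in which a name, literal, operator, or comment was changed.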

srazkvt@tech.lgbt

@lu_leipzig @whitequark i would honestly be more interested in a deterministic but very configurable formatter, and an ml model to, from sample code, write a config for you, and you just do minor adjustments to it, generally all code styles fit in just a few hundred switches

whitequark@social.treehouse.systems

@SRAZKVT @lu_leipzig this would be ~easy to do but convincing people to implement and maintain "a few hundred switches" has been incredibly difficult; my motivation is exactly that rustfmt maintainers have been consistently unwilling to entertain that

whitequark@social.treehouse.systems

@SRAZKVT @lu_leipzig if every language i cared about (at this point: mainly rust, python, and c++) had highly configurable formatters i would not care to spend as much effort as i'm planning to on ml research

srazkvt@tech.lgbt

@whitequark @lu_leipzig most tooling devs today seem to believe in a one size fits all with no configurability, kind of sad

also i think the problem of "but if every codebase isn't formatted exactly the same" is way overblown, once you start reading the code it really doesn't take long to adapt to a new style, barely a few minutes from my experience

whitequark@social.treehouse.systems

@SRAZKVT @lu_leipzig there is a more real problem of "some people bounce off contributing if you ask them to fix style"

ingalovinde@embracing.space

@sabik @xgranade @whitequark @porglezomp but they also changed the boundaries! "Input" checks all values from 2 to i+2 inclusive; but "ground truth" just throws the i+2 iteration out.

illybytes@shrimp.imsofucking.gay

@hennichodernich @danlyke @whitequark @deborahh @nxskok but like wouldn't that be easy to implement?
like

for (init expression; bool expression; assignment) that would turn into
init expression; while (bool expression) { /* body; every possible branch inside the while would get the assignment */ }
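
The part that makes this rewrite less trivial than it looks is `continue`: in a C-style `for`, the step still runs after `continue`, so a `while` version has to repeat the step on every such path. A minimal sketch in Python, modelling the C-style loop `for (i = 0; i < n; i++)`; the function names, the loop body, and the iteration cap are all made up for illustration.

```python
def reference_for(n):
    """Models: for (i = 0; i < n; i++) { if (i % 2) continue; out.append(i); }"""
    out = []
    for i in range(n):  # like the C loop, advances i even after continue
        if i % 2:
            continue
        out.append(i)
    return out


def desugared_while(n):
    """The same loop as a while, with the step repeated before continue."""
    out = []
    i = 0  # init
    while i < n:  # condition
        if i % 2:
            i += 1  # step, duplicated on the continue path
            continue
        out.append(i)
        i += 1  # step, at the normal end of the body
    return out


def naive_while(n, max_iterations=1000):
    """The same rewrite without duplicating the step: it never terminates."""
    out = []
    i = 0
    iterations = 0
    while i < n:
        iterations += 1
        if iterations > max_iterations:
            return None  # gave up; the real loop would spin forever
        if i % 2:
            continue  # BUG: the step is skipped, so i never advances
        out.append(i)
        i += 1
    return out
```

Here `reference_for(6)` and `desugared_while(6)` both give `[0, 2, 4]`, while `naive_while(6)` hits the iteration cap and returns `None`.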

sabik@rants.au

@IngaLovinde @xgranade @whitequark @porglezomp
`i` starts from 1 in the "ground truth" version

ingalovinde@embracing.space

@sabik @xgranade @whitequark @porglezomp ah I see, so the new i is just the old one + 1

benjamineskola@hachyderm.io

@mc @whitequark do they actually even recognise it as a negative result though?

They seem to be presenting it as a positive one (looking at the abstract and conclusion) — but I admit I'm not familiar with the norms for writing this sort of paper.

dasgrueneblatt@wien.rocks

@whitequark amazing

teilweise@layer8.space

@whitequark Looking at https://upload.whitequark.org/1774306843-Duetcs_Code_Style_Transfer_through_Generation_and_Retrieval.pdf, Fig. 6:

Look at `bool ok, count = false;`: this initializes only `count` and leaves `ok` uninitialized.
In every case that should print “YES”, the `ok = false;` line never runs, so `ok` is never assigned and it’s undefined whether the program prints “YES” or “NO” (it might even differ between invocations).

Neither the input nor the ground truth had that bug.

It looks like the researchers did not notice it and considered it correct.
(64.2% …)

It was obvious to me, would you have caught it?

srazkvt@tech.lgbt

@whitequark @lu_leipzig yea, such as: the code being shit
