i'm at a loss for words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

lizardbill@hachyderm.io

@whitequark 64.2% of the time, it works every time!

geoffwozniak@masto.hackers.town

@whitequark @ireneista Sorry, I probably should have put a CW on that.

whitequark@social.treehouse.systems

@static one of my motivations for this is that there are linters popular in the Python ecosystem and i really don't like how they work, haha

csolisr@hub.azkware.net

@whitequark Seeing somebody trying to implement the service proposed at malus.sh/ and it working just half of the time makes me keep some hope.

whitequark@social.treehouse.systems

@csolisr i did a double take

aburka@hachyderm.io

@whitequark I guess if your code is extruded as a homogenous paste and probably didn't work to begin with, one doesn't care as much...?

kouhai@social.treehouse.systems

@whitequark @ireneista @GeoffWozniak ~~ah, so python indentation~~

hennichodernich@radiosocial.de

@nxskok @whitequark @deborahh @danlyke to be fair, according to the paper, replacing for with while loops and vice versa and the like was also the goal

mrkeen@mastodon.social

@deborahh @whitequark @danlyke

No.

"there is no existing work that performs full stylization on an arbitrary piece of code. The most common methods are rule-based linters, formatters, which are limited to a few pre-defined style rules"

whitequark@social.treehouse.systems

@mrkeen @deborahh @danlyke I do think that stretching the definition of what "code style" could reasonably refer to until it fits the shape of the research product is a part of the problem here. (Consider that the introduction explicitly refers to the gotofail bug as something the research is supposed to help with, whereas it is plainly evident that it would make that problem only worse.)

burningtyger@nrw.social

@whitequark I'm slightly embarrassed that this is coming from Germany.

urixturing@hachyderm.io

@disorderlyf @whitequark IEEE and ACM don't do the research, nor do they teach you to do things; they are publishers that own journals and conferences where researchers publish their work

burningtyger@nrw.social

@whitequark @theeclecticdyslexic @lu_leipzig You are absolutely right. So for JS/TS we're using eslint only. It is much less strict about things but gets the job done. Line length is one of my pet peeves. I simply cannot and don't want a strict length because sometimes a line is longer than the rest. For reasons. I don't use formatters either for that reason. Works well for me.

whitequark@social.treehouse.systems

@urixturing @disorderlyf yeah. there are other issues with their models but this isn't one

mc@mathstodon.xyz

@whitequark well the paper speaks of *code style* which is more than just formatting but also, shouldn't we welcome negative results in science?

whitequark@social.treehouse.systems

@mc I feel like if the negative result is obvious given the hypothesis it has a lot less value

sop@unstable.systems

@whitequark I do think that asking for 100.0% equivalency is something that's both necessary to ask of something you'd want to put in a CI _and_ unreasonable to ask of something that tries to solve this problem

having accidentally gone through this specific kind of exercise a few times in the last couple weeks (turning java code into the kotlin code intellij would spit out, then into kotlin code I'd be happy to put my name on), I usually reach maybe 98% compatibility, then settle for that because I identify the remaining 2% of behaviours as "hard to replicate in the new shape of the code," "minor enough not to matter" and "not desirable, actually"

once you're happy to aim somewhere south of 100.0% I guess it's interesting to figure out how close you can get, and then yeah this approach only gets you to 64% which is only good as a milestone for future efforts to compare against

maybe all this ends up being good for is dropping comments on PRs (and, if you recognize me, we both know how we feel about that)

whitequark@social.treehouse.systems

@sop but I'm not doing language translation, input and output are in the same language and should have essentially identical (machine-checkably equivalent) ASTs
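
A minimal sketch of what "machine-checkably equivalent ASTs" could look like, assuming Python as the language being formatted and using only the standard `ast` module. The helper name `same_ast` is made up for illustration and is not from the paper or from any project mentioned in the thread:

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump() leaves out line/column attributes by default, so pure
    # layout changes do not affect the comparison.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


# A layout-only change: same AST.
before = "x = [1,2,\n     3]\nif x: print(x)\n"
after = "x = [1, 2, 3]\nif x:\n    print(x)\n"
assert same_ast(before, after)

# Swapping a for loop for a while loop: different AST, so it is rejected.
loop_for = "for i in range(3):\n    print(i)\n"
loop_while = "i = 0\nwhile i < 3:\n    print(i)\n    i += 1\n"
assert not same_ast(loop_for, loop_while)
```

A check like this only says the two programs have the same syntax tree; it says nothing about whether a rewrite with a different tree still behaves the same.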

sop@unstable.systems

@whitequark (reads https://social.treehouse.systems/@whitequark/116283070331505039) oh no did I just explain something you thought obvious back to you

whitequark@social.treehouse.systems

@sop i guess? basically, you can set up a system around an ML model in two ways: where the model only gets to alter (lexer) whitespace, and where the model gets to alter random (lexer) tokens

the paper goes for #2
i am collaborating on a project that does #1, which gives 100.0% (with the caveat above) by design, because a formatting tool that sometimes breaks code is a net negative
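
One way a whitespace-only guarantee could be enforced around a model, sketched under assumptions: Python is the language being formatted, the standard `tokenize` module stands in for the lexer, and the names `significant_tokens` and `whitespace_only_change` are hypothetical. This is not how the project mentioned above works, just an illustration of rejecting any candidate that touches a non-whitespace token:

```python
import io
import tokenize


def significant_tokens(source: str):
    """Return the (type, text) token sequence with layout-only details dropped."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NL:
            # NL is a line break that does not end a logical line.
            continue
        if tok.type in (tokenize.INDENT, tokenize.NEWLINE):
            # Keep the token (it is structural in Python) but not its exact text.
            out.append((tok.type, ""))
        else:
            out.append((tok.type, tok.string))
    return out


def whitespace_only_change(original: str, candidate: str) -> bool:
    """Accept a candidate only if every non-whitespace token is unchanged."""
    return significant_tokens(original) == significant_tokens(candidate)


# Re-wrapping a list literal only moves whitespace: accepted.
assert whitespace_only_change("x = [1,2,\n     3]\n", "x = [1, 2, 3]\n")

# Changing a token: rejected.
assert not whitespace_only_change("x = 1\n", "x = 2\n")

# Re-indenting a statement into a block changes structure in Python: rejected.
assert not whitespace_only_change(
    "if a:\n    b()\nc()\n",
    "if a:\n    b()\n    c()\n",
)
```

With a gate like this, the model can propose whatever it wants and the worst outcome is that a proposal is thrown away, which is the "by design" part.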
