i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

snowyfox@deadinsi.de

Figure in question seems to be about "model performing in its ideal conditions"

The author's actual opinion is implied in the Results:

"After inspecting the compilation checking module, we found that DUET CS achieves 55.8% computational accuracy, which is a practical metric for a code generation system. This result shows that more than half of the output code are compilable and implement the same function as the input code. The user can
use this check as an optional layer of the pipeline to guarantee grammar correctness.
...
We found that even the non-compilable outputs display around 60% similarity to the ground truth, which means even if DUET CS cannot always produce grammar-correct code, it can still provide valuable information to help user to transfer code style.
...
Notice, that generally the task of generating the exact same code as ground truth is very hard, especially when the code length is rather long (˜47 lines)."

snowyfox@deadinsi.de

That last one is a funny statement because it's laughably easy for a human to maintain the execution of a function after a style refactor. You would reprimand a junior if they couldn't do that

theeclecticdyslexic@mstdn.social

@whitequark @lu_leipzig that's a pretty reasonable concept I think.

I like the idea at least.

One thing I will say of deterministic formatters is they have changed my habits over time in order to get it to format the way I want. You can take that as both good and bad, but I think most (maybe 60%) of the things they have forced on me have been good.

Edit: I also get stun locked trying to decide how to format 15 lines of code far less often.

whitequark@social.treehouse.systems

@theeclecticdyslexic @lu_leipzig yeah if a formatter requires me to do things I don't want I simply quit using the formatter (and sometimes the codebase)

yvandasilva@hachyderm.io

@whitequark what the what.

gudenau@hachyderm.io

@whitequark Just like, use one of the tools that already exists? It'll be:
- Fast
- Cheap
- Efficient
- Accurate

I don't understand any of this "industry" outside of being a massive destructive boondoggle.

npars01@mstdn.social

@whitequark

Why is AI as a consumer product being rushed out the door so fast? It's obviously not ready for prime time. It's unreliable, inaccurate, and fragile.

It's like a car being sent out to car dealerships with only 3 wheels with hasty promises of a future 4th wheel.

Possibly the goal isn't a car with 4 wheels but a plan for something else, similar to a Waymo power outage gridlock

forbes.com

(www.forbes.com)

Reliance on AI is a national security risk vulnerable to high fuel prices

1/

budududuroiu@hachyderm.io

@whitequark "92 boosts, 115 favourites" damn

I swear to god sometimes Mastodon is just "old person yells at thing".

Researchers spend tons of time and money trying to solve Sudoku in polynomial time not because Sudoku is such a important problem to humanity, but because it's a NP-Hard problem, and you can thus reduce all other NP-complete problems to Sudoku and solve them all in polynomial time if you can solve Sudoku in polynomial time.

The research challenge is disentangling content from style in a learned embedding space, it's a classic representation learning problem that's genuinely hard: 1) Two functions that do the same thing should have identical content embeddings but different style embeddings, 2) Style must generalise to unseen code patterns, not just pattern-match known rules, 3) It's unsupervised, so there's no labeled (code_A, same_code_in_style_B) training pairs.

Code formatting is actually a very good medium to test this hypothesis, because you have an infinite latent space of code that does the exact same thing but is stylistically different.

dch@bsd.network

@whitequark feck that

nxskok@cupoftea.social

@whitequark @deborahh @danlyke ie, the sort of thing a linter does?

whitequark@social.treehouse.systems

@budududuroiu the reason I was reading the paper is because I'm working on the same problem and I think the encoding presented in the paper makes no sense at all to use

budududuroiu@hachyderm.io

@whitequark to use for what? It's research, it's not meant to create something for industry use. Academia already suffers from the "File-drawer problem". I also did research on using GANs for Outlier Detection, when most of the time Outlier Detection is a classification problem, not a learned representation problem.

whitequark@social.treehouse.systems

@budududuroiu yes yes i know you're here because you look at trends and start arguments, now move on to something else and stop wasting my time

budududuroiu@hachyderm.io

@whitequark lmao, have fun "clowning" on stuff you don't understand

whitequark@social.treehouse.systems

@budududuroiu go take a short walk off a long pier

rootkitty@yiff.life

@whitequark "If" right next to "if"

fibrojedi@gamepad.club

@whitequark if I have understood you correctly, they're saying 64% functional is a satisfactory result?

whitequark@social.treehouse.systems

@FibroJedi that's my read of it yeah

me@social.jlamothe.net

@whitequark Whenever I hear about these benchmarks I can't help but wonder how people can say these things with a straight face.

fibrojedi@gamepad.club

@whitequark Maybe they'd like their phone and car 64% functional as a real world test .

Some of those logic misses/switches are disturbing. I don't know how it's allowable.

If the code works 100%, and "reformatting" it reduces that % then it's wrong by definition.

CIRCLE WITH A DOT

i'm at a loss of words after reading a paper about reformatting code using an ML model that has a measured statistical quantity A_c which says how often the reformatted code behaves the same as the original

forbes.com