so the take-away from this is that all of this agentic stuff backend is just begging the LLM to please, please not fuck up?
-
RE: https://neuromatch.social/@jonny/116324676116121930
so the take-away from this is that all of this agentic stuff backend is just begging the LLM to please, please not fuck up? am i getting this right?
@eniko we had an internal workshop about this (a lot of techbros at work...) and yes, that is basically it, begging and praying (and spending enough money to hire more people)
-
this isn't engineering this is a religious cult
-
@eniko Don't forget the miles of squirrelly regex that compensate for its "almost human" language skills!
But yeah. They have no control over this thing and they've just been hawking a really elaborate Wizard Of Oz that sometimes breaks GitHub.
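The "squirrelly regex" cleanup mocked above is a recognizable pattern: the model is asked for bare JSON, wraps it in chat and a markdown fence anyway, and a regex scrapes the payload back out. A minimal sketch — the reply text and function name here are made up for illustration, not from any real codebase:

```python
import json
import re

def extract_json(reply: str) -> dict:
    # The model was asked for bare JSON but wrapped it in pleasantries
    # and a code fence anyway, so we scrape for the first {...} span.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON found; beg harder in the prompt")
    return json.loads(match.group(0))

chatty = (
    "Great question! Here is the config:\n"
    '{"retries": 3}\n'
    "Hope that helps!"
)
print(extract_json(chatty))
```

The greedy `{.*}` match is itself fragile (it breaks the moment the chatty wrapper contains its own braces), which is rather the point being made.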
-
@eniko yeah, this is how all AI "engineering" works under the hood. every time the model does something stupid, they write an extra bit of system prompt to burn a few tokens explaining not to do that thing. same exact energy as shadiversity writing "correct anatomy, perfect lighting, masterpiece x1000000" at the end of every prompt as if the model knows how to do those things but simply chose not to
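The prompt-patching loop described here can be sketched in a few lines. Everything below is hypothetical — there is no real API behind these names; it just shows the shape of "every failure becomes another plea bolted onto the system prompt":

```python
# Hypothetical sketch of the "patch the system prompt" workflow.
SYSTEM_PROMPT = "You are a helpful coding agent."
DONT_RULES: list[str] = []

def record_failure(rule: str) -> None:
    # "The model did the stupid thing again" -> append another plea and hope.
    DONT_RULES.append(rule)

def build_prompt() -> str:
    # Every recorded failure burns a few more tokens on every future call.
    return SYSTEM_PROMPT + "".join(f"\n- Do NOT {rule}." for rule in DONT_RULES)

record_failure("delete files outside the working directory")
record_failure("invent package names that do not exist")
print(build_prompt())
```

Note that nothing here changes the model itself; the prompt just grows monotonically, which is the "begging and praying" the thread is describing.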
-
@eniko@mastodon.gamedev.place there's a reason why Most Engineers don't recognize Software Engineering as an actual engineering discipline, and the current crop isn't doing anything to convince them that should change.
-
@eniko gotta preach the TESCREAL gospel or the Basilisk will get ya
-
@eniko A vast mass of bad code held together by duct tape (and prayers).
-
@eniko@mastodon.gamedev.place you're absolutely right! yup
@alice@mk.nyaa.place @eniko@mastodon.gamedev.place because right now brain goes brap, this post just made me realize that "you're absolutely right" can be shortened to YAR.
Pirates are onto something good.
-
@eniko YES!!!
I have been yelling about this in different places for 4 years. Before they really took off I used to do hobby red-teaming around GPT-3 and 4, and there was this moment of realizing that none of the security for this stuff fucking works, at all, and yet no one wanted to admit that, it was just layers of programmers going "what if there's a second LLM though", and stuffing their heads in the sand, and NOTHING HAS CHANGED.
-
@eniko the only difference now is that the media has bought into it and now we have "agentic" browsers and all this shit and literally all of it is fundamentally impossible to secure - and the entire tech space (not entire but like, you know) has gone, "but what if we pretend it's fine tho".
4 years ago I was arguing with programmers about this and they were like "well by the time it gets access to your emails, this will be fixed."
It was not fixed.
-
@eniko we have replaced solid software security with the equivalent of the Google SEO wars. We have expanded phishing attacks so that now they work on your computer itself, not just on you.
We are so unbelievably fucked, none of this can be used in a sensitive environment.
And the overwhelming consensus from researchers is that this is impossible to solve, all you can do is beg for the computer not to fuck everything up in increasingly desperate ways.
-
@eniko (this is also why "a chat prompt" is the wrong way to measure energy costs for these things, because the only way to get them to do anything halfway useful is to burn tokens like a forest fire. Not sure about the output? Run an entirely separate LLM to check! Run it 3 times and average the results! Run it in a loop until the test passes! Oops, the format was wrong, run it again and see what happens.)
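The "run it in a loop until the format is right" pattern is easy to sketch. `fake_llm` below is a stand-in, not a real model call — it is rigged to return malformed output twice so the retry cost is visible. Every retry would be another full model call, i.e. more tokens burned:

```python
import json
from itertools import count

_calls = count()

def fake_llm(prompt: str) -> str:
    # Stand-in for a model call: malformed output on the first two tries.
    n = next(_calls)
    return "Sure! Here it is: {oops" if n < 2 else '{"answer": 42}'

def call_until_it_parses(prompt: str, max_tries: int = 5):
    # Retry the whole call until the output happens to be valid JSON.
    for attempt in range(1, max_tries + 1):
        raw = fake_llm(prompt)
        try:
            return json.loads(raw), attempt
        except json.JSONDecodeError:
            continue  # ask again and see what happens
    raise RuntimeError(f"gave up after burning {max_tries} calls")

result, tries = call_until_it_parses("return JSON please")
print(result, "after", tries, "call(s)")
```

One "query" here is really three model calls, which is the multiplier the post is pointing at.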
-
@eniko
Also, "please don't do crimes".
-
@eniko and I actually think that this is why "agents" are where this slop took off for programmers. Because it lets that happen in the background where you don't have to see your shame.
But like.. this is also why even assuming they're true, the "single query is like running your microwave for a second" shit is so disingenuous. No one is doing a single query.
They are leaving the microwave running 24x7, which turns out is actually quite bad for the environment!
-
@eniko measuring LLM energy usage per query is like measuring gas efficiency "per rotation of your car's wheel"
-
@eniko maybe this is a hot take but I think having employees can be a lot like this which is why it doesn’t raise red flags for managers
-