I realise on the fediverse this is maybe asking for a flaming, but yesterday out of sheer curiosity I tried Claude for a simpleish coding task that I'd been putting off (largely inspired by @hausfath 's latest on #theclimatebrink).

osma@mas.to

The planning stage worked like magic. It generated a plan which detailed why the old code doesn't work, listed all the new solutions, and outlined a plan of conversion.

It all fell apart moving to implementation though. Spinning in circles, it ended up producing a completely unworkable semblance of code that had no hope of working.

What looked excitingly plausible for a forward port turned out a dead end. 2/2
@Ruth_Mottram @hausfath

osma@mas.to

Since I didn't spend the time to try and implement the plan by hand, I don't know if it was feasible, just that it did look plausible at first.

And that I think is the major issue with all LLMs. The artifacts look plausible, entirely regardless of whether they're factually correct. 3/2
@Ruth_Mottram @hausfath

tkissing@mastodon.social

@Ruth_Mottram@fediscience.org @hausfath I played roulette once, putting $5 on a number and won. I didn't suggest that everyone I know should quit their jobs and just bet on that number for a living.

LLMs are autocomplete on cocaine. Yes, sometimes they'll spit out something useful, but oftentimes they don't, and the more we use them, the more we lose the ability to tell the good from the bad.

The best Europe can do is to invest in people.

arnebab@rollenspiel.social

@benjamingeer Therefore I’d rather compare LLMs to using statistical methods without understanding them.

That’s already widespread and I expect that with LLMs it will get worse.
@Ruth_Mottram @hausfath @UlrikeHahn

1337@techhub.social

@Ruth_Mottram @hausfath This seems like a *really* bad idea. I'm a software engineer and not a scientist, but I believe I've heard there's already a fairly big problem in the sciences with software bugs producing misleading results. I imagine using AI to write code could make this much worse. IMO, the extra time that would've been spent coding everything would not have been wasted. Coding it yourself gives you more time to think about what you're typing and gain a more complete understanding of your code and the libraries you're using, giving you more time and insight to spot bugs or otherwise wrong or less than optimal ways of doing things. If one did a thorough review of the AI generated code to ensure it was correct, I'd guess it would take at least the same amount of time. Furthermore, seeing the AI generated code first would create "anchoring bias," possibly still resulting in code with more bugs.

arnebab@rollenspiel.social

@Padjo the core question is: for which tasks does this work reliably?

Did you review the code to ensure that it doesn’t have unintended side-effects?

(that’s the difference between having an auto-complete that works on abstract concepts and negligently releasing potentially dangerous products to the public)

⇒ the fast part is only for the prototyping stage.
@Ruth_Mottram @hausfath

arnebab@rollenspiel.social

@1337 "anchoring bias" is a formulation I searched for.

Thank you!

That anchoring bias is why Larian finally decided not to let their concept artists use AI generated props for inspiration.
@Ruth_Mottram @hausfath

yvandasilva@hachyderm.io

@Ruth_Mottram @hausfath it's okay for one shot little scripts.
Which most data science is.

For long-term projects that grow to thousands or millions of lines and need to be maintained over the long term, it's not ok.
It adds too much tech debt too quickly.

Writing code was never the problem tbh. Again, for scripts and small few-pagers, it's as good as any template generator or dummy drag and drop tool.

yvandasilva@hachyderm.io

@pettter @Ruth_Mottram @hausfath
This is correct. The use of agents, which is what allows you to get sensible scripts that do what they are supposed to do rather than eyeballing it, will generate hundreds if not thousands of queries for a very simple input.
Since there will generally be more than one, it's not unexpected to produce multiple thousands of queries via an agent to an LLM. Its own "thinking mode" and tool triggering will also trigger more queries.
All of that not even going into the "multi-agent" /"swarm of agents" territory.

slotos@toot.community

@ArneBab You skipped the most important point:

- not intending for the result to be maintained

For a one-off result these models seem impressive. Hell, outside of a „solve wages” bubble, the AI field consistently produces useful tools.

But holy shit, can people that have never had to maintain a system after a 10x fuckface has fled the scene shut the fuck up about AI and coding? Code is the easy part where engineers get to finish the productivity reward loop. Go automate your vacations instead!

arnebab@rollenspiel.social

@Ruth_Mottram I pondered for the past hour why this annoys me so much (because it does, even though I do see the individual arguments).

We spent more than a decade enabling scientists to cut loose from their matlab and office subscriptions that made scientific work dependent on regular payments, to enable them to do their work with matplotlib instead, and now many jump right back into a subscription service -- that uses matplotlib to make them dependent.

That adds insult to injury.
@hausfath

garonenur@rollenspiel.social

@ArneBab @Ruth_Mottram @hausfath yes this!

The models will never be open or free like open software, and AI will even be a huge factor in increasing climate change!
But the dependence and subscription service part should really be the deal breaker here.
Also: trust in the AI should be very low if it is provided by billionaire owned companies.

garonenur@rollenspiel.social

@UlrikeHahn @benjamingeer @Ruth_Mottram @hausfath
This is such a depressing, but true, point I did not consider.
But now I wonder if the heads of AI did, and even like this productivity.

padjo@mastodon.ie

@ArneBab @Ruth_Mottram @hausfath yes I reviewed the code. I worked with it to define the architecture and choose technologies. They are technologies I'm familiar with. The code is as good as or better than I would write. It was far more thorough with edge cases. It handled error states better than I would have. I'm using it to build a new project, maybe it will reach a point where it is no longer helpful but I haven't seen any evidence of that. Software is just dramatically cheaper to produce now.

arnebab@rollenspiel.social

@Padjo that’s interesting.

Thanks for the info.
@Ruth_Mottram @hausfath

hopeless@mas.to

@Ruth_Mottram @hausfath

Yes I think if you try the current SOTA stuff (like Google's Antigravity) on your own choice of code, own tasks, editing checkouts on your own machine, it's hard not to be impressed.

One thing to keep in mind is current LLMs are prone to sins of omission. I would strongly suggest never one-shotting anything and committing it.

If you simply ask it to audit what it just did from a security and completeness perspective, and fix what it found, you can get a big step up.


The AI-Augmented Scientist
