I realise on the fediverse this is maybe asking for a flaming, but yesterday out of sheer curiosity I tried Claude for a simpleish coding task that I'd been putting off (largely inspired by @hausfath 's latest on #theclimatebrink).

arnebab@rollenspiel.social

@benjamingeer scientific code is usually a mass of spaghetti.

I once made a colleague's data cleanup program at least 100x faster just by processing the data in a single pass, instead of reopening the file and seeking to the last position for every single line.

You need to know where you come from to check whether something brings benefits.

That said: if that had been a 10k-line AI code monster, I couldn’t have fixed it in the 30 minutes I had.

@Ruth_Mottram @hausfath @UlrikeHahn
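
A rough sketch of the two access patterns described in the post above. This is not the colleague's program, which is not shown in the thread: `clean`, `cleanup_reopening`, `cleanup_single_pass` and `write_sample` are hypothetical names, and stripping whitespace merely stands in for whatever the real cleanup did. It only illustrates that both variants produce the same result while the first one pays for an open and a seek on every line.

```python
import os
import tempfile


def clean(line: str) -> str:
    """Hypothetical cleanup step (assumption): strip surrounding whitespace."""
    return line.strip()


def cleanup_reopening(path: str) -> list[str]:
    """Slow pattern: reopen the file and seek to the saved offset for each line."""
    results = []
    offset = 0
    while True:
        # Binary mode keeps seek()/tell() offsets simple byte counts.
        with open(path, "rb") as f:
            f.seek(offset)
            raw = f.readline()
            offset = f.tell()
        if not raw:  # empty bytes means end of file
            break
        results.append(clean(raw.decode("utf-8")))
    return results


def cleanup_single_pass(path: str) -> list[str]:
    """Fast pattern: open the file once and stream through it line by line."""
    with open(path, "r", encoding="utf-8") as f:
        return [clean(line) for line in f]


def write_sample(lines: list[str]) -> str:
    """Write the given lines to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    sample = write_sample(["  first ", "second", " third"])
    try:
        # Both variants are equally correct; only the I/O pattern differs.
        print(cleanup_reopening(sample) == cleanup_single_pass(sample))
    finally:
        os.remove(sample)
```

How much faster the single pass is depends on file size, storage and operating system, so the sketch makes no claim about the 100x figure.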

arnebab@rollenspiel.social

@benjamingeer But, just to make it clear: that code, which was 100x slower than it could have been, was still correct.

It was slow, but it did very complex tasks correctly.
@Ruth_Mottram @hausfath @UlrikeHahn

arnebab@rollenspiel.social

@Ruth_Mottram though my main gripe with us as human society is that we’re spending more than 400 billion dollars a year to build error-prone general pattern recognition and reproduction, while finding maybe 100 problems where it brings big benefits, each of which would take less than 10 million dollars to solve directly.

Why don’t we have solutions for those tasks already?

Why is matplotlib mostly written by some folks in their spare time while it has tons of value?
@benjamingeer @hausfath @UlrikeHahn

benjamingeer@piaille.fr

@UlrikeHahn What is the "good" that you want your students to produce? The thing that has real value? Is it essays or learning? Perhaps students are using LLMs to write essays because they mistakenly believe that the essay is an end in itself, rather than a means to an end. As somebody said, sometimes it makes sense to have someone cook your meal for you, but it never makes sense to have someone eat your meal for you. @Ruth_Mottram @hausfath

karolina@fediscience.org

Do people actually read the code Claude runs and how it differs from what Claude gives as an output?

ulrikehahn@fediscience.org

@benjamingeer @Ruth_Mottram @hausfath Benjamin, maybe just reread the previous post of yours and ask yourself “what in this post am I saying that could possibly be new to the person I am addressing?”…and then see where that leads you

benjamingeer@piaille.fr

@UlrikeHahn It would surprise me if anything I said was new to you. What surprised me was that you described the production of counterfeit goods as productivity. @Ruth_Mottram @hausfath

ulrikehahn@fediscience.org

@benjamingeer @Ruth_Mottram @hausfath maybe that should be a clue that you are somehow missing the intended point?

benjamingeer@piaille.fr

@UlrikeHahn The original question was whether LLM coding assistants would make scientists more productive. It sounded like you were arguing that they would, since LLMs are not just hype, as evidenced by their efficiency in producing fake course work, etc. Were you being ironic? @Ruth_Mottram @hausfath

arnebab@rollenspiel.social

@Ruth_Mottram when you use AI to transform your content from one form to another, parts of the content usually associated with the target form creep into your content.

This can be as bad as turning "agriculture that needs less antibiotics, because animals stay healthier" into "agriculture without antibiotics" (so sick animals suffer needlessly).

Because AI does not differentiate between content and form.
@benjamingeer @hausfath @UlrikeHahn

ulrikehahn@fediscience.org

@benjamingeer @Ruth_Mottram @hausfath I will leave that to you to puzzle out and now stop bombarding Ruth’s thread….

osma@mas.to

On the pure software side: 10 years ago, playing with the first-gen Raspberry Pi camera, I realized its relatively exotic video interface could be leveraged to do motion detection with extremely low CPU usage.

Those interfaces have since changed and the same approach no longer works. So a few months ago I decided to try an experiment: could OpenCode make a new version, compatible with the latest hardware and interfaces? 1/2
@Ruth_Mottram @hausfath

osma@mas.to

The planning stage worked like magic. It generated a plan which detailed why the old code doesn't work, listed all the new solutions, and outlined a plan for the conversion.

It all fell apart when moving to implementation, though. Spinning in circles, it ended up producing a completely unworkable semblance of code that had no hope of working.

What looked excitingly plausible for a forward port turned out to be a dead end. 2/2
@Ruth_Mottram @hausfath

osma@mas.to

Since I didn't spend the time to try and implement the plan by hand, I don't know if it was feasible, just that it did look plausible at first.

And that I think is the major issue with all LLMs. The artifacts look plausible, entirely regardless of whether they're factually correct. 3/2
@Ruth_Mottram @hausfath

tkissing@mastodon.social

@Ruth_Mottram@fediscience.org @hausfath I played roulette once, putting $5 on a number and won. I didn't suggest that everyone I know should quit their jobs and just bet on that number for a living.

LLMs are autocomplete on cocaine. Yes, sometimes they'll spit out something useful, but often times they don't and the more we use them, the more we lose the ability to tell the good from the bad.

The best Europe can do is to invest in people.

arnebab@rollenspiel.social

@benjamingeer Therefore I’d rather compare LLMs to using statistical methods without understanding them.

That’s already widespread and I expect that with LLMs it will get worse.
@Ruth_Mottram @hausfath @UlrikeHahn

1337@techhub.social

@Ruth_Mottram @hausfath This seems like a *really* bad idea. I'm a software engineer and not a scientist, but I believe I've heard there's already a fairly big problem in the sciences with software bugs producing misleading results. I imagine using AI to write code could make this much worse. IMO, the extra time that would've been spent coding everything would not have been wasted. Coding it yourself gives you more time to think about what you're typing and to gain a more complete understanding of your code and the libraries you're using, which in turn gives you more time and insight to spot bugs or otherwise wrong or less-than-optimal ways of doing things. If one did a thorough review of the AI-generated code to ensure it was correct, I'd guess it would take at least the same amount of time. Furthermore, seeing the AI-generated code first would create "anchoring bias," possibly still resulting in code with more bugs.

arnebab@rollenspiel.social

@Padjo the core question is: for which tasks does this work reliably?

Did you review the code to ensure that it doesn’t have unintended side-effects?

(that’s the difference between having an auto-complete that works on abstract concepts and negligently releasing potentially dangerous products to the public)

⇒ the fast part is only for the prototyping stage.
@Ruth_Mottram @hausfath

arnebab@rollenspiel.social

@1337 "anchoring bias" is the term I had been searching for.

Thank you!

That anchoring bias is why Larian finally decided not to let their concept artists use AI generated props for inspiration.
@Ruth_Mottram @hausfath

yvandasilva@hachyderm.io

@Ruth_Mottram @hausfath it's okay for little one-shot scripts.
Which most data science is.

For long-term projects that grow to thousands or millions of lines and need to be maintained for years, it's not OK.
It adds too much tech debt too quickly.

Writing code was never the problem tbh. Again, for scripts and small few-pagers, it's as good as any template generator or dummy drag-and-drop tool.

The AI-Augmented Scientist