I heard a reporter from Axios interviewed on NPR the other day (Marketplace Tech, I think) talking about how the tech companies are putting out new models every 6 months to 1 year and how each model is more "powerful" than the previous.

emilymbender@dair-community.social

This got me thinking about what it means when we describe technology and this technology in particular as "powerful".

🧵>>

emilymbender@dair-community.social

Motors and engines can be more or less powerful in the very literal sense that they can generate more or less physical power.

Language can be powerful in that it gives us the power to move people.

Spreadsheet software (or calculators, for that matter) is powerful because it provides useful functions that help us to keep track of information, do calculations, etc., in a way that gives us a better vantage point over that info and/or saves time.

>>

emilymbender@dair-community.social

I think that when large language models (and systems built around them) are advertised as "powerful" there's a strategic and insidious ambiguity at play.

>>

emilymbender@dair-community.social

These models are undoubtedly powerful in that they are used to justify the amassing of data and wealth, both of which confer power on those who control them. They are also powerful in requiring immense amounts of electrical power to produce.

They are advertised as having new and impressive functionalities (misleadingly called "capabilities") each time, which also seems like "powerful" in the spreadsheet/calculator sense ... or would be if they actually worked as advertised.

>>

emilymbender@dair-community.social

And then finally there is the sense of powerful like an engine that generates power --- this is, I think, the deepest fantasy: you, the user (or better yet owner), have access to the "raw power" of the model.

So I urge journalists and others who are tempted to describe models as "powerful" (or transcribe the PR copy of the companies that call them "powerful") to reflect on what you think that means, and what evidence you have that it is true.

/fin

joe@beige.party

@emilymbender not to argue - but Google I think is showing in some ways power in how some of its proof of concept tools do planning and action sequences. Generating text is the first iteration, then the chat response model/training, and the combination of tool use and planning seems to be a leap forward. Claude code, open claw and others are versions of AutoGPT - just kind of ripping off agile workflows for execution. I would say when the model can “consider” what to do, then execute using external tools - that’s a version of power. It’s the aspect that should excite and worry us. It does it with confidence, and not poorly in some limited cases.

These kinds of tools tend to amaze me. They are a step beyond. I think as they train on more tokens they are just seeing a side effect of things like this working better, being able to read documents and file tax forms. Agreed to be very skeptical because it’s marketing hype and Claude has clogged the internet with paid astroturfing. It’s disgusting.

Stitch - Design with AI

Stitch generates UIs for mobile and web applications, making design ideation fast and easy.

Stitch (stitch.withgoogle.com)

mathaetaes@infosec.exchange

@emilymbender You're far more an expert on this than me, so I will defer to your experience.

But in all these examples, "powerful" describes utility. A more powerful engine can do more things than a weaker one. A more advanced spreadsheet software can help users calculate/track more things than a less advanced one.

By that definition, wouldn't "powerful" for LLMs just mean less wrong, or wrong less often?

At the end of the day, a model is just predicting text. A model can't use a tool, but it can predict the text required to use a tool. A model can't write code, but it can predict the text that a compiler will turn into a program. We keep building integrations that allow tools to be driven by text, which allows text prediction models to 'use' them... but really it's still just predicting text.

The only real metrics that apply to an LLM are size, speed, and accuracy. For frontier models, size and speed are always compensated for by throwing more hardware at them, so users never see them. Thus, the only reasonable measure of power, for the journalistic contexts you're talking about, is the accuracy of the text it's predicting.

Thus, a "more powerful AI model" is just one that is less wrong than the previous generations. No?

That said, I do agree with your points that journalists are doing the PR firms' jobs for them when they use "more powerful" as a stand-in for "less wrong."

joe@beige.party

@emilymbender sorry! I agree, there can be limited use of the word power - but being able to generate fingers better isn’t it. You need leaps.

emilymbender@dair-community.social

@joe "Not to argue but" ... argue argue argue google PR argue argue

joshg@mathstodon.xyz

@emilymbender yeah. I really suspect it's mostly the consumption aspect. it *must* be more powerful because it consumes so much power, like revving your loud gas-guzzling V8 hemi to show off

fuzzy@beige.party

@emilymbender which one word would you use?

emilymbender@dair-community.social

@fuzzy Why do you think there should be one word? Specificity is what is required here.

fuzzy@beige.party

@emilymbender I don't believe that there should be a single word, but which one word would you use (instead of "powerful")?

bltpizza@mastodon.social

@emilymbender Marketplace Tech segments have been enthusiastically promoting AI for over two years. I don't recall anything approaching a tough question for any guests or unbiased journalism.

emilymbender@dair-community.social

@fuzzy Which property of the models are you trying to label? My thread was about how that word is being used with insidious ambiguity.

fartnuggets@jorts.horse

@joe @emilymbender bro, you need to slow tf down, and learn a bit more about whom you're correcting. Tuck your ego back in your pants, and don't try to "not to argue" argue your talking points.

Knock it off.

joe@beige.party

@fartnuggets @emilymbender lol I knew it, you can’t discuss anything on social media. Gender is more important. Thanks for putting me in my place.

joe@beige.party

@fartnuggets @emilymbender literally didn’t “correct” anyone. This is peak social media. Love when men attack men for existing in spaces.

joe@beige.party

@emilymbender you’re right, being objective is silly.

joe@beige.party

@emilymbender you used the word evidence… it’s wild. Sorry I forgot I was wrong anyways. I can’t possibly be right.
