Opinion poll:

veronica@explains.social

In your opinion, is speech-to-text "generative AI"?

veronica@explains.social

I have never considered speech-to-text as "generative." I've always thought of it as transitioning information between contexts (aural to written).

In other words, if I speak words and then manually write them down, I generate them at the "speaking" part, not at the "writing them down" part.

I've had folks disagree as of late, though, saying that speech-to-text is "generative" in that it literally generates content where content did not exist.

This is a nuance I hadn't considered.

jessebot@social.smallhack.org

@veronica I think this gets muddy given that people often use LLMs to do this these days (whereas previously they may not have).

Side note: Steno is such a neato art form. I wish I had the time to get into it.

veronica@explains.social

The reason I'm thinking about this distinction is captions.

I am a firm believer in captions, and I've gone to great lengths to manually write out captions for every video.

Speech-to-text has helped immensely with this task over the years. If I've done non-scripted content, using a speech-to-text program has saved me countless hours of retyping.

But much more importantly, I know the technology has helped folks who are deaf or hard of hearing. Live captioning using speech-to-text is a great tool, particularly when it's locally hosted.

I hadn't considered that, to some folks, that usage would count as "generative AI".

veronica@explains.social

Now, if the speech-to-text adds context, like changing the following text:

"I used grep to find these files"

into this:

"Veronica said she used grep to find these files"

I think that part is generative, if that makes sense? Am I making sense?

veronica@explains.social

I'm slowly devolving back into my earliest form, "imagined scenarios for academic purposes"

veronica@explains.social

@jessebot I'm with you, but a stenographer isn't going to follow a person who is deaf or hard of hearing around to do live captioning, and I think that's where the nuance really deepens for me.

aeris@firefish.imirhil.fr

@veronica@explains.social "generative" for AI is not about the fact that it generates content, but about the way it generates the content. Procedural generation vs. generative generation, for example.

janwilejan@snug.moe

@veronica iirc Piper tts will translate silence into "thank you" as a common hallucination.

so yes, newer speech-to-text models are generative AI.

jessebot@social.smallhack.org

@veronica And that's fair. I just feel like in as many instances as possible, there should be interpreters. In the Netherlands, you are allowed to request an interpreter in many situations, for instance at the hospital. It's also common to have official interpreters at cons that can be requested (this was the case at the most recent Linux Foundation-run con I went to, KubeCon Europe, hosted in Amsterdam). I'd like to see more situations where we employ humans to do this work, rather than try to solve more of this with tech, especially where people will lean towards AI, because I don't know that a lot of people with disabilities would feel comfortable using tech that has so many horrible moral consequences behind it.

(To be clear: I'm not saying you're advocating for AI or anything, and sorry if it came off that way. Also, not saying you personally have a moral failing or anything.)

ciphermonger@infosec.exchange

@veronica Don't worry, it will instead change it to say "I use grape to find these files", leaving everyone wondering how fruit solves anything.

sortius@infosec.exchange

@veronica speech-to-text has existed longer than LLMs, so I'm not sure why "AI" even comes into it.

There are definitely arguments against using LLMs to do the processing, as the systems that existed before worked relatively well, and didn't require fascism or killing the planet.
