
There will never be an AI tool that is truly private unless it hasn't trained on nonconsensual data.

Uncategorized
Tags: privacy, consent, humanrights, noai
19 Posts · 12 Posters · 6 Views
This topic has been deleted. Only users with topic management privileges can see it.
  • em0nm4stodon@infosec.exchange wrote:

    There will never be an AI tool that is truly private unless it hasn't trained on nonconsensual data.

    Even if a platform were able to create the perfect protections for its users' prompts and results: if the platform is built from or is utilizing an AI model that was trained on, or is updated and optimized with, data that was scraped from millions of people without their consent, then of course this platform isn't "privacy-respectful."

    How could it be?

    The company is saying:
    "We respect the privacy of our users while they are using our platform, but outside of it, it's fair game."

    Users thinking they are using a privacy-respectful platform are in fact saying:

    "Privacy for me and not for thee,"

    And are directly contributing to the platform needing to scrape even more nonconsensual data to improve.

    Always ask: where does the training data come from?

    Without the assurance that a platform only uses AI models that have been trained solely on data acquired ethically, it is not a privacy-respectful platform.

    #Privacy #AI #Consent #HumanRights #NoAI

  • gbargoud@masto.nyc wrote (#2):

    RE: https://masto.nyc/@gbargoud/115822346288522227

    @Em0nM4stodon

    I hope they take this issue up again in the next session, where they might vote on it.
  • awalter@mastodon.bawue.social wrote (#3), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon What if you run a Large Language Model locally on your device?
  • em0nm4stodon@infosec.exchange wrote (#4), in reply to awalter@mastodon.bawue.social:

    @awalter Where does the data to train the LLM initially come from?
  • watchfulcitizen@goingdark.social wrote (#5), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon Well said! An interesting thought, however: what counts as ethical scraping? All public data? No scraping at all? Respecting robots.txt?

    I fully agree with you. Another issue is the lack of transparency from those who train. It's largely unknown what data has been used or where it came from.

    I'm not saying we shouldn't invest in AI. But the current form isn't ethical.
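    For readers wondering what "respecting robots.txt" looks like in practice: a site owner can publish a plain-text robots.txt file at the site root asking specific crawlers to stay away. The user-agent tokens below are ones the respective operators have documented publicly; example.org is a placeholder, and honoring the file is entirely voluntary on the crawler's side, which is exactly what the next reply pushes back on. A minimal sketch:

        # robots.txt served at https://example.org/robots.txt (placeholder domain)
        # Ask known AI-training crawlers not to fetch anything on this site.
        User-agent: GPTBot
        Disallow: /

        User-agent: Google-Extended
        Disallow: /

        User-agent: CCBot
        Disallow: /

        # All other crawlers may proceed as usual.
        User-agent: *
        Disallow: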
  • tmw@ioc.exchange wrote (#6), in reply to watchfulcitizen@goingdark.social:

    @watchfulcitizen have you ever created a robots.txt? when you did it, were you thinking it granted anybody a license to steal and regurgitate your content in the form of untraceable homunculus bullshit, in many cases profiting from it?

    and i'll say it for you: i don't think we should invest in AI until we can figure out what the hell is going on
  • viq@social.hackerspace.pl wrote (#7), in reply to watchfulcitizen@goingdark.social:

    @watchfulcitizen
    "AI" is currently a useless marketing term, lumping together very different technologies with very different properties, and implying that just because one of them is useful for a thing, so are the LLMs that everyone lost their minds about. And in either case, nowhere is there any Intelligence to be found.
    So, right now I very much AM saying that we should NOT invest in "AI".
    @Em0nM4stodon
  • guillaumerossolini@infosec.exchange wrote (#8), in reply to watchfulcitizen@goingdark.social:

    @watchfulcitizen that’s easy: it’s scraping of sources that have pre-approved this use, and all the big ones have this kind of agreement (often for a fee).

    But of course, whether you can trust them to include only data that was contributed under the same agreement, that’s tougher.

    I’m thinking of the crowdsourced Japanese translation of this Mozilla thing (can’t remember the details); they bailed recently and withdrew their contributions over LLM "fair use."

    @Em0nM4stodon
  • phil@fed.bajsicki.com wrote (#9), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon@infosec.exchange @awalter@mastodon.bawue.social
    Does that affect the user's privacy?

    The LLMs I run locally aren't capable of connecting to the web, so everything I process using them remains on my device.

    I generally agree the companies producing these models aren't privacy-respecting (insofar as they wish to avoid being fined out the arse for GDPR breaches).

    I disagree that LLMs themselves are intrinsically incompatible with privacy (please correct me if that's not the intent of your post; that's what I got from it).

    It's a matter of implementation; when running on my own computer, it's entirely private as far as I'm concerned.

    When using a vendor, the same truths apply as when running any software on someone else's computer: it's just not private at all.

    I really don't see why you're hyper-focusing on the LLM part, when the larger privacy invasion is in advertising / nation-state surveillance. (Have you seen Benn Jordan's video on Flock?)

    LLMs are an ecological, creative, and intellectual disaster, but the privacy concerns are hardly worth mentioning in comparison to pre-existing threats.

    On a different note, have you checked out Olmo? That's very much a privacy-respecting LLM:
    https://huggingface.co/allenai/Olmo-3.1-32B-Think
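    A quick illustration of what "running locally" means in practice. This is a minimal sketch assuming the Hugging Face transformers library and weights that are already on disk; the model id is illustrative (any open-weights model works), and local_files_only=True keeps the library from reaching out to the network at load time, so prompts and outputs never leave the machine.

        # Local-only inference sketch: once the weights are on disk, nothing is sent anywhere.
        # Assumes `pip install transformers torch` and a previously downloaded model.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        MODEL_ID = "allenai/OLMo-2-1124-7B-Instruct"  # illustrative; substitute any local open-weights model

        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, local_files_only=True)
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID, local_files_only=True)

        prompt = "Explain in one sentence why on-device inference keeps prompts private."
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=80)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))

    Whether that addresses the thread's objection is a separate question: local execution protects the user's prompts, but it says nothing about how the weights themselves were trained, which is the original post's point.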
  • watchfulcitizen@goingdark.social wrote (#10), in reply to tmw@ioc.exchange:

    @tmw Not saying I agree with how they handle it. Just want to state that data that is on the public web will always be at risk of misuse.

    Sadly, I have a hard time seeing them stop, whether we like it or not.
  • watchfulcitizen@goingdark.social wrote (#11), in reply to viq@social.hackerspace.pl:

    @viq @Em0nM4stodon I agree that the term is very loosely used. Is my vacuum "AI"? No, it is not.
  • watchfulcitizen@goingdark.social wrote (#12), in reply to guillaumerossolini@infosec.exchange:

    @GuillaumeRossolini @Em0nM4stodon is there a global standard to approve this kind of use case? Asking as I have no knowledge on the subject and would love to learn more.
  • guillaumerossolini@infosec.exchange wrote (#13), in reply to watchfulcitizen@goingdark.social:

    @watchfulcitizen sure, there are several as you might expect, with various degrees of usefulness and no way to enforce any of them

    @Em0nM4stodon
  • viq@social.hackerspace.pl wrote (#14), in reply to watchfulcitizen@goingdark.social:

    @watchfulcitizen @Em0nM4stodon I think, with how the term is currently used, it might be 🤷
  • pip@infosec.exchange wrote (#15), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon Ask not whether fashtech is private. Ask why anyone is using fashtech.
  • crazyeddie@mastodon.social wrote (#16), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon "Users thinking they are using a privacy-respectful platform are in fact saying:

    "Privacy for me and not for thee,""

    Which is pretty short-sighted, since they're probably not using that particular platform 24x7, and that makes them fair game for all the other time.
  • em0nm4stodon@infosec.exchange shared this topic
  • hyperreal@tilde.zone wrote (#17), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon GenAI is fundamentally and inherently built to be exploitative, even if running locally and trained on "consensual" data. You can't build a language model / AI without the ability to exploit humans.
  • martinrust@infosec.exchange wrote (#18), in reply to em0nm4stodon@infosec.exchange:

    @Em0nM4stodon phew, I admire you for mastering four negations in one sentence (the first one) – I just cannot, so I tried to understand it by eliminating the negations, hope I'm not distorting your idea with this:
    "The only AI tool ever that is truly private will be trained on consensual data only."
    And, yes, I fully agree, any "filters" applied to an ML model as an afterthought are doomed to have leaks that someone, something will find.
  • em0nm4stodon@infosec.exchange wrote (#19), in reply to martinrust@infosec.exchange:

    @martinrust Hahaha I didn't even realize 😆 I could have written this in a simpler way.

    But yes! You understood it correctly! 👍 Only an AI tool trained solely on data obtained ethically (therefore, with consent) could be considered truly private (aka, respecting people's privacy, which also means respecting people's consent if it used their data) 🙌