There will never be an AI tool that is truly private unless it hasn't trained on nonconsensual data.
-
@Em0nM4stodon Well said! An interesting thought, however: what's considered ethical scraping? All public data? No scraping at all?
Respecting robots.txt?
I fully agree with you. Another issue is the lack of transparency from those who train. It's largely unknown what data has been used or where it came from.
I'm not saying we shouldn't invest in AI. But the current form isn't ethical.
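For context on the robots.txt question: a site that wants to opt out of known AI crawlers can publish rules like the sketch below. The user-agent names are the publicly documented ones for OpenAI's and Common Crawl's crawlers; the list of relevant bots changes often, and compliance is entirely voluntary.

```
# Ask crawlers known to collect AI training data to stay away
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may crawl normally
User-agent: *
Allow: /
```

Note that robots.txt is a request, not an access control: it carries no legal force, and a scraper can simply ignore it, which is part of why "respect robots.txt" is a weak standard for consent.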
@watchfulcitizen
"AI" is currently a useless marketing term, lumping together very different technologies, with very different properties, and implying that just because one of them is useful for a thing, so are the LLMs that everyone lost their minds about. And in either case, nowhere is there any Intelligence to be found.
So, right now I very much AM saying that we should NOT invest in "AI".
-
@watchfulcitizen That's easy: it's scraping of sources that have pre-approved this use, and all the big ones have this kind of agreement (often for a fee).
But of course, can you trust them to include only data that was contributed under the same agreement? That's tougher.
I'm thinking about the crowdsourced Japanese translation of this Mozilla thing (can't remember the details); they bailed recently and withdrew their contributions over LLM "fair use".
-
@awalter Where does the data to train the LLM initially come from?
@Em0nM4stodon@infosec.exchange @awalter@mastodon.bawue.social
Does that affect the user's privacy?
The LLMs I run locally aren't capable of connecting to the web, so everything I process using them remains on my device.
I generally agree the companies producing these models aren't privacy-respecting (beyond whatever it takes to avoid being fined out the arse for GDPR breaches).
I disagree that LLMs themselves are intrinsically incompatible with privacy (please correct me if that's not the intent of your post - that's what I got from it).
It's a matter of implementation; when running on my own computer, it's entirely private as far as I'm concerned.
When using a vendor, the same truths apply as when running any software on someone else's computer. It's just not private at all.
I really don't see why you're hyper-focusing on the LLM part when the larger privacy invasion is in advertising / nation-state surveillance. (Have you seen Benn Jordan's video on Flock?)
LLMs are an ecological, creative and intellectual disaster, but the privacy concerns are hardly worth mentioning in comparison to pre-existing threats.
On a different note, have you checked out Olmo? That's very much a privacy-respecting LLM: https://huggingface.co/allenai/Olmo-3.1-32B-Think
-
@watchfulcitizen have you ever created a robots.txt? when you did it, were you thinking it granted anybody a license to steal and regurgitate your content in the form of untraceable homunculus bullshit, in many cases profiting from it?
And I'll say it for you: I don't think we should invest in AI until we can figure out what the hell is going on.
@tmw Not saying I agree with how they handle it. Just want to state that data on the public web will always be at risk of misuse.
Sadly, I have a hard time seeing them stop, whether we like it or not.
-
@viq @Em0nM4stodon I agree that the term is used very loosely. Is my vacuum "AI"? No, it is not.
-
@GuillaumeRossolini @Em0nM4stodon is there a global standard to approve this kind of use case? Asking as I have no knowledge on the subject and would love to learn more.
-
@watchfulcitizen Sure, there are several, as you might expect, with varying degrees of usefulness and no way to enforce any of them.
-
@watchfulcitizen @Em0nM4stodon I think that, with how the term is currently used, it might be.

-
There will never be an AI tool that is truly private unless it hasn't trained on nonconsensual data.
Even if a platform were able to create the perfect protections for its users' prompts and results: if the platform is built from, or utilizes, an AI model that was trained on or is updated and optimized with data that was scraped from millions of people without their consent, then of course this platform isn't "privacy-respectful."
How could it be?
The company is saying:
"We respect the privacy of our users while they are using our platform, but outside of it, it's fair game."
Users thinking they are using a privacy-respectful platform are in fact saying:
"Privacy for me and not for thee,"
and are directly contributing to the platform needing to scrape even more nonconsensual data to improve.
Always ask: where does the training data come from?
Without the assurance that a platform only uses AI models trained exclusively on data acquired ethically, it is not a privacy-respectful platform.
@Em0nM4stodon Ask not whether fashtech is private. Ask why anyone is using fashtech.
-
@Em0nM4stodon "Users thinking they are using a privacy-respectful platform are in fact saying:
"Privacy for me and not for thee,""
Which is pretty short-sighted, since they're probably not using that particular platform 24x7, and that makes them fair game all the rest of the time.
-
@Em0nM4stodon GenAI is fundamentally and inherently built to be exploitative. Even if running locally and trained on "consensual" data. You can't build a language model / AI without the ability to exploit humans.
-
@Em0nM4stodon phew, I admire you for mastering four negations in one sentence (the first one) – I just cannot, so I tried to understand it by eliminating the negations, hope I'm not distorting your idea with this:
"The only AI tool ever that
is truly private will be trained on consensual data only."
And, yes, I fully agree, any "filters" applied to an ML model as an afterthought are doomed to have leaks that someone, something will find.
-
@martinrust Hahaha, I didn't even realize
I could have written this in a simpler way. But yes! You understood it correctly!
Only an AI tool trained solely on data obtained ethically (therefore, with consent) could be considered truly private (i.e., respecting people's privacy, which also means respecting people's consent if their data was used).