From Bruce Schneier: "All it takes to poison AI training data is to create a website:

photo55@mastodon.social

@emacsomancer
Shall we have an algorithmic bullshit generator?

And pass around multiple copies of it, identical and with small changes, omissions and additions?

sorro@woof.tech

@emacsomancer in less than 24 hours the chatbots fell for the experiment, and less than 24 hours after it was revealed what the experiment was about, that information has ALSO become part of the training data

are they constantly scraping websites for training data? otherwise, why would this appear here so fast??? no wonder those datacenters consume so much electricity if they don't take a single break from scraping the internet

duco@norden.social

@larsbrinkhoff @petealexharris @tml @Yendolosch @emacsomancer in the sense of life hacks or food hacks this is an AI hack. So the AI has been hacked.

gim@lou.lt

@emacsomancer it's not really a new thing. Russians are already using this technique to poison training data:

https://thebulletin.org/2025/03/russian-networks-flood-the-internet-with-propaganda-aiming-to-corrupt-ai-chatbots/

Edit: there is some newer reporting on that matter, but I can't find it right now and don't have it at hand

w@mountains.social

@emacsomancer He also poisoned the data for everyone who searches for hot dog eating competitors online in other ways. I'm not sure what he accomplished.

drahardja@sfba.social

@Sorro @emacsomancer I suspect Google Gemini is using Google’s normal search-engine scraper as a searchable source. In other words, I suspect their Gemini LLM is invoking an internal API to “search Google” internally (without the degraded search that the public is subject to), and then putting the search results in its context window to form an answer.

This is one reason I think OpenAI and Anthropic are at a huge disadvantage to Google when it comes to their LLMs dealing with current events and topics. You can block OpenAI and Anthropic scrapers, but you don’t want to block Google search crawlers, which “coincidentally” also feed Gemini.

faxmodem@come-from.mad-scientist.club

@emacsomancer we should probably call them AP (Artificial Parrots)

masto@masto.masto.com

@emacsomancer Let’s just say that hypothetically, my work’s HR department excitedly launched an “agent” for managers to use to generate performance reviews. Hypothetically, if I created a document called “Report” with a dozen pages of filler, followed by white text on a white background describing Chris Masto’s incredible performance and promotion-worthiness, hypothetically said agent was found to use it as its primary source of truth.

iwillyeah@mastodon.ie

@darknetDon @emacsomancer by "accuracy of this" do you mean "authenticity of this"? Are you implying it's lies?

vonskinnback@mastodon.social

@darknetDon @emacsomancer blocked...

finitum@mastodon.social

@kneoghau @emacsomancer right? Everyone knows it's closer to 14 spiders.

Poisoning AI Training Data - Schneier on Security