Pleased to share a page and explainer for the AI tarpit project Science is Poetry, with legal statement, rationale(s), and a few deployment notes:
-
It's approaching DoS levels at this point. This is just one of the VMs, and just OpenAI's parasite.
Threading is holding up, but the rate limits and burst settings need more tuning. Now sending 429s to ask them to play nice.
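Rate limits, burst, and 429s map naturally onto nginx's limit_req machinery, if nginx is what fronts the tarpit (an assumption; zone name, memory size, and rates below are purely illustrative):

```nginx
http {
    # One token bucket per client IP; 10 MB of shared state, ~2 req/s sustained
    limit_req_zone $binary_remote_addr zone=tarpit:10m rate=2r/s;

    server {
        location / {
            limit_req zone=tarpit burst=10 nodelay;  # absorb short bursts, then throttle
            limit_req_status 429;                    # "Too Many Requests" instead of the default 503
        }
    }
}
```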
To think the www was built for people.
And here we are
@JulianOliver Seriously tempted to build an installation piece based on this, where the machine is a working tarpit, and a large display visualizes the activity of AI crawlers request by request. Possibly with sonification too.
-
@twilliability Yes I was thinking of just the same, dual projection.
-
@twilliability Relatedly, I'm working on a means to capture the shell log output to a streaming endpoint while leaving plenty of bandwidth for existing bot traffic. Not as easy as it may seem!
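One low-tech way to sketch that pipeline (everything here is an assumption: the log path, the user-agent substrings, the port; `nc` behaviour also varies between netcat flavours):

```shell
#!/bin/bash
# Sketch: follow the tarpit access log, keep only lines from known AI-crawler
# user agents, and stream them to any client that connects on port 9090.

filter_crawlers() {
  # UA substrings are assumptions; extend to match your own logs
  grep --line-buffered -E 'GPTBot|ClaudeBot|CCBot|Amazonbot'
}

stream_log() {
  # tail -F survives log rotation; nc -lk keeps listening across clients
  tail -n 0 -F "${1:-/var/log/nginx/tarpit.access.log}" | filter_crawlers | nc -lk 9090
}
```

`--line-buffered` matters here: without it, grep buffers output and the "live" stream arrives in stale chunks.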
-
@twilliability P.S. I was not considering sonification, rather just a projection piece.
IMO, while plugging into Pure Data, SuperCollider, etc. might seem interesting, I honestly think that at the rate a properly set-up tarpit works you'd practically end up with gabba, or something akin to barely textured noise. If you were monitoring TCP traffic directly, though, sonifying at Layer 4 or even Layer 3 and giving auditory identity to endpoint IPs, it would be pretty intense!
-
Do you have an unused domain that you would be happy to donate to a counter-offensive against unchecked & unregulated AI crawlers that scrape human-made content to simulate & deceive for profit?
If so, pls reply to this post. Your domain would become an entrypoint to the AI tarpit & Poison-as-a-Service project below, allowing the concerned public to choose to use it on their sites and helping make the project more resilient to blacklisting.
Cute idea.
Entirely useless.
Feed an "AI trap" page to an AI and see what happens...
-
@JulianOliver yes, it's at the first-idea stage.
It is not beyond me to take liberties for an experience that is memorable but maybe not 1:1 with reality. Screen is easier, lots of pixels.
-
Even faster now.
Again, these pages are randomly generated, and each line is a page request from a crawler.
To think of the energy expended at a global scale, the waste. All the money, water & minerals thrown at this. These AI companies are near DoS'ing the human web as they deep-sea trawl our content.
Computationally, infrastructurally, & culturally, it's an obscenity.
Are you still looking for domains?
Somehow www.qaz.red is pointing at 95.216.76.85. Should I add an AAAA record, too?
-
@twilliability @JulianOliver I would still like to hear it. Maybe with headphones, so the room is not unbearable. Also, there are many ways to sonify it. It could sound like cockroaches walking on paper, for instance

-
@twilliability Hehe. Me too. You live coders are good at managing chaos, so perhaps you could find a way to tame it, or pick out certain outlier patterns from these vacuum cleaners. Looking at crawler operators with huge swarms, it's true they do shift across IP ranges as they feed, so there's that to play with, I guess. OpenAI and Amazon in particular.
You'd get a lot more perceptible detail if you could slow them down, but my experience is that if you rate-limit too aggressively, they lose interest!
-
@elithebearded Oh hey thanks! I'll add it today. An AAAA would be great if you have a moment.
-
Done. Copied from tender.horse, if it matters
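For anyone else wiring up a donated domain, the records look roughly like this (a sketch: the A address is the one given in the thread; the AAAA value and TTL below are placeholders, not the real ones):

```
www.qaz.red.  300  IN  A     95.216.76.85
www.qaz.red.  300  IN  AAAA  2001:db8::1   ; placeholder — use the tarpit's real IPv6 address
```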
-
@elithebearded You are live and listed here
-
@JulianOliver I think I might have one. Need to check, though.
-
- Mum, if you made a chain out of all the endpoint addresses of AI crawlers, how far would it reach?
- All the way to the moon, darling. All the way to the moon.
-
@JulianOliver indeed
-
I've started to harvest a list of AI crawler endpoint addrs for your blacklisting pleasure.
I'll try to keep it updated. I've been fastidious about pulling only addresses tied to the known user agents, so as not to include any false positives.
https://scienceispoetry.net/files/parasites.txt
It is at the same path for all contributed domains.
For instance:
@JulianOliver Thanks for this!
I added the list to my CrowdSec firewall bouncer; that should block them, right?
-
@jasperbuma It should indeed!
-
@JulianOliver I could dedicate subdomains such as science.akselmo.dev to this. Just let me know how.
-
Here's a thing I did in a couple of mins to ban all IPs in the parasites.txt serverside. You could ofc REJECT rather than DROP to send a message.
---
#!/bin/bash
# Walk parasites.txt line by line: dotted entries are IPv4, colon entries IPv6.
while read -r parasite; do
  if [[ "$parasite" == *"."* ]]; then
    iptables -I INPUT -s "$parasite" -j DROP
  elif [[ "$parasite" == *":"* ]]; then
    ip6tables -I INPUT -s "$parasite" -j DROP
  fi
done < /path/to/parasites.txt
---
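With a long list, one iptables rule per address gets slow both to insert and to evaluate; an ipset set keeps the whole list behind a single rule. A sketch along the same lines (assumes the ipset tool is installed and root privileges when you actually call it; set names are arbitrary):

```shell
#!/bin/bash
# Sketch: load parasites.txt into two ipset sets (IPv4 / IPv6) and block each
# set with a single iptables / ip6tables rule.

classify() {
  # Same naive dot-vs-colon family heuristic as the loop above
  case "$1" in
    *.*) echo 4 ;;
    *:*) echo 6 ;;
  esac
}

load_blocklist() {
  # -exist makes create/add idempotent across reruns
  ipset -exist create parasites4 hash:ip family inet
  ipset -exist create parasites6 hash:ip family inet6
  while read -r parasite; do
    case "$(classify "$parasite")" in
      4) ipset -exist add parasites4 "$parasite" ;;
      6) ipset -exist add parasites6 "$parasite" ;;
    esac
  done < "${1:-/path/to/parasites.txt}"
  # -C checks whether the rule already exists, so reruns don't stack duplicates
  iptables  -C INPUT -m set --match-set parasites4 src -j DROP 2>/dev/null ||
    iptables  -I INPUT -m set --match-set parasites4 src -j DROP
  ip6tables -C INPUT -m set --match-set parasites6 src -j DROP 2>/dev/null ||
    ip6tables -I INPUT -m set --match-set parasites6 src -j DROP
}
```

Refreshing then becomes: flush the sets, reload the file; the two firewall rules never change.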
