Brew (#forgejo), as usual, is being overloaded by the scrapers.
-
@stefano Have you tried blocking the IP ranges in Julian's list?
Julian Oliver (@JulianOliver@mastodon.social)
I've done the log analysis and the two biggest contributors that brought the AI crawler hits up to 2 million in a day, a 4x increase on a week prior, are ByteSpider (Singapore networks) and especially AppleBot (used for Siri and other Apple products). The parasites.txt is now >4500 lines long: https://scienceispoetry.net/files/parasites.txt
Or is it all coming from residential proxies?

@pertho residential proxies. I blocked everything I could block, but it wasn't enough.
-
@pertho @stefano oooh interesting list.
I've been tinkering with ssh/httpd logs and awk, enriching the data with https://iplocate.io/ and maybe eventually GreyNoise and Spamhaus (to flag more residential proxies, etc.)
-
I think I'll have to put an Anubis in front of it. I don't love those "blocks", but sometimes you need to.
EDIT: done, let me know if you experience problems
@stefano Stick Bunny in with origin shield to drop the nuffs scraping your site
@tubsta that would work, but I'm trying to avoid using (external) CDNs at the moment.
-
@stefano I agree with what you're trying to do, as I'd rather avoid CDNs too, but some services need one. You just gotta work out the least shit ones, and the Europe-focused ones, to help here.
-
@stefano FWIW I spend about $5 a month with Bunny’s CDN products for bsdlab
-
@tubsta sure. Bunny is great. I have an account and use it for some services. For a while, some BSD Cafe content was served by them, and it worked perfectly.
-
@stefano They now have S3 object access for their storage nodes which has been a long time coming
@tubsta oh nice! I was curious to see it implemented. I'll have a look.
