<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level.]]></title><description><![CDATA[<p>I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. And I think it's something that precludes the use of caching, but there are probably many of you with more knowledge than I have and who may know what can be done</p><p>🧵<img src="https://board.circlewithadot.net/assets/plugins/nodebb-plugin-emoji/emoji/android/2935.png?v=28325c671da" class="not-responsive emoji emoji-android emoji--arrow_heading_down" style="height:23px;width:auto;vertical-align:middle" title="⤵" alt="⤵" />️</p><p><a href="https://mastodon.art/tags/IndieWeb" rel="tag">#<span>IndieWeb</span></a> <a href="https://mastodon.art/tags/WebDev" rel="tag">#<span>WebDev</span></a> <a href="https://mastodon.art/tags/PersonalSite" rel="tag">#<span>PersonalSite</span></a></p>]]></description><link>https://board.circlewithadot.net/topic/cf8523fc-acec-47da-9057-83f99ccbd33e/i-have-an-obnoxious-problem-with-crawlers-eating-bandwidth-on-my-personal-web-site-not-just-the-fact-that-crawlers-consume-so-much-bandwidth-but-rather-a-behaviour-that-is-absolutely-next-level.</link><generator>RSS for Node</generator><lastBuildDate>Fri, 15 May 2026 02:34:13 GMT</lastBuildDate><atom:link href="https://board.circlewithadot.net/topic/cf8523fc-acec-47da-9057-83f99ccbd33e.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 23 Apr 2026 15:29:44 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to I have an obnoxious problem with 
crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:32:45 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> <span><a href="/user/redstrate%40mastoart.social">@<span>redstrate</span></a></span> </p><p>The problem is that lots of crawlers do not respect robots.txt (especially those run by "AI" companies).</p><p>Thus people go for other solutions, to make it too expensive on the side of the crawler, like iocaine - <a href="https://firesphere.dev/articles/iocaine-the-deadliest-poison-known-to-ai" rel="nofollow noopener"><span>https://</span><span>firesphere.dev/articles/iocain</span><span>e-the-deadliest-poison-known-to-ai</span></a>, or anubis - <a href="https://anubis.techaro.lol" rel="nofollow noopener"><span>https://</span><span>anubis.techaro.lol</span><span></span></a></p>]]></description><link>https://board.circlewithadot.net/post/https://mammut.moe/users/gemelen/statuses/116454979391320053</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mammut.moe/users/gemelen/statuses/116454979391320053</guid><dc:creator><![CDATA[gemelen@mammut.moe]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:32:45 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:31:30 GMT]]></title><description><![CDATA[<p><span><a href="/user/cb%40boop.bleepbop.space">@<span>cb</span></a></span> I also use the .htaccess method to "block" specific agents, so they simply get thousands of 0 byte responses. 
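A minimal sketch of that kind of user-agent block in .htaccess, assuming Apache with mod_rewrite enabled; the agent names are illustrative examples, not the poster's actual list:

```apache
# .htaccess sketch: refuse known scraper user agents.
# Agent names below are examples only; maintain your own list.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider|ClaudeBot) [NC]
# [F] sends 403 Forbidden (a tiny response, though not literally zero bytes)
RewriteRule .* - [F,L]
```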
Whenever it's a known LLM/AI scraper, I'm happy with that solution (and IP blocking ones that don't present a unique user agent).</p><p>I've heard of Iocaine and similar tools but never looked into them, and I guess now is the time!</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454974463976893</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454974463976893</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:31:30 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:27:15 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> ah okay. i imagine that probably limits you from making any apache/nginx configuration changes (e.g. IP blocklists)</p><p>i'm not familiar with your site generation code - but if you wrote it yourself, i *think* the trick would be to have it 404 when an incorrect tag has been used</p><p><div class="card col-md-9 col-lg-6 position-relative link-preview p-0">
<a href="https://stackoverflow.com/questions/1381123/how-to-create-an-error-404-page-using-php" title="How to create an error 404 page using PHP?">
<img src="https://stackoverflow.com/Content/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=0f0cab681579" class="card-img-top not-responsive" style="max-height: 15rem;" alt="Link Preview Image" />
</a>
<div class="card-body">
<h5 class="card-title">
<a href="https://stackoverflow.com/questions/1381123/how-to-create-an-error-404-page-using-php">
How to create an error 404 page using PHP?
</a>
</h5>
<p class="card-text line-clamp-3">My file .htaccess handles all requests from /word_here to my internal endpoint /page.php?name=word_here. The PHP script then checks if the requested page is in its array of pages.
If not, how can I </p>
</div>
<a href="https://stackoverflow.com/questions/1381123/how-to-create-an-error-404-page-using-php" class="card-footer text-body-secondary small d-flex gap-2 align-items-center lh-2">
<img src="https://stackoverflow.com/Content/Sites/stackoverflow/Img/favicon.ico?v=562fb39d93c8" alt="favicon" class="not-responsive overflow-hiddden" style="max-width: 21px; max-height: 21px;" />
<p class="d-inline-block text-truncate mb-0">Stack Overflow <span class="text-secondary">(stackoverflow.com)</span></p>
</a>
</div></p><p>at least then the script can die() instead of yielding output. it's anyone's guess if the crawler will still continue to try generating tags when it has encountered a 404, but i *assume* they're built to avoid 404s</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.tomodori.net/users/vga256/statuses/116454957722516823</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.tomodori.net/users/vga256/statuses/116454957722516823</guid><dc:creator><![CDATA[vga256@mastodon.tomodori.net]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:27:15 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:26:47 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> This, I think, is why so many people have moved to having Cloudflare in front of their sites. To block/limit badly behaved bots.</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.social/users/foobarsoft/statuses/116454955918152347</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.social/users/foobarsoft/statuses/116454955918152347</guid><dc:creator><![CDATA[foobarsoft@mastodon.social]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:26:47 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:25:32 GMT]]></title><description><![CDATA[<p><span><a href="/user/redstrate%40mastoart.social">@<span>redstrate</span></a></span> Ah, this sounds promising! 
I don't want to make my site invisible on the greater Web by blocking all bot crawlers, but I'd be fine with them only loading URLs with no queries/parameters (anything after a ?). I'll look into that meta tag, though I acknowledge the other reply here that bots can happily ignore that.</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454951001010507</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454951001010507</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:25:32 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:24:55 GMT]]></title><description><![CDATA[<p><span><a href="/user/redstrate%40mastoart.social">@<span>redstrate</span></a></span> <span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> </p><p>Ah yes, worth doing as it also improves your SEO by not having thousands of similar pages</p>]]></description><link>https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454948541751353</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454948541751353</guid><dc:creator><![CDATA[rubenwardy@hachyderm.io]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:24:55 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. 
on Thu, 23 Apr 2026 16:23:52 GMT]]></title><description><![CDATA[<p><span><a href="/user/rubenwardy%40hachyderm.io">@<span>rubenwardy</span></a></span> <span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> it would at least help with the legitimate ones!</p>]]></description><link>https://board.circlewithadot.net/post/https://mastoart.social/ap/users/116098485566595478/statuses/116454944442750746</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastoart.social/ap/users/116098485566595478/statuses/116454944442750746</guid><dc:creator><![CDATA[redstrate@mastoart.social]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:23:52 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:23:48 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> </p><p>To block the abusive subnets, I used this tool to look up the IP ranges from example IP addresses. 
You can see all the IP ranges for a particular host:  <a href="https://www.whatismyip.com/asn/AS150436/" rel="nofollow noopener"><span>https://www.</span><span>whatismyip.com/asn/AS150436/</span><span></span></a></p><p>I then blocked using ipset/iptables but other options exist depending on your setup</p>]]></description><link>https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454944192701274</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454944192701274</guid><dc:creator><![CDATA[rubenwardy@hachyderm.io]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:23:48 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:21:06 GMT]]></title><description><![CDATA[<p><span><a href="/user/vga256%40mastodon.tomodori.net">@<span>vga256</span></a></span> I'm sharing a paid host with a friend. Thanks to relatively low combined popularity, we can get away with a cheap plan, but I really don't want random bots to ruin that</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454933577456727</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454933577456727</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:21:06 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. 
on Thu, 23 Apr 2026 16:18:16 GMT]]></title><description><![CDATA[<p><span><a href="/user/redstrate%40mastoart.social">@<span>redstrate</span></a></span> <span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> </p><p>Many crawlers ignore this in my experience, especially the AI ones</p>]]></description><link>https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454922451234285</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454922451234285</guid><dc:creator><![CDATA[rubenwardy@hachyderm.io]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:18:16 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:11:59 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> a dumb solution would be to tell robots to not index the page (robots meta tag) if there is any tag queries, which i assume you can do via PHP. </p><p>edit: or if you want individual tags indexed, at least reject robots for queries of more than one tag?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastoart.social/ap/users/116098485566595478/statuses/116454897739267808</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastoart.social/ap/users/116098485566595478/statuses/116454897739267808</guid><dc:creator><![CDATA[redstrate@mastoart.social]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:11:59 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. 
on Thu, 23 Apr 2026 16:09:38 GMT]]></title><description><![CDATA[<p>I'm not fundamentally opposed to web crawlers; I would actually love it if my work is more discoverable. But this is such an obnoxious situation that I'm forced to accommodate or protect against. </p><p>I'm starting to think I need to test for mutually exclusive tags, and if two or more are selected, the resulting page will have no links at all except one to go back. That will deny the bots any more links to dive into. </p><p>But maybe there are better options? I'd wager this is not a novel issue...</p><p>🧵7/7</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454888472936907</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454888472936907</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:09:38 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. 
on Thu, 23 Apr 2026 16:07:56 GMT]]></title><description><![CDATA[<p><span><a href="/user/oblomov%40sociale.network">@<span>oblomov</span></a></span><span> </span><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span><span> the referer header only exists for tracking, so many privacy-conscious people configure their browsers not to send it<br /><br />the referer header should not exist in the first place</span></p>]]></description><link>https://board.circlewithadot.net/post/https://snug.moe/notes/alf8p8cqyl8kdhfb</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://snug.moe/notes/alf8p8cqyl8kdhfb</guid><dc:creator><![CDATA[lumi@snug.moe]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:07:56 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:05:11 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> (talking from experience with my self-hosted gitweb for this, BTW)</p>]]></description><link>https://board.circlewithadot.net/post/https://sociale.network/users/oblomov/statuses/116454870950098483</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://sociale.network/users/oblomov/statuses/116454870950098483</guid><dc:creator><![CDATA[oblomov@sociale.network]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:05:11 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:03:57 GMT]]></title><description><![CDATA[<p>How many permutations of tags are there? 
A butt-tonne, and the bot will diligently check out ALL OF THEM. Thousands and thousands of page loads! And even though all of them have 0 images to display, there will still be a tag list to choose from, and it will always visually update to indicate which tags are currently selected. So the page can't just be saved in a static HTML file, and the bot isn't going to load anything from its own cache.</p><p>🧵6/?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454866135062710</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454866135062710</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:03:57 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 16:01:53 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> </p><p>For your particular case, you should return a 404 if the URL contains both 2025 and 2026. This would stop them getting into invalid combinations. 
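A sketch of that check in PHP (the site's stack), assuming tags arrive as a comma-separated ?tag= parameter as described upthread; the grouping of years is an assumption about how the tags are organised:

```php
<?php
// Sketch: decide whether a tag-filter query is a nonsense combination
// that should 404 instead of rendering an empty gallery page.
function isInvalidTagCombo(string $tagParam): bool {
    // Assumption: these tags are mutually exclusive (an image has one year).
    $exclusiveYears = ['2024', '2025', '2026'];
    $tags = $tagParam === '' ? [] : explode(',', $tagParam);
    // Two different years selected at once can never match any image.
    return count(array_intersect($tags, $exclusiveYears)) > 1;
}

// In the gallery script, before rendering anything:
if (isInvalidTagCombo($_GET['tag'] ?? '')) {
    http_response_code(404);
    exit; // die() here so the page yields no further crawlable links
}
```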
You can make it so the UI never links to these combinations by *replacing* rather than appending years if one already exists</p>]]></description><link>https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454857976685502</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454857976685502</guid><dc:creator><![CDATA[rubenwardy@hachyderm.io]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:01:53 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:59:05 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> an easy way to catch this is that these scrapers generally don't send Referer headers, so you can kill these by checking that a valid Referer header is present in tag search. This will have false positives for humans that try to be too smart though.</p>]]></description><link>https://board.circlewithadot.net/post/https://sociale.network/users/oblomov/statuses/116454846959754855</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://sociale.network/users/oblomov/statuses/116454846959754855</guid><dc:creator><![CDATA[oblomov@sociale.network]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:59:05 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:56:11 GMT]]></title><description><![CDATA[<p>So it loads my gallery page, and sees the list of tags: maybe 50 different links, all of which load the gallery page with a new filter applied. 
So it loads one, like "?tag=2026".</p><p>On the resulting page, there are still 50-odd tag links available. So it loads another one, and the URL now includes "?tag=2026%2C2025". Which is nonsense, but the page still loads.</p><p>Well, there are 0 images to show on that page, but still more tags to open! So next the bot opens "?tag=2026%2C2025%2C2024"...</p><p>🧵5/?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454835560131654</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454835560131654</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:56:11 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:50:53 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art" rel="nofollow noreferrer noopener">@<span>jsstaedtler</span></a></span> I've been using Iocaine, which is specifically intended to mess with AI bots, but it can also help with "normal" bots too<br /><a href="https://iocaine.madhouse-project.org/" rel="nofollow noreferrer noopener">https://iocaine.madhouse-project.org/</a><br /><br />of course that still eats up some of your server's power. I work for a web hosting company and frequently we'll just make a list of "bad bots" in an .htaccess file to block them. 
The server still has to reply to their requests but doesn't have to serve them any real data</p>]]></description><link>https://board.circlewithadot.net/post/https://boop.bleepbop.space/users/cb/statuses/01KPXGJXTT2ZSW5SR07RESEXDS</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://boop.bleepbop.space/users/cb/statuses/01KPXGJXTT2ZSW5SR07RESEXDS</guid><dc:creator><![CDATA[cb@boop.bleepbop.space]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:50:53 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:50:48 GMT]]></title><description><![CDATA[<p>But you can't stop anyone from entering a URL with any combination of tag names. You must decide what page they will see when they do so, and in my case, it's a gallery page with 0 images.</p><p>Now: enter the web crawler bot. It finds my site. It grabs all of the links on the front page, then starts loading each one. Then it grabs all of the links on *those* pages, and starts loading all of *them*. Presumably it will stop once all links have been viewed.</p><p>🧵4/?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454814421944360</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454814421944360</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:50:48 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. 
on Thu, 23 Apr 2026 15:45:55 GMT]]></title><description><![CDATA[<p>This method of selecting tags allows for invalid combos, like "?tag=2026%2C2025". That selects images that were drawn both in 2026 *and* 2025... which obviously can't exist! The resulting page will tell you that 0 images were found.</p><p>A human would generally make sense of the available options, and *not* select two different years simultaneously. I could even code the page so that if one year is already selected, you can't select another one.</p><p>🧵3/?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454795226146155</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454795226146155</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:45:55 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:41:40 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> </p><p>The exact thing has happened to me recently with the tags. 
I now require users to log in to filter by multiple tags and I've blocked the subnets of the bots</p><p>If I wanted to allow guest users to search by multiple tags, I'd probably try the following options - (1) changing it to a POST request (2) requiring JavaScript (3) using Anubis (4) looking into IP-masked rate limiting, i.e. a single rate limit applied across multiple IP addresses in the same block</p><p>I wrote a blog post about my situation here <a href="https://blog.rubenwardy.com/2026/04/16/contentdb-ddos/" rel="nofollow noopener"><span>https://</span><span>blog.rubenwardy.com/2026/04/16</span><span>/contentdb-ddos/</span></a></p>]]></description><link>https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454778491689264</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://hachyderm.io/users/rubenwardy/statuses/116454778491689264</guid><dc:creator><![CDATA[rubenwardy@hachyderm.io]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:41:40 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:38:28 GMT]]></title><description><![CDATA[<p>My site uses PHP to produce HTML output—no flat HTML files. There is an image gallery, and the images have tags.</p><p>It's here if you actually want to see it: <a href="https://bigraccoon.ca/gallery" rel="nofollow noopener"><span>https://</span><span>bigraccoon.ca/gallery</span><span></span></a></p><p>When you filter on a tag, it adds a parameter to the URL, e.g. "domain[dot]com?tag=2026". That loads the gallery, but only displays images tagged with "2026".</p><p>You can filter further on more tags, e.g. 
"?tag=2026%2CPencil" ("%2C" is a URL-encoded comma), which would show images from 2026 drawn in pencil.</p><p>🧵2/?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454765916672788</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.art/users/jsstaedtler/statuses/116454765916672788</guid><dc:creator><![CDATA[jsstaedtler@mastodon.art]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:38:28 GMT</pubDate></item><item><title><![CDATA[Reply to I have an obnoxious problem with crawlers eating bandwidth on my personal web site—not just the fact that crawlers consume so much bandwidth, but rather a behaviour that is absolutely next-level. on Thu, 23 Apr 2026 15:33:42 GMT]]></title><description><![CDATA[<p><span><a href="/user/jsstaedtler%40mastodon.art">@<span>jsstaedtler</span></a></span> I can't remember - are you self-hosting or using a paid host?</p>]]></description><link>https://board.circlewithadot.net/post/https://mastodon.tomodori.net/users/vga256/statuses/116454747176286272</link><guid isPermaLink="true">https://board.circlewithadot.net/post/https://mastodon.tomodori.net/users/vga256/statuses/116454747176286272</guid><dc:creator><![CDATA[vga256@mastodon.tomodori.net]]></dc:creator><pubDate>Thu, 23 Apr 2026 15:33:42 GMT</pubDate></item></channel></rss>