
Some updates on my website maintenance woes:

juergen_hubert@mementomori.social
    #1

    Some updates on my website maintenance woes:

    Starting last July, I built a new wiki for my translations of German folk tales. Soon after I started, it began to experience frequent, hours-long outages. I researched possible causes, and eventually concluded that the primary cause was the sheer volume of requests from anonymous #scraper bot networks desperate for new scraps of data to feed into their #LLM models - the wiki simply couldn't cope. Even when I upgraded my hosting plan _twice_ last September, this only made the outages less common - it didn't stop them.

    In March, I drastically reduced the amount of work I did on the wiki, as it was functionally complete - I had added more than 700 folk tales to it by that stage. Sure, there are always further tales to add - I didn't stop translating those tales, after all. But now I am adding 10-20 tales per month, not 100+.

    And funnily enough, I haven't noticed any major outages for this past month - or even minor ones. I guess the scraper bot networks noticed that I don't have that much new data to steal, and largely moved on to new prey they can harass.

    So, what can we conclude from this?

    If you are maintaining a website that produces lots of new content on a regular basis, you _will_ get hammered by these scrapers. robots.txt will do nothing - they use anonymous, ever-changing IP addresses. Maybe you can thwart them with #Cloudflare or similar technologies, which I haven't tried out (I am a rank beginner when it comes to website administration, to be frank).

    Otherwise you will either have to slow down the publication of new content, pay lots of money for an oversized hosting plan, or live with periodic outages until the #AIBubble bursts, and there is no longer a trillion dollar business case for scraping every website a thousand times a month.
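    For context on why robots.txt is powerless here: it is a purely voluntary convention. A site can declare, for instance, something like the sketch below (the user-agent names are illustrative - many scraper networks identify themselves with generic or rotating agents, or none at all), and a well-behaved crawler will honor it, while the bot networks described above simply never read it:

    ```
    # robots.txt - advisory only; abusive scrapers ignore this file entirely
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Crawl-delay: 10
    ```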

    Sunken Castles, Evil Poodles Wiki (wiki.sunkencastles.com)

devwouter@mastodon.social
      #2

      @juergen_hubert

      You can prevent them in several ways:

      - Some HTTP servers have a module that prevents clients from hammering the internal systems.
      - Some firewalls can do the same at the OS level.
      - You could go for a DDoS-protection solution such as Cloudflare.

      What I would recommend for now is to lower the TTL of your DNS records. That way, when you do want to switch to Cloudflare, you don't have to wait an entire day for the change to propagate.
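      One concrete instance of the HTTP-server module mentioned above is nginx's built-in `limit_req` rate limiting. A minimal sketch, assuming nginx fronts the wiki on a local port - the zone name, rates, and upstream address are placeholders, not values tuned for this wiki's actual traffic:

      ```nginx
      # Shared zone keyed by client IP: 10 MB of state, at most 5 requests/sec each.
      limit_req_zone $binary_remote_addr zone=scrapers:10m rate=5r/s;

      server {
          listen 80;
          location / {
              # Tolerate short bursts of up to 10 extra requests; reject the rest.
              limit_req zone=scrapers burst=10 nodelay;
              limit_req_status 429;
              proxy_pass http://127.0.0.1:8080;
          }
      }
      ```

      Note that per-IP limits only slow down scrapers that reuse addresses; against networks rotating through many IPs (as described in the original post), they reduce the load per address but cannot block the swarm outright - that is where a fronting service like Cloudflare comes in.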


skysong@floss.social
        #3

        @juergen_hubert Perhaps an alternative would be to write a book about it (or a series of books), rather than maintain a wiki?

        I don't see LLMs or their makers going away any time soon, unfortunately.


juergen_hubert@mementomori.social
          #4

          @skysong

          Oh, I have a lengthy list of books I plan to publish, too.

          But the wiki is an excellent marketing tool for my work.
