If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

57 Posts 27 Posters 2 Views
  • soblow@eldritch.cafe

    If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

    To annoy them and protect my services from DDoS, I decided to set up an iocaine instance, along with NSoE... And it worked... Too well.

    Recently, they started flooding my VPS so much it started choking.
    If you followed me here on Fedi, you saw my journey to find a way to relieve my server.

    This is a rant about LLM crawlers, and some observations & conclusions, along with some techniques to help you protect your own services.

    Read it here: https://xaselgio.net/posts/26.poisoning-knowledge/

    Edit: A follow-up is now available here: https://xaselgio.net/posts/26-1.addendum-poisoning-knowledge

    #selfHosting #iocaine #indieWeb

    algernon@come-from.mad-scientist.club
    #4

    @Soblow Thank you for writing this, reading it has been very educational!

    I have a number of things in development that will make iocaine less of a pain, and more suitable for cases like yours - but... that takes a bit of time.

    And NSoE's documentation is a mess, indeed. My excuse is that it was never meant to be used by anyone else, it's something I wrote for me. But there was no other option for a long time, and even if iocaine3 has a built-in script now, that's not as good as NSoE (yet).

    I have plans to address that shortcoming, so that there's an option that isn't NSoE and has useful, navigable documentation that isn't written like a mad scientist's diary[1].

    But all the issues you listed are valid, and you even highlighted shortcomings I wasn't aware of, and tricks I did not consider. Now I have more things to play with!


    1. That was highly amusing to read, and I chuckled. Thanks![2]

    2. Yes, I know, it's not praise, but... algernon looks at the domain he's tooting from... yeah.

      dj2mn@mastodon.sdf.org
      #5

      @Soblow OMG that's horrendous. I'm too scared to look at my (recently set up) nginx logs now.

        dfxluna@transfem.social
        #6

        @Soblow@eldritch.cafe your prose is so clear and easy to follow! I love it :3

          ottermatic@woof.group
          #7

          @Soblow Thank you, I loved reading this.

            khleedril@cyberplace.social
            #8

            @Soblow Owch. Painful long read, regurgitating the experience of a lot of tech administrators. I like the idea of poisoning AI, but giving them an infinite pile of garbage is a waste of my resources. I'll leave that to people like you 🙂

            ps. Please learn the difference between scrapping and scraping, and dairy and diary.

              • booklordofthedings@social.lybry.net

              @Soblow Linkding sounds really interesting.
              I was looking for a bookmark sync thing/Firefox Sync alternative anyway and wasn't aware of it before.

              soblow@eldritch.cafe
              #9

              @booklordofthedings Oh, for bookmark sync, I wouldn't recommend Linkding, but maybe something like Floccus (and maybe Nextcloud bookmarks)

                soblow@eldritch.cafe
                #10

                @khleedril Oops, thanks for your feedback!
                (English isn't my native language)

                  soblow@eldritch.cafe
                  #11

                  @algernon Thanks for your reply!

                  It may not be obvious, but I don't consider "mad scientist" a pejorative term, so yeah (and to be honest, I didn't even realize it was in your domain name).

                  The main struggles I had were because both iocaine and NSoE use languages I'm not used to, which isn't your fault.

                  Again, your software works nicely and I'm glad it exists!
                  (And I know the problem of "this was meant to be a solution for me that I made available to everyone". Again, it's fully understandable, and that's open-source software: it comes with no expectations or warranty.)

                    flann@boosts.squirrel.pictures
                    #12

                    @Soblow Good post!

                    A lot more comprehensive than my own, I went more with just quickly highlighting a few special cases rather than laying out the entire setup

                    If you don't mind I'll link to your post alongside the one from lux that was already there.

                      soblow@eldritch.cafe
                      #13

                      @flann Please do
                      I'll go read that post when I have time~

                        soblow@eldritch.cafe
                        #14

                        As promised, I updated the linked repository to add a README (and a license)

                          soblow@eldritch.cafe
                          #15

                          @flann Okay, I read it. It's highly interesting, and it would've helped if I had seen it earlier

                          There are things from your blogpost I'll likely try later too

                            pandro@fedi.imowl.net
                            #16

                            @Soblow@eldritch.cafe
                            A nice read! Nothing too new for me as I was following you live on that journey but good to hear you found something that helps!

                            I also use iocaine for my services, and at one point I thought "why not let iocaine also serve garbage to empty user-agents?". That would also catch a lot of the vuln scanners that are convinced I'm using WordPress (I'm not).

                            There's a surprising amount of legitimate traffic that, unexpectedly, doesn't set a UA
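
The empty-User-Agent idea above could be sketched roughly as follows. This is a plain Python illustration, not iocaine or NSoE configuration: the function name, the path allowlist, and the example paths are all hypothetical, chosen only to show how an exemption for legitimate UA-less clients might sit next to the "serve garbage" rule.

```python
from typing import Mapping

# Hypothetical allowlist: path prefixes that legitimate clients without a
# User-Agent are known to request. Adjust to whatever your own logs show.
ALLOWED_PATHS_WITHOUT_UA = ("/.well-known/", "/feed.xml")


def should_serve_garbage(headers: Mapping[str, str], path: str) -> bool:
    """Return True when a request with no usable User-Agent should get garbage."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "").strip()
    if user_agent:
        # A UA is present; leave the decision to whatever other rules exist.
        return False
    # No UA: serve garbage unless the path is explicitly exempted.
    return not path.startswith(ALLOWED_PATHS_WITHOUT_UA)
```

An allowlist is used here because, as noted above, some legitimate tools do not set a User-Agent, so a blanket rule would produce false positives.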

                              soblow@eldritch.cafe
                              #17

                              @pandro That could be something, but for example the "fursona lookup" tool didn't have a User Agent set (until I told the author)...

                                seldenotter@blimps.xyz
                                #18

                                @Soblow "I spent the last few years building up a tolerance to iocaine powder."

                                  icewolf@masto.brightfur.net
                                  #19

                                  @Soblow Hah... I think we're getting iocaine or something when trying to read the article on our phone (iOS Safari, on iOS 15 or something like that). Haven't tried our desktop. Pretty meta, though.

                                    bandie@yip.gay
                                    #20

                                    @Soblow Hmmm... I'm thinking of serving a knowledge.tld with just static garbage now...

                                      soblow@eldritch.cafe
                                      #21

                                      Well, first documented case (on my end) of a false positive, yay...

                                      iOS 15 + Safari, for some reason...

                                        jernej__s@infosec.exchange
                                        #22

                                        @Soblow Interesting, I've noticed a very similar pattern on a client's web server, which they use for hosting internal projects (which have to be available publicly): 5000 requests per second every few hours on specific subdomains, from residential IPs, with each IP doing about 20 requests, changing the user-agent once. Nothing knowledge-like though, most of the sites run WordPress.
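
A pattern like the one described above (few requests per IP, with the user-agent changed along the way) can be looked for in access logs with a small script. The following is only a sketch under stated assumptions: the input is assumed to be already parsed into `(ip, user_agent)` pairs, the function names are made up for illustration, and the threshold of 30 requests is an arbitrary guess rather than a measured value.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarise_by_ip(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each IP to (request count, number of distinct user-agents)."""
    counts: Dict[str, int] = defaultdict(int)
    agents: Dict[str, Set[str]] = defaultdict(set)
    for ip, user_agent in records:
        counts[ip] += 1
        agents[ip].add(user_agent)
    return {ip: (counts[ip], len(agents[ip])) for ip in counts}


def matches_pattern(request_count: int, distinct_agents: int,
                    max_requests: int = 30) -> bool:
    """Heuristic: few requests from one IP, with the user-agent changed at least once."""
    return request_count <= max_requests and distinct_agents >= 2
```

This is a heuristic only: several real visitors behind one shared address would also show a handful of requests with different user-agents, so a match is a hint to look closer, not proof of a crawler.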

                                          sargeros@rivals.space
                                          #23

                                          @Soblow I just tried to add your article to my self-hosted Readeck (a read-it-later service) and I think the request got caught by an anti-scraper.
                                          Guess I'll have to read the article soon to understand what happened!
