If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

57 Posts 27 Posters 2 Views
  • flann@boosts.squirrel.pictures

    @Soblow Good post!

    A lot more comprehensive than my own; I mostly just highlighted a few special cases rather than laying out the entire setup.

    If you don't mind I'll link to your post alongside the one from lux that was already there.

    soblow@eldritch.cafe
    wrote on last edited by
    #13

    @flann Please do
    I'll go read that post when I have time~

    • soblow@eldritch.cafe

      If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

      To annoy them and protect my services from DDoS, I decided to set up an iocaine instance, along with NSoE... And it worked... Too well.

      Recently, they started flooding my VPS so much it started choking.
      If you followed me here on Fedi, you saw my journey to find a way to relieve my server.

      This is a rant about LLM crawlers, and some observations & conclusions, along with some techniques to help you protect your own services.

      Read it here: https://xaselgio.net/posts/26.poisoning-knowledge/

      Edit: A follow-up is now available here: https://xaselgio.net/posts/26-1.addendum-poisoning-knowledge

      #selfHosting #iocaine #indieWeb

      soblow@eldritch.cafe
      wrote on last edited by
      #14

      As promised, I updated the linked repository to add a README (and a license)

        soblow@eldritch.cafe
        wrote on last edited by
        #15

        @flann Okay, I read it. It's highly interesting, and it would've helped if I had seen it earlier.

        There are things from your blogpost I'll likely try later too

          pandro@fedi.imowl.net
          wrote on last edited by
          #16

          @Soblow@eldritch.cafe
          A nice read! Nothing too new for me as I was following you live on that journey but good to hear you found something that helps!

          I also use iocaine for my services, and at one point I thought "why not let iocaine also serve garbage to empty user-agents?". That would also catch a lot of the vuln scanners that are convinced I'm using WordPress (I'm not).

          But there's a surprising amount of legitimate traffic that you wouldn't expect to leave the UA unset.
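A minimal sketch of the trade-off described above, assuming a hypothetical `classify_request` helper and a hand-maintained `trusted_ips` set. Neither is part of iocaine; they only illustrate sending empty User-Agents to the garbage generator while exempting known-good clients that never set one.

```python
def classify_request(user_agent, client_ip, trusted_ips=frozenset()):
    """Return "real" or "garbage" for one request (hypothetical policy)."""
    # Known-good clients that forget to set a User-Agent are exempted by IP.
    if client_ip in trusted_ips:
        return "real"
    # Missing, empty, or whitespace-only User-Agent: serve generated garbage.
    if user_agent is None or not user_agent.strip():
        return "garbage"
    return "real"


if __name__ == "__main__":
    print(classify_request("", "203.0.113.7"))
    print(classify_request(None, "203.0.113.7", frozenset({"203.0.113.7"})))
```

Exempting by IP is only one option; logging empty-UA requests for a while before enforcing anything would show which legitimate tools are affected.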
          ​​

            soblow@eldritch.cafe
            wrote on last edited by
            #17

            @pandro That could be something, but for example the "fursona lookup" tool didn't have a User Agent set (until I told the author)...

              seldenotter@blimps.xyz
              wrote on last edited by
              #18

              @Soblow "I spent the last few years building up a tolerance to iocaine powder."

                icewolf@masto.brightfur.net
                wrote on last edited by
                #19

                @Soblow Hah... I think we're getting iocaine or something when trying to read the article on our phone (iOS Safari, on iOS 15 or something like that). Haven't tried our desktop. Pretty meta, though.

                  bandie@yip.gay
                  wrote on last edited by
                  #20

                  @Soblow Hmmm... I'm now thinking of serving a knowledge.tld with just static garbage...

                    soblow@eldritch.cafe
                    wrote on last edited by
                    #21

                    Well, first documented case (on my end) of a false positive, yay...

                    iOS 15 + Safari, for some reason...

                      jernej__s@infosec.exchange
                      wrote on last edited by
                      #22

                      @Soblow Interesting, I've noticed a very similar pattern on a client's web server, which they use for hosting internal projects (which have to be available publicly): 5000 requests per second every few hours on specific subdomains, from residential IPs, with each IP doing about 20 requests and changing the user-agent once. Nothing knowledge-like though; most of the sites run WordPress.
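The pattern described here (few requests per IP, more than one User-Agent) could be looked for in access logs roughly as follows. This is a sketch under assumptions: the `(ip, user_agent)` record format, the function names, and both thresholds are made up for illustration, and several real users behind one NAT can trip the same rule.

```python
from collections import defaultdict


def summarize_by_ip(records):
    """Map each IP to (request_count, distinct_user_agent_count).

    `records` is an iterable of (ip, user_agent) pairs, one per request.
    """
    counts = defaultdict(int)
    agents = defaultdict(set)
    for ip, user_agent in records:
        counts[ip] += 1
        agents[ip].add(user_agent)
    return {ip: (counts[ip], len(agents[ip])) for ip in counts}


def suspicious_ips(records, max_requests=30, min_agents=2):
    """IPs that made only a few requests yet used several User-Agents."""
    summary = summarize_by_ip(records)
    return sorted(
        ip
        for ip, (requests, agent_count) in summary.items()
        if requests <= max_requests and agent_count >= min_agents
    )


if __name__ == "__main__":
    sample = [("203.0.113.5", "UA-1")] * 10 + [("203.0.113.5", "UA-2")] * 10
    sample += [("198.51.100.9", "UA-3")] * 3
    print(suspicious_ips(sample))
```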

                        sargeros@rivals.space
                        wrote on last edited by
                        #23

                        @Soblow I just tried to add your article to my self-hosted Readeck (a read-it-later service) and I think the request got caught by an anti-scraper.
                        Guess I'll have to read the article soon to understand what happened!

                          soblow@eldritch.cafe
                          wrote on last edited by
                          #24

                          @Sargeros Yes, it's something expected, I have the same issue with some other tools.
                          Maybe it has a User-Agent I can whitelist though?

                            sargeros@rivals.space
                            wrote on last edited by
                            #25

                            @Soblow It seems like it announces itself as a browser https://codeberg.org/readeck/readeck/src/commit/5a979acdb2afcfe8f87d5385db778e3643322b04/internal/httpclient/client.go#L33

                            But don't worry, I used the browser extension to save your article and it worked fine

                              soblow@eldritch.cafe
                              wrote on last edited by
                              #26

                              @Sargeros pretty sure that UA got caught in the maze indeed...
                              good to know!

                                velvet@pawb.fun
                                wrote on last edited by
                                #27

                                @Soblow @NafiTheBear @terrencefoxfur

                                  terrencefoxfur@furryfandom.me
                                  wrote on last edited by
                                  #28

                                  @velvet @Soblow @NafiTheBear Yup, they are the bane of my life. I have an ... aggressive ... filter list, and have absolutely no qualms about just blocking LLMs entirely 🙂

                                    soblow@eldritch.cafe
                                    wrote on last edited by
                                    #29

                                    I should do an addendum, but right now my main website is getting hammered at rates similar to what my knowledge website used to be hit at.
                                    Conversely, the "knowledge" website is back to "normal" background noise of <100 req/min.

                                    The banlist now contains so many IPs, and yet they still reach 6k req/min nearly constantly.

                                    At this point, I'm thinking about tinkering with my banip tool to compute optimal subnets instead of always crafting /24 subnets.
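One way the "optimal subnets" idea could be approached is sketched below with Python's standard `ipaddress` module. It is not the banip tool from the post: `aggregate_bans`, `min_prefix`, and the `density` threshold are assumptions for illustration. The heuristic bans the widest network (no wider than `/min_prefix`) in which at least a `density` fraction of addresses is already banned, and keeps the remaining addresses as /32 entries. IPv4 only.

```python
import ipaddress
from collections import Counter


def aggregate_bans(ips, min_prefix=16, density=0.75):
    """Collapse banned IPv4 addresses into CIDR blocks (greedy heuristic)."""
    remaining = {ipaddress.IPv4Address(ip) for ip in ips}
    chosen = []
    # Try the widest allowed networks first, then progressively narrower ones.
    for prefix in range(min_prefix, 32):
        counts = Counter(
            ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
            for addr in remaining
        )
        # A network is "dense" when enough of its addresses are already banned.
        dense = {
            net for net, hits in counts.items()
            if hits >= density * net.num_addresses
        }
        if dense:
            chosen.extend(dense)
            remaining = {
                addr for addr in remaining
                if ipaddress.ip_network(f"{addr}/{prefix}", strict=False) not in dense
            }
    # Whatever is left stays banned as single addresses.
    chosen.extend(ipaddress.ip_network(f"{addr}/32") for addr in remaining)
    # collapse_addresses merges adjacent or overlapping networks and sorts them.
    return [str(net) for net in ipaddress.collapse_addresses(chosen)]


if __name__ == "__main__":
    banned = [f"192.0.2.{i}" for i in range(8)] + ["198.51.100.1"]
    print(aggregate_bans(banned))
```

A lower `density` yields a shorter banlist at the cost of blocking more addresses that never misbehaved, which matters when the traffic comes from residential ranges.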

                                      puffin@squawk.social
                                      wrote on last edited by
                                      #30

                                      @Soblow Feels like once you have blocked the big data centers, it just becomes whack-a-mole, since tools like these exist 😞

                                      Internet Sharing SDKs: a Closer Look at the Emerging App Monetization Method - Proxyway

                                      Internet sharing SDKs are reshaping app monetization. But what do you need to know before adopting them?

                                      Proxyway (proxyway.com)

                                        lilacperegrine@clockwork.monster
                                        wrote on last edited by
                                        #31

                                        @Soblow
                                        Curious: does the subnet thing use similarities between the IPs to ban specific ranges?

                                        Computing optimal things sounds like a math problem, and if so, I'm game to try it out.

                                          soblow@eldritch.cafe
                                          wrote on last edited by
                                          #32

                                          That's frightening.
                                          Like, legitimately, I'm scared of what that means.

                                          I'll try to rate-limit at 10 req/min per IP.
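For illustration, a 10 req/min per-IP limit can be expressed as a sliding-window counter. This is a sketch, not the configuration used on the server in question: the `PerIpRateLimiter` class and its parameters are hypothetical, and in practice the limit would more likely live in the reverse proxy or firewall.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each IP."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget accepted requests that have left the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter()
    results = [limiter.allow("203.0.113.5") for _ in range(12)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

A per-IP limit only bites when a single address exceeds it, so crawlers that spread their requests thinly over many addresses will largely stay under it.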
