If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

  • soblow@eldritch.cafe

    If you self-host services on the internet, you may have seen waves of crawlers hammering your websites without mercy.

    To annoy them and protect my services from DDoS, I decided to set up an iocaine instance, along with NSoE... And it worked... too well.

    Recently, they started flooding my VPS so much it started choking.
    If you followed me here on Fedi, you saw my journey to find a way to relieve my server.

    This is a rant about LLM crawlers, and some observations & conclusions, along with some techniques to help you protect your own services.

    Read it here: https://xaselgio.net/posts/26.poisoning-knowledge/

    Edit: A follow-up is now available here: https://xaselgio.net/posts/26-1.addendum-poisoning-knowledge

    #selfHosting #iocaine #indieWeb

    jernej__s@infosec.exchange wrote (#22):

    @Soblow Interesting, I've noticed a very similar pattern on a client's web server, which they use for hosting internal projects (which have to be publicly available): 5000 requests per second every few hours on specific subdomains, from residential IPs, with each IP making about 20 requests and changing its user-agent once. Nothing knowledge-like though; most of the sites run WordPress.

      sargeros@rivals.space wrote (#23):

      @Soblow I just tried to add your article to my self-hosted Readeck (a read-it-later service) and I think the request got caught by an anti-scraper.
      Guess I'll have to read the article soon to understand what happened!

        soblow@eldritch.cafe wrote (#24):

        @Sargeros Yes, that's expected; I have the same issue with some other tools.
        Maybe it has a User-Agent I can whitelist though?

          sargeros@rivals.space wrote (#25):

          @Soblow It seems like it announces itself as a browser https://codeberg.org/readeck/readeck/src/commit/5a979acdb2afcfe8f87d5385db778e3643322b04/internal/httpclient/client.go#L33

          But don't worry, I used the browser extension to save your article and it worked fine

            soblow@eldritch.cafe wrote (#26):

            @Sargeros pretty sure that UA got caught in the maze indeed...
            good to know!

              velvet@pawb.fun wrote (#27):

              @Soblow @NafiTheBear @terrencefoxfur

                terrencefoxfur@furryfandom.me wrote (#28):

                @velvet @Soblow @NafiTheBear Yup, they are the bane of my life. I have an ... aggressive ... filter list, and have absolutely no qualms about just blocking LLMs entirely 🙂

                  soblow@eldritch.cafe wrote (#29):

                  I should do an addendum, but right now my main website is getting hammered at rates similar to what my knowledge website used to be hit at.
                  Conversely, the "knowledge" website is back to "normal" background noise of <100 req/min.

                  The banlist now contains so many IPs, and yet they still reach 6k req/min nearly constantly.

                  At this point, I'm thinking about tweaking my banip tool to compute optimal subnets instead of always creating /24 subnets.
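
To make the contrast concrete, here is a minimal sketch of fixed /24 grouping next to exact aggregation, assuming Python's standard `ipaddress` module. The function names are hypothetical and this is not the banip tool itself; `collapse_exact` only merges ranges that are fully covered by banned addresses, so it never bans an address that is not already on the list.

```python
import ipaddress

def to_slash24(ips):
    """Fixed-size approach: replace each address by its enclosing /24."""
    nets = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips}
    return sorted(nets)

def collapse_exact(ips):
    """Merge addresses into CIDR blocks that cover exactly the input set."""
    nets = [ipaddress.ip_network(ip) for ip in ips]  # each one is a /32
    return list(ipaddress.collapse_addresses(nets))

banned = ["192.168.1.0", "192.168.1.1", "192.168.1.2", "192.168.1.3", "10.0.0.5"]
print(to_slash24(banned))      # two /24 blocks, 512 addresses banned in total
print(collapse_exact(banned))  # one /32 and one /30, 5 addresses banned in total
```

The /24 version is shorter to reason about but bans many unlisted neighbours; the exact version is lossless but only shrinks the list when banned addresses happen to be contiguous.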

                    puffin@squawk.social wrote (#30):

                    @Soblow Feels like once you have blocked the big data centers it just becomes whack-a-mole, since tools like these exist 😞

                    Link: Internet Sharing SDKs: a Closer Look at the Emerging App Monetization Method - Proxyway (proxyway.com)
                    Internet sharing SDKs are reshaping app monetization. But what do you need to know before adopting them?

                      lilacperegrine@clockwork.monster wrote (#31):

                      @Soblow
                      Curious: does the subnet approach use similarities between the IPs to ban specific ranges?

                      Computing something optimal sounds like a math problem, and if so, I'm game to try it out.

                        soblow@eldritch.cafe wrote (#32):

                        That's frightening.
                        Like, legitimately, I'm scared of what that means.

                        I'll try to rate-limit at 10 req/min per IP.
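
For illustration, a minimal sketch of a per-IP limit of 10 requests per minute, assuming an in-memory sliding window. The thread does not say where the limit would be enforced (reverse proxy, firewall, or application), and the class name and parameters here are hypothetical.

```python
import time
from collections import defaultdict, deque

class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock               # injectable so the limiter can be tested
        self.hits = defaultdict(deque)   # ip -> timestamps of accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` and answer with HTTP 429 when it returns `False`. This sketch never evicts idle addresses, so a long-running service would need to prune `hits` periodically.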

                          soblow@eldritch.cafe wrote (#33):

                          @lilacperegrine Are you familiar with the concept of a quadtree?

                            lilacperegrine@clockwork.monster wrote (#34):

                            @Soblow
                            A data structure where every node has 4 children?
                            I'm sorta familiar with it, but I haven't used it much.

                              ifixcoinops@retro.social wrote (#35):

                              @Soblow I've been following this with interest

                                soblow@eldritch.cafe wrote (#36):

                                @lilacperegrine Well, my intuition is that the problem I'm trying to solve is akin to constructing a quadtree:
                                I have a list of IPv4 /32 addresses. I want to generalize this list using the properties of subnets (a /n subnet contains two /(n+1) subnets).
                                I don't have an explanation though, and my math skills are rusty...
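
Because every /n block splits into exactly two /(n+1) blocks, the hierarchy behaves like a binary tree over the address bits rather than a four-way quadtree. A small sketch with Python's standard `ipaddress` module shows the parent/child relation; the helper names are illustrative only.

```python
import ipaddress

def halves(cidr):
    """Return the two /(n+1) children of a /n network (n must be below 32)."""
    net = ipaddress.ip_network(cidr)
    low, high = net.subnets(prefixlen_diff=1)
    return low, high

def parent(cidr):
    """Return the /(n-1) network that contains the given /n network."""
    return ipaddress.ip_network(cidr).supernet(prefixlen_diff=1)

low, high = halves("192.168.1.0/24")
print(low, high)                   # 192.168.1.0/25 192.168.1.128/25
print(parent("192.168.1.128/25"))  # 192.168.1.0/24
```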

                                  eragon@pl.eragon.re wrote (#37):

                                  @Soblow@eldritch.cafe @lilacperegrine@clockwork.monster Seems like a cool thing to do… if only I could use university time for that.

                                    lilacperegrine@clockwork.monster wrote (#38):

                                    @Soblow
                                    So you have a list of IPs, and you are using a binary (or quad) tree to classify them into clean vs. dirty?

                                      soblow@eldritch.cafe wrote (#39):

                                      @lilacperegrine No, not really...
                                      I mentioned quadtrees because that's something I have worked with, and this problem made me think of them.
                                      Let's take a step back and formalize the problem:

                                      Let's assume I have a list of unique IPv4 addresses.
                                      They are represented on 32 bits.
                                      I want to construct a list of subnets (each still a 32-bit address, plus a prefix length) that summarizes the list of IPs I have.

                                      For example, if I have 192.168.1.0 and 192.168.1.1, I could generalize them with the 192.168.1.0/31 subnet (if I'm not mistaken), which contains those two IPs and no others.
                                      If it helps, represent them in binary and find the common upper bits:

                                      11000000.10101000.00000001.00000000 (192.168.1.0)
                                      11000000.10101000.00000001.00000001 (192.168.1.1)

                                      The common part is everything up to the last bit, thus the mask is a `/31` which is 255.255.255.254, or in binary `11111111.11111111.11111111.11111110`.
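
The bit comparison above can be sketched in Python by XOR-ing the two addresses and counting how many low bits differ. `smallest_common_subnet` is a hypothetical helper name, and the code relies only on the standard `ipaddress` module.

```python
import ipaddress

def smallest_common_subnet(a, b):
    """Return the smallest CIDR block that contains both IPv4 addresses."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    differing_bits = (x ^ y).bit_length()   # number of low bits that are not shared
    prefix_len = 32 - differing_bits
    mask = (0xFFFFFFFF << differing_bits) & 0xFFFFFFFF
    return ipaddress.IPv4Network((x & mask, prefix_len))

net = smallest_common_subnet("192.168.1.0", "192.168.1.1")
print(net, net.netmask)  # 192.168.1.0/31 255.255.255.254
```

Note that the result can include addresses beyond the two inputs: 192.168.1.1 and 192.168.1.2 only share a /30, which also covers 192.168.1.0 and 192.168.1.3.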

                                      Now, I have tens of thousands of IPs and I'd like the smallest list of subnets that includes all bad IPs without including any good IPs.

                                      I'm sure there are academic papers about this; it sounds like a problem folks must already have run into.
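
One possible way to frame this, as a sketch only: treat the CIDR hierarchy as a binary tree and widen each bad address to the largest block that still holds no known-good address. This assumes a list of known-good addresses is available (the post does not say how it would be obtained), and addresses that appear in neither list may end up banned. `summarize_bans` and `min_prefix` are hypothetical names; `min_prefix` caps how wide a single block may grow.

```python
import ipaddress
from bisect import bisect_left

def summarize_bans(bad_ips, good_ips, min_prefix=16):
    """Cover every bad IP with the widest block that contains no good IP."""
    good = sorted(int(ipaddress.IPv4Address(ip)) for ip in good_ips)

    def contains_good(net):
        lo = int(net.network_address)
        hi = int(net.broadcast_address)
        i = bisect_left(good, lo)        # first good address >= lo, if any
        return i < len(good) and good[i] <= hi

    blocks = set()
    for ip in bad_ips:
        net = ipaddress.ip_network(ip)   # start from the /32
        # Climb towards the root while the parent block stays free of good IPs.
        while net.prefixlen > min_prefix:
            wider = net.supernet()
            if contains_good(wider):
                break
            net = wider
        blocks.add(net)                  # duplicates collapse in the set
    return sorted(blocks)
```

For example, with bad addresses 192.168.1.0, 192.168.1.1, 192.168.1.2 and 192.168.1.200, the single good address 192.168.1.3, and `min_prefix=24`, the result is 192.168.1.0/31, 192.168.1.2/32 and 192.168.1.128/25. An address listed as both bad and good stays banned as a /32 in this sketch.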

                                        _syhmac@meow.social wrote (#40):

                                        @Soblow Yeah… the last time my website was bombed like this, I decided that Cloudflare was not such a bad idea… I hope you’ll figure it out.

                                          stupidcamille@eldritch.cafe wrote (#41):

                                          @Soblow I don't always read a blog post til the end. But when I do, it's always a banger
