Brew (#forgejo), as usual, is being overloaded by the scrapers.

Tags: forgejo, bsdcafe, bsdcafeservices
13 Posts 5 Posters 0 Views
  pertho@mastodon.bsd.cafe wrote:

    @stefano Have you tried blocking the IP ranges in Julian's list?

    Julian Oliver (@JulianOliver@mastodon.social)

    I've done the log analysis and the two biggest contributors that brought the AI crawler hits up to 2 million in a day, a 4x increase on a week prior, are ByteSpider (Singapore networks) and especially AppleBot (used for Siri and other Apple products). The parasites.txt is now >4500 lines long: https://scienceispoetry.net/files/parasites.txt

    Or is it all coming from residential proxies? 🤔
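For anyone wanting to try the range-blocking idea, here is a minimal sketch of checking a client address against a list of CIDR ranges. It assumes the list holds one CIDR range per line with optional `#` comments; that format is an assumption and has not been checked against parasites.txt. The names `load_networks` and `is_blocked` are hypothetical. As noted in the reply that follows, this kind of check does little against traffic coming from residential proxies.

```python
import ipaddress


def load_networks(lines):
    """Parse CIDR ranges, skipping blank lines and '#' comments.

    The one-range-per-line format is an assumption, not a documented format.
    """
    networks = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        # strict=False tolerates entries whose host bits are set.
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(addr, networks):
    """Return True if addr falls inside any of the parsed networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Documentation-only address ranges are used here as sample data.
sample_list = ["# sample entries", "203.0.113.0/24", "", "2001:db8::/32  # IPv6"]
sample_networks = load_networks(sample_list)
```

A linear scan like this is fine for a few thousand ranges; in practice the ranges would more likely be loaded into a firewall table than checked in application code.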

    stefano@mastodon.bsd.cafe wrote (#4):

    @pertho residential proxies. I blocked everything I could block, but it wasn't enough.

      oxy@social.bsdlab.au wrote (#5):
      @pertho @stefano oooh interesting list.

      I've been tinkering with ssh/httpd logs and awk, enriching the data with https://iplocate.io/ and maybe eventually GreyNoise and Spamhaus (to catch more residential proxies, etc.).
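The kind of log tinkering described above could look roughly like the sketch below, written in Python rather than awk. It assumes access log lines in Common Log Format, where the client address is the first whitespace-separated field, and it groups hits by /24 (IPv4) or /48 (IPv6) prefix to spot busy networks. The function names and prefix sizes are hypothetical choices, and the enrichment step (iplocate.io, GreyNoise, Spamhaus) is left out because no details of it appear in the thread.

```python
import ipaddress
from collections import Counter


def client_ip(line):
    """Return the first whitespace-separated field of a log line, or None."""
    parts = line.split(None, 1)
    return parts[0] if parts else None


def hits_per_prefix(lines, v4_prefix=24, v6_prefix=48):
    """Count requests per network prefix; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        field = client_ip(line)
        if field is None:
            continue
        try:
            ip = ipaddress.ip_address(field)
        except ValueError:
            continue  # first field is not an IP address
        prefix = v4_prefix if ip.version == 4 else v6_prefix
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts


# Made-up sample lines using documentation-only addresses.
sample_log = [
    '203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.99 - - [10/Oct/2025:13:55:37 +0000] "GET /a HTTP/1.1" 200 10',
    '198.51.100.4 - - [10/Oct/2025:13:55:38 +0000] "GET /b HTTP/1.1" 200 10',
]
sample_counts = hits_per_prefix(sample_log)
```

Calling `sample_counts.most_common(10)` would list the busiest prefixes first, which is a reasonable starting point before enriching them with any external data.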

      stefano@mastodon.bsd.cafe wrote (original post):

        EDIT: done, let me know if you experience problems

        Brew (#forgejo), as usual, is being overloaded by the scrapers.

        I think I'll have to put an Anubis in front of it. I don't love those "blocks", but sometimes you need to.

        #BSDCafe #BSDCafeServices

        tubsta@social.bsdlab.au wrote (#6):
        @stefano Stick Bunny in with origin shield to drop the nuffs scraping your site

          stefano@mastodon.bsd.cafe wrote (#7):

          @tubsta this would work, but I'm trying to avoid using (external) CDNs at the moment.

            tubsta@social.bsdlab.au wrote (#8):
            @stefano I agree with what you are trying to do, as I would rather avoid CDNs too, but some services need one. You just gotta work out which are the least shit and which are Europe-focused to assist here.

              tubsta@social.bsdlab.au wrote (#9):
              @stefano FWIW I spend about $5 a month with Bunny’s CDN products for bsdlab

                mwl@io.mwl.io wrote (#10):

                @tubsta @stefano

                If you must CDN, Bunny is 100% the way to go. Contains much less suck.

                I run cdn.mwl.io specifically to distribute files. It's a CDN composed of one host.

                  stefano@mastodon.bsd.cafe wrote (#11):

                  @tubsta sure. Bunny is great. I have an account and use it for some services. For some time, some of the BSD Cafe content was served by them, and it was perfect.

                    tubsta@social.bsdlab.au wrote (#12):
                    @stefano They now have S3 object access for their storage nodes, which has been a long time coming.

                      stefano@mastodon.bsd.cafe wrote (#13):

                      @tubsta oh nice! I was curious to see it implemented. I'll have a look.
