CIRCLE WITH A DOT


There are millions of these single query scrapers.

Category: Uncategorized · 37 posts · 18 posters · 61 views
This topic has been deleted. Only users with topic management privileges can see it.
  • bert_hubert@eupolicy.social (#1):

    RE: https://social.treehouse.systems/@mgorny/116465292654071299

    There are millions of these single query scrapers. As open projects we should band together and block them preemptively. This is the only way we can put a stop to this.

    4 replies: bert_hubert@eupolicy.social, npettiaux@mamot.fr, jhaas@a2mi.social, datum@zeroes.ca

    • bert_hubert@eupolicy.social (#2, in reply to #1):

      Here are 91,372 IP addresses pretending to be normal browsers, while sending a *single* query to me today. http://berthub.eu/tmp/singlequery-2026-04-26.csv - feel free to block them all!
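
A minimal sketch of turning such a list into a blocklist, assuming the CSV holds one IP address per row in its first column (the actual layout of the linked file is an assumption here). It uses Python's `ipaddress.collapse_addresses` to merge adjacent addresses into covering networks; with tens of thousands of scattered residential addresses, expect only a modest reduction.

```python
import csv
import io
import ipaddress


def load_ips(csv_text):
    """Parse IP addresses from the first column of CSV text.

    Assumption: one address per row; rows that do not parse
    (for example a header line) are skipped.
    """
    ips = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        try:
            ips.append(ipaddress.ip_address(row[0].strip()))
        except ValueError:
            continue  # header or malformed entry
    return ips


def to_blocklist(ips):
    """Collapse single addresses into the fewest covering networks.

    collapse_addresses() refuses mixed IPv4/IPv6 input, so each
    address family is collapsed separately.
    """
    result = []
    for version in (4, 6):
        nets = [ipaddress.ip_network(str(ip)) for ip in ips if ip.version == version]
        result.extend(str(n) for n in ipaddress.collapse_addresses(nets))
    return result


sample = "ip\n192.0.2.0\n192.0.2.1\n198.51.100.7\n2001:db8::1\n"
print(to_blocklist(load_ips(sample)))
# ['192.0.2.0/31', '198.51.100.7/32', '2001:db8::1/128']
```

The resulting prefixes can be fed to whatever firewall or web-server deny list a site already uses.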

      5 replies: koen@procolix.social, brown@infosec.exchange, hashbang@infosec.exchange, hrbrmstr@mastodon.social, bert_hubert@eupolicy.social

      • koen@procolix.social (#3, in reply to #2):

        @bert_hubert how did you generate that list? Perhaps we can do the same on ftp.nluug.nl and on mastodon.nl and so on?

        1 reply: bert_hubert@eupolicy.social

        • bert_hubert@eupolicy.social (#4, in reply to #3):

          @koen This is based on a modified version of https://github.com/berthubert/audience-minutes/ - it needs a bit of work to be better suited to this purpose.
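
The audience-minutes code itself is not reproduced here. As an illustration only, the core idea (an IP that made exactly one request in a day while presenting a browser-like User-Agent) can be sketched over a web-server access log. This assumes the Combined Log Format and a log covering a single day; `single_query_ips` is a hypothetical helper, not part of that project.

```python
import re
from collections import Counter

# Assumption: the server writes the Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def single_query_ips(lines):
    """Return IPs that made exactly one request with a browser-like User-Agent.

    Assumes `lines` covers one day of traffic for one site.
    """
    counts = Counter()
    first_ua = {}
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines in other formats
        ip = m.group("ip")
        counts[ip] += 1
        first_ua.setdefault(ip, m.group("ua"))
    # Nearly every mainstream browser User-Agent starts with "Mozilla/".
    return sorted(ip for ip, n in counts.items()
                  if n == 1 and first_ua[ip].startswith("Mozilla/"))


log = [
    '203.0.113.5 - - [26/Apr/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [26/Apr/2026:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.9 - - [26/Apr/2026:10:05:00 +0000] "GET /some/page HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '192.0.2.44 - - [26/Apr/2026:10:06:00 +0000] "GET /robots.txt HTTP/1.1" 200 60 "-" "curl/8.5.0"',
]
print(single_query_ips(log))  # ['198.51.100.9']
```

The honest client (`curl`) is not flagged even though it sent one request, because it does not claim to be a browser.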

          1 reply: koen@procolix.social

          • brown@infosec.exchange (#5, in reply to #2):

            @bert_hubert residential proxies are a problem that cannot be easily solved, unless you block the entire Internet.

            1 reply: bert_hubert@eupolicy.social

            • bert_hubert@eupolicy.social (#6, in reply to #5):

              @brown I’m fine with blocking residential proxy enablers.

              3 replies: tinmouth@infosec.exchange, fazalmajid@social.vivaldi.net, kerfuffle@mastodon.online

              • tinmouth@infosec.exchange (#7, in reply to #6):

                @bert_hubert @brown time for a second Internet, I feel. The first one is about to become useless.

                1 reply: srtcd424@mas.to

                • hashbang@infosec.exchange (#8, in reply to #2):

                  @bert_hubert Curious: would I be blocked if I left a tab open and restarted my browser that day but didn't otherwise use the website?

                  1 reply: bert_hubert@eupolicy.social

                  • bert_hubert@eupolicy.social (#9, in reply to #8):

                    @hashbang your browser would still send multiple queries, for CSS and JS and images etc.
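
The point about CSS, JS and images suggests a second signal besides the raw request count: did the client ever fetch a subresource? The sketch below flags IPs that requested pages but no assets. `page_only_ips` and the extension list are hypothetical. A browser with a warm cache may legitimately skip some assets, so this is a hint rather than proof.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical list of subresource extensions; adjust to what the site serves.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                  ".svg", ".webp", ".ico", ".woff2")


def page_only_ips(requests):
    """requests: iterable of (ip, path) pairs.

    Return the IPs that never fetched anything ending in ASSET_SUFFIXES.
    """
    fetched_asset = defaultdict(bool)
    for ip, path in requests:
        p = urlsplit(path).path.lower()  # drop any ?query string
        fetched_asset[ip] = fetched_asset[ip] or p.endswith(ASSET_SUFFIXES)
    return sorted(ip for ip, seen in fetched_asset.items() if not seen)


reqs = [
    ("203.0.113.5", "/blog/"),
    ("203.0.113.5", "/style.css?v=3"),
    ("203.0.113.5", "/logo.png"),
    ("198.51.100.9", "/blog/"),
]
print(page_only_ips(reqs))  # ['198.51.100.9']
```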

                    2 replies: hashbang@infosec.exchange, mkoek@mastodon.nl

                    • fazalmajid@social.vivaldi.net (#10, in reply to #6):

                      @bert_hubert @brown how will you know who they are, since each IP is used only once? Also, since most ISPs use NAT or CGNAT, there would be significant collateral damage.

                      Eventually, any URL that can cause significant CPU expenditure, like Git history or search, will have to be put behind an authenticated-user wall. For sites without a real user base, we could imagine something like PrivacyPass, except as a means to identify human users. The residential proxy providers don't usually have access to the user's browser and its cookies.

                      1 reply: fazalmajid@social.vivaldi.net

                      • bert_hubert@eupolicy.social (#11, in reply to #10):

                        @fazalmajid @brown I worded it carefully - we need to work together so we can spot "only once" queries across sites. The collateral damage is fine, will make people care about this stuff.
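
One way to picture the cross-site idea, assuming each participating site shares the set of IPs that sent it exactly one request on a given day: an IP that shows up as a one-off on several independent sites is far more suspicious than one seen once on a single site. The site names, data layout and `min_sites` threshold below are hypothetical; no such shared feed is described in the thread.

```python
from collections import Counter


def shared_single_query_ips(site_reports, min_sites=2):
    """site_reports: dict mapping site name -> set of single-query IPs for one day.

    Return IPs reported by at least `min_sites` different sites.
    """
    seen = Counter()
    for ips in site_reports.values():
        seen.update(set(ips))  # count each IP at most once per site
    return sorted(ip for ip, n in seen.items() if n >= min_sites)


reports = {
    "site-a.example": {"198.51.100.9", "203.0.113.7"},
    "site-b.example": {"198.51.100.9"},
    "site-c.example": {"198.51.100.9", "192.0.2.44"},
}
print(shared_single_query_ips(reports, min_sites=2))  # ['198.51.100.9']
```

Raising `min_sites` trades recall for fewer false positives from, for example, a shared CGNAT address.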

                        1 reply: fazalmajid@social.vivaldi.net

                        • hashbang@infosec.exchange (#12, in reply to #9):

                          @bert_hubert True, I was just unsure if that is what you meant. Phew. My tab saving strategy can continue then 😉

                          1 reply

                          • fazalmajid@social.vivaldi.net (#13, in reply to #11):

                            @bert_hubert @brown what, like spam traps deliberately placed in robots.txt where LLM crawlers will be unable to resist? I shudder to think about how much bandwidth would be required to synchronize the lookup tables of the LLMbot-RBLs, not to mention query traffic.

                            As for the collateral damage, the ordinary user of an ISP has no control over whether their neighbor signs up with a residential proxy for a couple of bucks per month, nor even any way to figure out what happened. As for pressuring ISPs, the media groups have been trying to do that for decades with copyright infringers, to no avail.
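
For context on the lookup-table concern: existing DNS-based blocklists (DNSBLs, described in RFC 5782) avoid synchronizing full tables by answering per-IP DNS queries, which moves the cost into exactly the query traffic mentioned above (softened by resolver caching). A sketch of how such a lookup is formed; the zone name is hypothetical, and `is_listed` performs a live DNS query.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build a DNSBL query name: reversed IPv4 octets or IPv6 nibbles, then the zone."""
    addr = ipaddress.ip_address(ip)
    rp = addr.reverse_pointer  # e.g. '1.2.0.192.in-addr.arpa'
    suffix = ".in-addr.arpa" if addr.version == 4 else ".ip6.arpa"
    return rp[: -len(suffix)] + "." + zone


def is_listed(ip, zone):
    """By DNSBL convention, an A record in 127.0.0.0/8 means 'listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): treat as not listed
    return ipaddress.ip_address(answer) in ipaddress.ip_network("127.0.0.0/8")


# Hypothetical zone name for an "LLMbot-RBL".
print(dnsbl_query_name("192.0.2.1", "llmbots.rbl.example"))
# 1.2.0.192.llmbots.rbl.example
```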

                            2 replies: bert_hubert@eupolicy.social, srtcd424@mas.to

                            • bert_hubert@eupolicy.social (#14, in reply to #13):

                              @fazalmajid @brown the alternative is no services. The pain is here or there, and I'm fine with the pain being somewhere else.

                              1 reply: fazalmajid@social.vivaldi.net

                              • fazalmajid@social.vivaldi.net (#15, in reply to #14):

                                @bert_hubert @brown what I am saying is an allowlist with some robust yet privacy-preserving authentication is more likely to succeed than a whack-a-mole blocklist.

                                1 reply

                                • kerfuffle@mastodon.online (#16, in reply to #6):

                                  @bert_hubert @brown Have you considered putting something like Anubis in front of your site, to make it too costly for AI-bots to visit your site? https://anubis.techaro.lol
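
Anubis gates pages behind a proof-of-work challenge. The general hashcash-style idea is sketched below; the challenge string and difficulty are made up, and Anubis's actual challenge format and parameters may differ. Note the asymmetry: the client needs on average 16^difficulty hash attempts, while the server verifies with a single hash, which is also why a well-resourced scraper can simply pay the cost.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find a nonce so sha256(challenge + nonce) starts with
    `difficulty` hex zeros (about 16**difficulty attempts on average)."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


nonce = solve("example-challenge", 4)
print(nonce, verify("example-challenge", nonce, 4))
```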

                                  1 reply: bert_hubert@eupolicy.social

                                  • bert_hubert@eupolicy.social (#17, in reply to #16):

                                    @kerfuffle @brown I personally do not suffer from this problem since my site can deal with the traffic. The problem is more for projects that burn more CPU. I'm told Anubis is a solved problem for these scrapers - they have sufficient CPU power...

                                    1 reply: sfan5@mastodon.online

                                    • sfan5@mastodon.online (#18, in reply to #17):

                                      @bert_hubert @kerfuffle @brown As far as I'm aware Anubis still works just fine. However the reason it does is not that the needed CPU power would be too expensive for the scrapers, it's just that they haven't bothered to bypass it yet.

                                      1 reply

                                      • hrbrmstr@mastodon.social (#19, in reply to #2):

                                        @bert_hubert oof. tons of residential proxies but also infected routers and IoT/ICS garbage. Mix of AI scrapers using res proxies and mirai-adjacent. Worst of both worlds.

                                        1 reply: computerywar@infosec.exchange

                                        • computerywar@infosec.exchange (#20, in reply to #19):

                                          @hrbrmstr @bert_hubert huh, a little surprised SpaceX isn't higher on that list.

                                          1 reply: hrbrmstr@mastodon.social