
To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.

Tags: openstreetmap, bots, abuse
50 posts, 28 posters
  • osm_tech@en.osm.town

    To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.

    If you need OSM data, please don't scrape the website - use the official downloads at https://planet.openstreetmap.org
    🙏🌍 #AI #Bots #Abuse

    jonsaenzagirre@mastodon.eus wrote (#18):

    @osm_tech question. Why do people scrape a server which makes the data freely available? And, probably, better structured in the final product. I don't see the point.

      osm_tech@en.osm.town wrote (#19):

      @JonSaenzAgirre It is a good question, and we don't know the answer either. Our planet data is so much easier to process and use.

      • utf_7@mastodon.social

        @osm_tech tHeN yOu jUsT neEd tO sCaLe

        osm_tech@en.osm.town wrote (#20):

        @utf_7 In this economy with RAM prices what they are?!? 😉

          dnub@mastodon.social wrote (#21):

          @JonSaenzAgirre vibe coders gonna vibe

          • hunterz@mastodon.sdf.org

            @osm_tech does coming from residential IPs mean that someone has baked a scraper into some popular tool that people don't realize is doing that?

            jay0@alico.nexus wrote (#22):

            @HunterZ@mastodon.sdf.org @osm_tech@en.osm.town lots of mobile/desktop apps, browser extensions, and even IoT devices are paid by "residential proxy" companies to prey on their users by selling said users' connections to AI scrapers: https://www.spamhaus.org/resource-hub/compromised/lets-talk-about-the-danger-of-residential-proxy-networks/

            • ryanprior@mastodon.social

              @HunterZ @osm_tech this is actually quite common. Mobile advertising SDKs for games, background apps, etc. include residential scraping proxy functionality that they can sell to the highest bidder, and then when scrapers want to avoid restrictions they can pay a fraction of a penny to send their requests via your phone. Millions of people use apps with this built in and have no idea. Most websites don't want to ban the residential scrapers because it can hurt growth.

              olbohlen@norden.social wrote (#23):

              @ryanprior @HunterZ @osm_tech I have that scraping on my private webserver too, and it forced me to make a whole bunch of content private. Yet the botnet still hits it and gets 404s now. Every single request comes from a different IP...

                ryanprior@mastodon.social wrote (#24):

                @olbohlen @HunterZ @osm_tech sad to hear that! It's wild though, you can sign up for a scraper proxy service in minutes. They're legal, inexpensive, and easy to use. Admins who assume scrapers are using their own machines, or that inauthentic traffic will come from a few IP addresses, are sadly living in the past.
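
One way to see why per-address limits miss this kind of traffic is to count requests both per address and per enclosing network. The following is a minimal sketch, assuming the client addresses have already been extracted from an access log. The function names and the prefix lengths (/24 for IPv4, /64 for IPv6) are illustrative assumptions, not anything described in this thread or known to be used by OSM.

```python
from collections import Counter
from ipaddress import ip_address, ip_network


def network_key(addr: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Return the enclosing network of addr as a string.

    The prefix lengths are hypothetical defaults chosen for illustration.
    """
    ip = ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back
    return str(ip_network(f"{ip}/{prefix}", strict=False))


def summarize(client_ips):
    """Count requests per individual address and per enclosing network."""
    per_ip = Counter(client_ips)
    per_net = Counter(network_key(a) for a in client_ips)
    return per_ip, per_net


# Documentation-range addresses (RFC 5737 / RFC 3849), not real traffic
sample = [
    "203.0.113.5", "203.0.113.77", "198.51.100.9",
    "2001:db8::1", "2001:db8::2", "203.0.113.5",
]
per_ip, per_net = summarize(sample)
print(per_ip.most_common(3))
print(per_net.most_common(3))
```

When every request arrives from a different address, the per-address counts stay near one, so a per-address threshold never triggers. Grouping by prefix only helps if the addresses cluster, which residential proxy traffic may not do.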

                  olbohlen@norden.social wrote (#25):

                  @ryanprior @HunterZ @osm_tech sure I could, but I refuse to put my self-hosted stuff behind some new dependency...

                    ryanprior@mastodon.social wrote (#26):

                    @olbohlen @HunterZ @osm_tech the complexity of setting up defenses for this is regrettable

                      hlunke@darmstadt.social wrote (#27):

                      @osm_tech

                      Might be a good idea to become an OSMF member now or just donate some money.
                      Membership starts at £15/year:
                      https://supporting.openstreetmap.org/

                        ff7@freiburg.social wrote (#28):

                        @osm_tech @JonSaenzAgirre That's dumb AI, probably. No "I" at all...

                          burtyb@widget.uk wrote (#29):

                          @osm_tech sounds familiar. Last year I braved turning Cloudflare's "under attack" mode off for https://dnshistory.org/ and saw an extra 5 million requests/day (500k unique IPs) overloading things. It's still blocking >700k requests/day a month later...

                            clarinerd@mastodon.social wrote (#30):

                            @osm_tech and we can tell the scrapers are AI-built because a cursory glance at the documentation on the "coders'" part would've prevented this problem.

                              chestycougth@mastodon.social wrote (#31):

                              @osm_tech Thank you. I'm a beginner who has just been doing toy projects and has barely any notion of what web scraping is but I'm very happy to learn that your data can be downloaded 🙏

                                grechaw@sfba.social wrote (#32):

                                @zymurgic @osm_tech this kind of abuse has become normal and normalized. It's the AI way. Makes it tough for the legit crawlers out there, too.

                                  marcel@waldvogel.family wrote (#33):

                                  @HunterZ @osm_tech
                                  My first guess would be some dual-use browser extension. Aka Trojan.

                                    apnoe_soeren@mastodon.social wrote (#34):

                                    @osm_tech Limit each IP to 14,400 bps modem speed for a month or so. 😅
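
The per-IP throttling idea can be sketched as a token bucket keyed by client address. This is a minimal illustration under stated assumptions: the class name, the one-second burst allowance, and the injectable clock are all hypothetical, and nothing here reflects how OSM actually handles abusive traffic. It also does not track the "month or so" duration.

```python
import time

# 14,400 bit/s modem speed expressed in bytes per second
MODEM_BYTES_PER_SEC = 14_400 // 8


class PerIPThrottle:
    """Token bucket per client address (hypothetical helper for illustration)."""

    def __init__(self, rate=MODEM_BYTES_PER_SEC, burst=MODEM_BYTES_PER_SEC,
                 clock=time.monotonic):
        self.rate = rate      # bytes refilled per second
        self.burst = burst    # maximum bytes that can be saved up
        self.clock = clock    # injectable so the logic can be tested
        self._buckets = {}    # ip -> (tokens, time of last update)

    def delay_for(self, ip, nbytes):
        """Return how many seconds to wait before sending nbytes to ip."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        # Spend the tokens; a negative balance is debt the client must wait out
        tokens -= nbytes
        self._buckets[ip] = (tokens, now)
        return 0.0 if tokens >= 0 else -tokens / self.rate


# Usage with a fake clock so the behaviour is deterministic
fake_now = [0.0]
throttle = PerIPThrottle(clock=lambda: fake_now[0])
first = throttle.delay_for("203.0.113.5", 1800)   # fits in the initial burst
second = throttle.delay_for("203.0.113.5", 1800)  # bucket empty, must wait
other = throttle.delay_for("198.51.100.9", 900)   # separate bucket per address
```

A limit like this only slows an address that sends many requests. It does little when each request arrives from a different address, which is the pattern described elsewhere in this thread.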

                                      vampirdaddy@chaos.social wrote (#35):

                                      @JonSaenzAgirre @osm_tech
                                      The scrapers are DUMB.
                                      They are not curated, have only basic maintenance, are built to gobble up ANYTHING textual they encounter, without respect, mercy or reason.

                                      Just collect meaningless data.

                                      That’s the nature of the coveted LLMs: just statistics, no understanding, structure or meaning.

                                      And greedy crooks in haste to make quick money just grab everything they can.

                                      The AI bubble needs to pop really soon.

                                        jkb@gotosocial.jkbockstael.be wrote (#36):

                                        @ClariNerd @osm_tech Because their IP ranges are increasingly being blocked by servers following their harmful scraping habits, AI companies are now releasing "browsers" so they can scrape from residential IPs instead and circumvent blocks. Oh, sorry, I meant "so they can empower users with AI insight in this new era of information".

                                          clarinerd@mastodon.social wrote (#37):

                                          @jkb @osm_tech brb repeatedly slamming my forehead against my desk for the next five minutes. Then I will reread that and hopefully it will seem less dystopian.
