CIRCLE WITH A DOT

To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.

Category: Uncategorized · Tags: openstreetmap, bots, abuse · 50 posts, 28 posters

olbohlen@norden.social wrote:

    @ryanprior @HunterZ @osm_tech sure I could, but I refuse to put my self-hosted stuff behind some new dependency...

#26 · ryanprior@mastodon.social

@olbohlen @HunterZ @osm_tech the complexity of setting up defenses for this is regrettable

osm_tech@en.osm.town wrote:

    To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.

    If you need OSM data, please don't scrape the website - use the official downloads at https://planet.openstreetmap.org
    🙏🌍 #AI #Bots #Abuse

#27 · hlunke@darmstadt.social

@osm_tech

Might be a good idea to become an OSMF member now, or just donate some money.
Membership starts at £15/year:
https://supporting.openstreetmap.org/

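The original post asks anyone who needs OSM data to use the official downloads at https://planet.openstreetmap.org instead of scraping the website. As a minimal sketch of the final step of that workflow, the Python below checks a downloaded file against a published MD5 checksum file. It assumes you have already fetched both files and that the checksum file uses the common `md5sum` layout (`<hex digest>  <file name>`); the function names are hypothetical, and the exact file names on the planet server are not given in this thread.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a (potentially very large) file in 1 MiB chunks instead of loading it all."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(data_path, md5_path):
    """Compare a file's MD5 with the first field of an md5sum-style checksum file."""
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(data_path) == expected
```

Calling `matches_checksum_file("downloaded-file", "downloaded-file.md5")` (hypothetical names) returns `True` only if the download is intact. Chunked hashing matters here because a full planet file is far too large to read into memory in one go.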
osm_tech@en.osm.town wrote:

    @JonSaenzAgirre It is a good question, and we don't know the answer either. Our planet data is so much easier to process and use.

#28 · ff7@freiburg.social

@osm_tech @JonSaenzAgirre that's dumb AI, probably. No "I" at all...

#29 · burtyb@widget.uk, in reply to the original post by osm_tech@en.osm.town

@osm_tech sounds familiar: last year I braved turning Cloudflare's "Under Attack" mode off for https://dnshistory.org/ and saw an extra 5 million requests/day (500k unique IPs) overloading things. It's still blocking >700k requests/day a month later...

#30 · clarinerd@mastodon.social, in reply to the original post by osm_tech@en.osm.town

@osm_tech and we can tell the scrapers are AI-built, because even a cursory glance at the documentation on the part of the "coders" would have prevented this problem.

#31 · chestycougth@mastodon.social, in reply to the original post by osm_tech@en.osm.town

@osm_tech Thank you. I'm a beginner who has just been doing toy projects and has barely any notion of what web scraping is, but I'm very happy to learn that your data can be downloaded 🙏

#32 · grechaw@sfba.social

@zymurgic @osm_tech this kind of abuse has become normal and normalized. It's the AI way. Makes it tough for the legit crawlers out there, too.

hunterz@mastodon.sdf.org wrote:

    @osm_tech does coming from residential IPs mean that someone has baked a scraper into some popular tool that people don't realize is doing that?

#33 · marcel@waldvogel.family

@HunterZ @osm_tech
My first guess would be some dual-use browser extension, aka a Trojan.

#34 · apnoe_soeren@mastodon.social, in reply to the original post by osm_tech@en.osm.town

@osm_tech Throttle each IP to 14.4k modem speed (14,400 bit/s) for a month or so. 😅

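The "14.4k modem" suggestion amounts to a per-IP bandwidth limit. Below is a minimal sketch of one common way to implement such a limit, a token bucket; it is a hypothetical illustration, not how OpenStreetMap actually throttles traffic. It assumes the server can identify the client IP and knows each response's size in bytes, and it keeps every bucket in memory without ever pruning them.

```python
import time
from typing import Dict, Optional

# 14.4 kbit/s modem speed expressed in bytes per second (14,400 / 8 = 1,800).
MODEM_BYTES_PER_S = 14_400 / 8


class TokenBucket:
    """Token bucket: a byte budget that refills at a constant rate, up to a cap."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate            # bytes added back per second
        self.capacity = capacity    # maximum burst size in bytes
        self.tokens = capacity      # start with a full bucket
        self.last = now             # timestamp of the last refill

    def allow(self, nbytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False  # over budget: the caller should delay or reject the response


# One bucket per client IP (hypothetical in-memory table).
buckets: Dict[str, TokenBucket] = {}


def allow_response(ip: str, nbytes: int, now: Optional[float] = None) -> bool:
    """Return True if `ip` may receive `nbytes` more bytes at time `now`."""
    now = time.monotonic() if now is None else now
    bucket = buckets.get(ip)
    if bucket is None:
        # New client: allow a burst of one second's worth of modem traffic.
        bucket = buckets[ip] = TokenBucket(MODEM_BYTES_PER_S, MODEM_BYTES_PER_S, now)
    return bucket.allow(nbytes, now)
```

Note that every new address starts with a full bucket. With 320,000+ addresses involved, as in the original post, a purely per-IP limit may do little against scraping spread across residential IPs.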
jonsaenzagirre@mastodon.eus wrote:

    @osm_tech Question: why do people scrape a server that makes the data freely available, and probably better structured in the final product? I don't see the point.

#35 · vampirdaddy@chaos.social

@JonSaenzAgirre @osm_tech
The scrapers are DUMB.
They are not curated, get only basic maintenance, and are built to gobble up ANYTHING textual they encounter, without respect, mercy or reason.

They just collect meaningless data.

That's the nature of the coveted LLMs: just statistics, no understanding, structure or meaning.

And greedy crooks in a hurry to make quick money just grab everything they can.

The AI bubble needs to pop really soon.

#36 · jkb@gotosocial.jkbockstael.be, in reply to #30 by clarinerd@mastodon.social

@ClariNerd @osm_tech Because their IP ranges are increasingly being blocked by servers in response to their harmful scraping habits, AI companies are now releasing "browsers" so they can scrape from residential IPs instead and circumvent blocks. Oh, sorry, I meant "so they can empower users with AI insight in this new era of information".

#37 · clarinerd@mastodon.social, in reply to #36 by jkb@gotosocial.jkbockstael.be

@jkb @osm_tech brb, repeatedly slamming my forehead against my desk for the next five minutes. Then I will reread that and hopefully it will seem less dystopian.

#38 · jonsaenzagirre@mastodon.eus, in reply to #35 by vampirdaddy@chaos.social

@vampirdaddy @osm_tech this seems like a reasonable explanation. Quantity of bytes irrespective of sense. Thank you.

osm_tech@en.osm.town wrote:

    @utf_7 It is madness. Start here: https://www.openstreetmap.org/node/1 and keep going; once you reach https://www.openstreetmap.org/node/10000000000, start on ways and relations 😛 Or just download the latest weekly export from planet.openstreetmap.org 😏

#39 · felixcremer@fediscience.org

@osm_tech @utf_7 Why is the first node in OSM somewhere in Italy? I would have expected to find it in some random part of London.

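To put the "start at node/1" joke into numbers: fetching every page from node/1 to node/10000000000 means up to 10 billion requests, before even starting on ways and relations. The sketch below does the back-of-the-envelope arithmetic; the request rates are hypothetical assumptions, not measured figures.

```python
# Back-of-the-envelope: time to fetch every node page from node/1 to
# node/10000000000 at one HTTP request per page. The rates are hypothetical.

NODE_PAGES = 10_000_000_000   # last node ID mentioned in the post
SECONDS_PER_DAY = 86_400


def days_to_fetch(pages: int, requests_per_second: float) -> float:
    """Days needed to issue `pages` requests at a constant request rate."""
    return pages / requests_per_second / SECONDS_PER_DAY


for rate in (100, 1_000, 10_000):
    print(f"{rate:>6} requests/s: {days_to_fetch(NODE_PAGES, rate):,.1f} days")
```

At a hypothetical 1,000 requests per second, the node pages alone would take roughly 116 days, which is why a single planet download is the saner option.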

#40 · ondrejzizka@witter.cz, in reply to the original post by osm_tech@en.osm.town

@osm_tech 🤦‍♂️

osm_tech@en.osm.town wrote:

    @michel42 We'd like to share the IP address list, but unfortunately we don't think we can, due to legal concerns.

#41 · ondrejzizka@witter.cz

@osm_tech @michel42 Understood.

Unrelated: could you please provide me with a list of about 150k random large unsigned integers? I'm testing the xz library and need some test data.

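The joke works because an IPv4 address is simply a 32-bit unsigned integer written as four bytes in dotted-quad notation (IPv6 addresses are 128-bit), so a block list is, in effect, a list of large unsigned integers. A minimal sketch of the round trip using Python's standard `ipaddress` module; the helper names are hypothetical and the example addresses come from the RFC 5737 documentation ranges, not from any real block list.

```python
import ipaddress


def ipv4_to_uint(address: str) -> int:
    """Dotted-quad IPv4 string -> its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(address))


def uint_to_ipv4(value: int) -> str:
    """32-bit unsigned integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


# Documentation-range example addresses (RFC 5737).
example = ["192.0.2.1", "198.51.100.7", "203.0.113.255"]
as_ints = [ipv4_to_uint(a) for a in example]
assert [uint_to_ipv4(v) for v in as_ints] == example  # lossless round trip
```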

#42 · simon@en.osm.town, in reply to #39 by felixcremer@fediscience.org

@felixcremer @utf_7 Because you are looking at version 43 of the node, which has been subject to redaction (licence change), vandalism, and simply buggy software over 20+ years: https://www.openstreetmap.org/node/1/history#map=18/1.999999/2.000000

#43 · felixcremer@fediscience.org, in reply to #42 by simon@en.osm.town

@simon @utf_7 Thanks, yeah, that makes sense.

#44 · simon@en.osm.town, in reply to #43 by felixcremer@fediscience.org

@felixcremer @utf_7 I didn't mention this, but should have: prior to OSM API 0.5 (October 2007), objects were not versioned. The original "node 1" was deleted before that date and therefore doesn't actually exist in the current OSM data at all. The current "node 1" is a reuse of the old ID, IIRC.

#45 · harry_wood@en.osm.town

@zymurgic The website interface designed for humans is the main issue, I believe. See also https://en.osm.town/@osm_tech/115974391032358572
So that's... stupid.

I'm not sure who hosts the main Overpass API instance, but I don't think it is the OpenStreetMap Foundation, so (while they probably do have similar challenges) that's not what we're talking about.
