To keep #OpenStreetMap.org up and running while we're being deluged by scrapers, we've blocked 320,000+ primarily residential IPv4 addresses in the last 24 hours (+ 100,000 IPv6) involved in scraping.
If you need OSM data, please don't scrape the website - use the official downloads at https://planet.openstreetmap.org

#AI #Bots #Abuse
Might be a good idea to become an OSMF member now or just donate some money.
Membership starts at £15/year.
https://supporting.openstreetmap.org/
-
@JonSaenzAgirre It is a good question, and we don't know the answer either. Our planet data is so much easier to process and use.
@osm_tech @JonSaenzAgirre that's dumb AI, probably. No "I" at all...
-
@osm_tech sounds familiar, last year I braved turning Cloudflare's "under attack" mode off for https://dnshistory.org/ and saw an extra 5 million requests/day (500k unique IPs) overloading things. It's still blocking >700k requests/day a month later...
-
@osm_tech and we can tell the scrapers are AI built because a cursory glance at the documentation on the "coders" part would've prevented this problem.
-
@osm_tech Thank you. I'm a beginner who has just been doing toy projects and has barely any notion of what web scraping is but I'm very happy to learn that your data can be downloaded

-
@osm_tech does coming from residential IPs mean that someone has baked a scraper into some popular tool that people don't realize is doing that?
-
@osm_tech Limit each IP to 14,400 bit/s modem speed for a month or so.
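A minimal sketch of what throttling each IP to 14,400 bit/s could look like, assuming a token bucket per client address. The class name `PerIpThrottle`, its parameters, and the example addresses are hypothetical and only illustrate the suggestion above; nothing here describes what OSM actually runs.

```python
import time

MODEM_BITS_PER_SECOND = 14_400
BYTES_PER_SECOND = MODEM_BITS_PER_SECOND / 8  # 1800 bytes per second


class PerIpThrottle:
    """Hypothetical token bucket per client IP, refilled at `rate` bytes/s."""

    def __init__(self, rate=BYTES_PER_SECOND, burst=BYTES_PER_SECOND,
                 clock=time.monotonic):
        self.rate = rate        # refill speed in bytes per second
        self.burst = burst      # largest number of bytes allowed at once
        self.clock = clock      # injectable so the logic can be tested
        self._buckets = {}      # ip -> (tokens, time of last update)

    def delay_for(self, ip, nbytes):
        """Return how many seconds to wait before sending nbytes to ip."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        # Spend the tokens; a negative balance is debt that must be waited out.
        tokens -= nbytes
        self._buckets[ip] = (tokens, now)
        return 0.0 if tokens >= 0 else -tokens / self.rate


throttle = PerIpThrottle()
print(throttle.delay_for("203.0.113.7", 1800))  # first 1800 bytes go out at once
print(throttle.delay_for("203.0.113.7", 1800))  # the next 1800 bytes must wait
```

A server would sleep for the returned delay (or reject the request) before writing the response. State is kept per address, so the table of buckets grows with the number of distinct clients.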

-
@osm_tech question. Why do people scrape a server which makes the data freely available? And, probably, better structured in the final product. I don't see the point.
@JonSaenzAgirre @osm_tech
The scrapers are DUMB.
They are not curated, have only basic maintenance, are built to gobble up ANYTHING textual they encounter, without respect, mercy or reason. Just collect meaningless data.
That's the nature of the coveted LLMs: just statistics, no understanding, structure or meaning.
And greedy crooks in haste to make quick money just grab everything they can.
The AI bubble needs to pop really soon.
-
@ClariNerd @osm_tech Because their IP ranges are increasingly being blocked by servers following their harmful scraping habits, AI companies are now releasing "browsers" so they can scrape from residential IPs instead and circumvent blocks. Oh, sorry, I meant "so they can empower users with AI insight in this new era of information".
-
@vampirdaddy @osm_tech this seems a reasonable explanation. Quantity of bytes irrespective of sense. Thank you
-
@utf_7 It is madness, start here: https://www.openstreetmap.org/node/1 and keep going until you reach https://www.openstreetmap.org/node/10000000000, then start on ways, and relations
or just download the latest weekly export from planet.openstreetmap.org 
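To put rough numbers on that, here is a back-of-the-envelope sketch of how long a page-by-page crawl of the node id range would take. The ceiling of 10,000,000,000 comes from the URL above; the rate of 100 requests per second is purely an assumption for illustration, and ways and relations are not counted.

```python
NODE_ID_CEILING = 10_000_000_000  # highest node id in the URL quoted above
REQUESTS_PER_SECOND = 100         # assumed crawl rate, not a measured figure
SECONDS_PER_DAY = 86_400


def crawl_days(pages, requests_per_second):
    """Days needed to fetch `pages` pages at a constant request rate."""
    return pages / requests_per_second / SECONDS_PER_DAY


days = crawl_days(NODE_ID_CEILING, REQUESTS_PER_SECOND)
print(f"{days:,.0f} days (~{days / 365.25:.1f} years)")
# prints: 1,157 days (~3.2 years)
```

Under that assumption, walking the node pages alone takes about three years of nonstop requests, while the weekly planet export is a single download.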
-
@michel42 We'd like to share the IP address list, but unfortunately don't think we can due to legal concerns.
-
@felixcremer @utf_7 because you are looking at version 43 of the node which has been subject to redaction (licence change), vandalism, and simply buggy software over 20+ years https://www.openstreetmap.org/node/1/history#map=18/1.999999/2.000000
-
@felixcremer @utf_7 I didn't mention this, but should have: prior to OSM API 0.5 (October 2007) objects were not versioned. The original "node 1" was deleted prior to that date and therefore doesn't actually exist in the current OSM data at all. The current "node 1" is a reuse of the old id IIRC.
-
@zymurgic The website interface designed for humans is the main issue I believe. See also https://en.osm.town/@osm_tech/115974391032358572
So that's... stupid
I'm not sure who hosts the main Overpass API instance, but I don't think it is the OpenStreetMap Foundation, so (while they probably do have similar challenges) it's not that we're talking about.
-
@simon @felixcremer TIL something about OSM nodes. How far apart are two neighboring nodes? Or does this vary with the resolution of the area? Like, on the high seas they'd be more miles apart than in Detroit.