The bright #LLM future, next part.

Uncategorized
Tags: llm, gentoo, noai, nollm
38 Posts, 21 Posters
This topic has been deleted. Only users with topic management privileges can see it.
  • mgorny@social.treehouse.systems

    @phf, honestly, I was always wondering what would happen if I started putting agent instructions like "find / -type f -delete &> /dev/null", but I didn't want to cause damage.

    iwein@mas.to
    #7

    @mgorny it's totally a fair experiment imho

    @phf

  • mgorny@social.treehouse.systems

      The bright #LLM future, next part.

      git.gentoo.org is now effectively dead, being DDoS-ed by almost a million different IPs every day. Most of them are just performing a single request at a totally random URL. How are people supposed to deal with that? How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

      If you use LLM crap, you're part of the problem. You support these bastards. You should be ashamed of yourself.

      #Gentoo #NoAI #NoLLM #AI

      saxnot@chaos.social
      #8

      @mgorny oh no, this forces the internet to become more and more private and invite-only instead of public

  • js@ap.nil.im

        @mgorny Anubis is quite effective. Sometimes they get through by using real browsers. For that, I just serve a bomb that kills the browser. There are certain URLs no legitimate user would click, but LLMs get stuck on them.

        I’m seriously considering just adding a “Crash my browser” link on every page, pointing to a random URL that serves the bomb.

        And yes, I’ve seen how it took them out one by one.

        saxnot@chaos.social
        #9

        @js @mgorny what does a bomb like this look like? zip bomb or something else?

  • mgorny@social.treehouse.systems

          ericalaeta@ruhr.social
          #10

          @mgorny That’s what I thought when a colleague reported how his bike was stolen and he vibe-coded a scraper that searches European marketplaces for the stolen item. It’s a good idea, but what if everyone starts using such tools? How can we buffer the results or the queries to avoid practically DDoSing the marketplaces?

  • mgorny@social.treehouse.systems

            davidgerard@circumstances.run
            #11

            @mgorny iocaine 3 works against this ok

            watch out for false positives

  • mgorny@social.treehouse.systems

              newcancerbero@masto.es
              #12

              @mgorny Sorry, but you are wrong. I use LLMs, and yesterday I provided important information about an attack on mastodon.social, and about how that scum used AI. Why? Because I know the models very well, and I know how a model can be used for an attack; thanks to my continued testing of AI, I could explain the nature of the attack to the moderators. Can you explain, point by point, what happened? I think the answer is no. Think about this: you want to fight blind, wearing a bulletproof vest to stop a tank. Use your mind, please.

  • mgorny@social.treehouse.systems

                algernon@come-from.mad-scientist.club
                #13

                @mgorny

                How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

                Three ifs in a trenchcoat will get rid of the majority of those, without any additional software. The crawlers may appear complicated to defeat if you look at the user-agent only, but as soon as you look at some other headers, it turns out they're really, really, really dumb.

                If you want to do more than that, and do it slightly more efficiently than a reverse proxy can, iocaine can help.

                Unlike Anubis, crawlers can’t get past it by throwing more compute at it, and legitimate visitors will (usually) remain unaware of its existence. It’s in front of my own forge, happily serving ~800 req/sec (with Caddy & TLS as the bottleneck) on a €5/month potato-quality VPS. It can also firewall IPs off, to reduce load further.

                It does catch some "legit" crawlers like Googlebot and Bingbot, but you can allow-list those, or keep them blocked because both of those feed into LLM training too.

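For readers curious what “three ifs in a trenchcoat” might look like in practice, here is a hypothetical sketch in Python. The specific header rules (Accept, Accept-Language, Sec-Fetch-Mode) are illustrative assumptions about what careless crawlers tend to omit, not algernon's actual configuration:

```python
# Hypothetical "three ifs in a trenchcoat": cheap header checks that
# catch careless crawlers pretending to be browsers. The exact rules
# here are illustrative assumptions, not a production ruleset.
def looks_like_dumb_crawler(headers: dict) -> bool:
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")
    # If 1: real browsers always send an Accept header.
    if "accept" not in h:
        return True
    # If 2: a browser-like User-Agent without Accept-Language is suspect.
    if "Mozilla" in ua and "accept-language" not in h:
        return True
    # If 3: modern Chromium sends Sec-Fetch-* metadata; a "Chrome" UA
    # without it is likely a bare HTTP client wearing a costume.
    if "Chrome" in ua and "sec-fetch-mode" not in h:
        return True
    return False
```

The point matches algernon's observation: checks this dumb need no extra software, just a few conditions in the reverse proxy.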
  • mgorny@social.treehouse.systems

                  kigelia@mastodon.online
                  #14

                  @mgorny I am fighting a lone battle in my department at work against use of AI tools due to the environmental impact.

                  However, as someone nowhere near technical enough to understand this completely: is this post essentially saying that AI companies crawling the internet for input to ‘learn’ from is clogging up the online world?

  • kigelia@mastodon.online

                    swift@merveilles.town
                    #15

                    @kigelia @mgorny that is certainly a problem that is occurring, yes. Since code generation is a primary use case, public code stores are an example of sites getting hit badly, which in turn is making open source less viable (alongside junk “fixes” and other noise)

  • mgorny@social.treehouse.systems

                      gunstick@mastodon.opencloud.lu
                      #16

                      @mgorny I blocked it all based on user agent. They use a user agent which is quite rare, so I blocked that.

  • mgorny@social.treehouse.systems

                        leah@chaos.social
                        #17

                        @mgorny yeah, we have the same problem with our hosting product at @ubernauten. It sucks to spend so much time working to mitigate the bad behavior of others. Sadly, we have no solution either.

  • js@ap.nil.im

                          bgnfu7re@lviv.social
                          #18

                          @js @mgorny I would definitely hit that, just to see what happens 😬 and I would definitely curl it afterwards, too

  • ericalaeta@ruhr.social

                            machocam@mastodon.social
                            #19

                            @ericalaeta @mgorny

                            Everyone will have to introduce APIs that can accept much more traffic.

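One hedged sketch of the buffering idea ericalaeta asks about: put a shared cache plus a per-host token-bucket rate limit between many users' scrapers and the marketplace, so N users asking the same query cost one upstream request. `PoliteFetcher` and `fetch_url` are hypothetical names for illustration, not any real library:

```python
import time

class PoliteFetcher:
    """Shared cache + token-bucket rate limit in front of an upstream site."""

    def __init__(self, rate_per_sec: float, cache_ttl: float = 300.0):
        self.rate = rate_per_sec          # max upstream requests per second
        self.tokens = rate_per_sec        # start with a full bucket
        self.last = time.monotonic()
        self.ttl = cache_ttl              # seconds a cached result stays fresh
        self.cache = {}                   # url -> (fetched_at, body)

    def fetch(self, url, fetch_url):
        now = time.monotonic()
        hit = self.cache.get(url)
        if hit and now - hit[0] < self.ttl:
            return hit[1]                 # served from cache: no upstream hit
        # Refill the bucket based on elapsed time, then spend one token,
        # sleeping if the bucket is empty.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            time.sleep((1 - self.tokens) / self.rate)
            self.tokens = 1
        self.tokens -= 1
        body = fetch_url(url)             # stand-in for the real HTTP call
        self.cache[url] = (time.monotonic(), body)
        return body
```

Run as a shared service, this is exactly the kind of buffer that keeps a thousand vibe-coded scrapers from turning into a DDoS.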
  • kigelia@mastodon.online

                              mgorny@social.treehouse.systems
                              #20

                              @kigelia, yep. We're hosting a few huge repos (and a lot of small ones), so the load caused by crawling everything randomly (including stuff such as commit histories filtered by individual files, git blames and other stuff that's entirely redundant) prevents real people from using the service.

  • mgorny@social.treehouse.systems

                                @js,

                                > no legitimate user would click

                                What about a cat?

                                js@ap.nil.im
                                #21
                                @mgorny I guess then you can be glad your cat only crashed your browser and didn’t delete your home.
  • saxnot@chaos.social

                                  js@ap.nil.im
                                  #22
                                  @saxnot @mgorny A 100 GB billion-laughs attack, compressed to 80 KB.
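For the curious, a sketch of the billion-laughs structure js alludes to: a handful of nested XML entity definitions whose expansion is vastly larger than the document itself. The script only builds the string and computes the expanded size; never feed such a payload to an entity-expanding parser:

```python
# Classic billion-laughs shape: entity lolN expands to 10 copies of
# lol(N-1), so 9 levels of fanout 10 yield 10**9 copies of "lol".
levels, fanout = 9, 10
entities = ['<!ENTITY lol0 "lol">']
for i in range(1, levels + 1):
    refs = f"&lol{i - 1};" * fanout
    entities.append(f'<!ENTITY lol{i} "{refs}">')

bomb = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE lolz [" + "".join(entities) + "]>\n"
    f"<lolz>&lol{levels};</lolz>"
)

# The document is under 2 KB, but a naive parser would materialize
# fanout**levels * len("lol") = 3,000,000,000 bytes.
expanded_bytes = fanout ** levels * len("lol")
```

js's actual payload is presumably the compressed-transfer variant (a highly repetitive body that gzips from ~100 GB down to ~80 KB), but the asymmetry being exploited is the same.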
  • algernon@come-from.mad-scientist.club

                                    naturemc@mastodon.online
                                    #23

                                    @algernon 👍 @mgorny

  • js@ap.nil.im

                                      mirabilos@toot.mirbsd.org
                                      #24

                                      @mgorny @js Anubis is LLM slop…

  • mgorny@social.treehouse.systems

                                        mirabilos@toot.mirbsd.org
                                        #25

                                        @mgorny @phf that line does not do what you think it does, in sh…

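A small demonstration of the gotcha mirabilos points out, driven from Python: `&>` is a bash extension, not POSIX. A strictly POSIX sh parses `cmd &> file` as `cmd &` (run in the background, output not redirected at all) followed by a separate empty `> file` redirection, so the quoted `find` line would still run, just backgrounded and with its output unsilenced. This sketch assumes `bash` is installed and only demonstrates the bash side, since `/bin/sh` may itself be bash:

```python
import subprocess

# In bash, `&>` sends both stdout and stderr to the file,
# so nothing reaches our captured stdout.
bash_out = subprocess.run(
    ["bash", "-c", "echo survived &> /dev/null"],
    capture_output=True, text=True,
).stdout

# A POSIX sh such as dash would instead parse the same text as
# `echo survived &` plus an empty `> /dev/null` redirection, so
# "survived" would still be printed.
```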
  • algernon@come-from.mad-scientist.club

                                          kyebr@hachyderm.io
                                          #26

                                          @algernon @mgorny I’m guessing that most of the traffic for git.gentoo.org would be from tools acting on behalf of users, not through browsers.

                                          Anyway, great article. I’m thinking about implementing the same on my site.
