The bright #LLM future, next part.

Tags: llm, gentoo, noai, nollm
38 Posts 21 Posters 0 Views
  • mgorny@social.treehouse.systems

    @js,

    > no legitimate user would click

    What about a cat?

    js@ap.nil.im
    wrote last edited by
    #21
    @mgorny I guess then you can be glad your cat only crashed your browser and didn’t delete your home.
    • saxnot@chaos.social

      @js @mgorny what does a bomb like this look like? zip bomb or something else?

      js@ap.nil.im
      wrote last edited by
      #22
      @saxnot @mgorny 100 GB billion laughs attack compressed to 80 KB.
      • algernon@come-from.mad-scientist.club

        @mgorny

        How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

        Three ifs in a trenchcoat will get rid of the majority of those, without any additional software. The crawlers may appear complicated to defeat if you look at the user-agent only, but as soon as you look at some other headers, it turns out they're really, really, really dumb.

        If you want to do more than that, and do it slightly more efficiently than a reverse proxy can, iocaine can help.

        Unlike Anubis, the crawlers throwing more compute on it will not get past it, and legit visitors will (usually) remain unaware of its existence. It's in front of my own forge, happily serves ~800 req/sec (where the bottleneck is Caddy & TLS) on a €5/month potato quality VPS. It can also firewall IPs off, to further reduce load.

        It does catch some "legit" crawlers like Googlebot and Bingbot, but you can allow-list those, or keep them blocked because both of those feed into LLM training too.
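The post above does not say which headers the "three ifs" inspect; the linked article has the real rules. As a rough sketch only, assume one such check is "the User-Agent claims a mainstream browser, but the `Sec-Fetch-Mode` header that current browsers send is missing". The function name `looks_like_fake_browser` and the choice of header are illustrative assumptions, not algernon's actual rule set.

```shell
#!/bin/sh
# Hypothetical check, NOT the rule set from the linked article.
# Usage: looks_like_fake_browser USER_AGENT SEC_FETCH_MODE
# Returns 0 (suspicious) when the User-Agent claims to be a mainstream
# browser but the Sec-Fetch-Mode header value is empty or missing.
looks_like_fake_browser() {
    ua=$1
    sec_fetch_mode=$2

    case $ua in
        *Chrome/*|*Firefox/*) ;;   # claims to be a browser: keep checking
        *) return 1 ;;             # honest tools (git, curl, ...) pass through
    esac

    # Assumption: a current browser would have sent this header.
    [ -z "$sec_fetch_mode" ]
}

if looks_like_fake_browser "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36" ""; then
    echo "challenge or block"
else
    echo "pass"
fi
```

Tools that identify themselves honestly fall through the `case` and are never challenged. Older browsers that predate `Sec-Fetch-Mode` would be misclassified by a rule like this, so an allow-list or a softer response than a hard block may be needed.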

        naturemc@mastodon.online
        wrote last edited by
        #23

        @algernon 👍 @mgorny

        • js@ap.nil.im

          @mgorny Anubis is quite effective. Sometimes they get through by using real browsers. For that, I just serve a bomb that kills the browser. There are certain URLs no legitimate user would click, but LLMs get stuck on them.

          I’m seriously considering just adding a “Crash my browser” link to every page, pointing to a random URL that serves the bomb.

          And yes, I’ve seen how it took them out one by one.

          mirabilos@toot.mirbsd.org
          wrote last edited by
          #24

          @mgorny @js Anubis is LLM slop…

          • mgorny@social.treehouse.systems

            @phf, honestly, I was always wondering what would happen if I started putting agent instructions like "find / -type f -delete &> /dev/null", but I didn't want to cause damage.

            mirabilos@toot.mirbsd.org
            wrote last edited by
            #25

            @mgorny @phf that line does not do what you think it does, in sh…

              kyebr@hachyderm.io
              wrote last edited by
              #26

              @algernon @mgorny I’m guessing that most of the traffic for git.gentoo.org comes from tools acting on behalf of users, not from browsers.

              Anyway, great article. I’m thinking about implementing the same on my site.

                algernon@come-from.mad-scientist.club
                wrote last edited by
                #27

                @Kyebr Yes, and that's fine: tools that don't try to pretend to be browsers get through the Three Ifs (and iocaine) just fine.

                @mgorny

                • davidgerard@circumstances.run

                  @mgorny iocaine 3 works against this ok

                  watch out for false positives

                  villares@ciberlandia.pt
                  wrote last edited by
                  #28

                  @davidgerard would you have pointers to "how to guides" for less savvy people? I have a shared hosting account on a web hosting service, I feel like I need to protect myself from these bots and I'm totally lost.

                    davidgerard@circumstances.run
                    wrote last edited by
                    #29

                    @villares no, but I went to https://iocaine.madhouse-project.org/ and faffed about a bit. I used iocaine 3 out of the box. i use nginx so i had to figure out the correct config. i added exceptions for some specific user-agents I wanted to let through.

                      villares@ciberlandia.pt
                      wrote last edited by
                      #30

                      @davidgerard thank you!

                        js@ap.nil.im
                        wrote last edited by
                        #31
                        @mirabilos @mgorny Wat? It’s stopping LLMs.
                          phf@dmv.community
                          wrote last edited by
                          #32

                          @mirabilos It does not? Sure deleted a lot of files when I tried it in a container... 😬 Please to "edumacate" me? Or do you refer to the redirection? @mgorny

                            mirabilos@toot.mirbsd.org
                            wrote last edited by
                            #33

                            @js @mgorny it’s also slop.

                            (And it’s not been stopping LLMs for months now.)

                              mirabilos@toot.mirbsd.org
                              wrote last edited by
                              #34

                              @mgorny @phf yes, the redirection. Can explain more later if needed, from the laptop.

                                cblgh@merveilles.town
                                wrote last edited by
                                #35

                                @algernon @mgorny cc @alderwick re the current bot attacks on our forge

                                "https://chronicles.mad-scientist.club/tales/surviving-the-crawlers/#three-ifs-in-a-trenchcoat"

                                • mgorny@social.treehouse.systems

                                  The bright #LLM future, next part.

                                  git.gentoo.org is now effectively dead, being DDoS-ed by almost a million different IPs every day. Most of them are just performing a single request at a totally random URL. How are people supposed to deal with that? How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

                                  If you use LLM crap, you're part of the problem. You support these bastards. You should be ashamed of yourself.

                                  #Gentoo #NoAI #NoLLM #AI

                                  alien@fosstodon.org
                                  wrote last edited by
                                  #36

                                  @mgorny it does not help to point at people using LLMs for legitimate reasons. It's other people using those same tools for nefarious purposes.
                                  I use user-agent filtering and put Anubis in front of the Slackware git infrastructure, and that has helped immensely.
                                  I eventually got git.gentoo.org to render and gosh! That's a lot of repositories there. Would it be an idea to distribute the cgit interface over multiple front-end servers? Like, moving all user repos to a different server?

                                    phf@dmv.community
                                    wrote last edited by
                                    #37

                                    @mirabilos I think I get it. It's a bash-ism to redirect stdout and stderr and in (da)sh that doesn't work? @mgorny

                                      mirabilos@toot.mirbsd.org
                                      wrote last edited by
                                      #38

                                      @phf @mgorny yes.

                                      It’s actually one of the more damnable bashisms because…

                                      foo &>bar

                                      gets parsed as

                                      foo &
                                      >bar
                                      

                                      that is, send foo into the background and truncate bar; it is always better to just transform this bashism into a standard redirection, even if you know you have GNU bash:

                                      foo >bar 2>&1
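A minimal sketch of the portable form, which behaves the same in bash and in plain POSIX shells such as dash. The helper `emit` and the temporary file are illustrative assumptions, not part of the thread.

```shell
#!/bin/sh
# Hypothetical helper: writes one line to stdout and one to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)

# Portable form: point stdout at the file first, then duplicate stderr
# onto wherever stdout now points. The order of the two redirections matters.
emit >"$log" 2>&1

# Read the file back so both streams can be inspected, then clean up.
captured=$(cat "$log")
rm -f "$log"

printf '%s\n' "$captured"
```

Both lines end up in the file. Reversing the order (`2>&1 >"$log"`) would duplicate stderr onto the original stdout before the file redirection takes effect, so only stdout would be captured. To discard both streams, the same pattern works with `/dev/null` as the target.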
