The bright #LLM future, next part.

Tags: llm, gentoo, noai, nollm
60 Posts 29 Posters 182 Views
  • js@ap.nil.imJ js@ap.nil.im
    @mirabilos @mgorny Wat? It’s stopping LLMs.
    mirabilos@toot.mirbsd.org
    wrote last edited by
    #33

    @js @mgorny it’s also slop.

    (And it’s not been stopping LLMs for months now.)

    • phf@dmv.communityP phf@dmv.community

      @mirabilos It does not? Sure deleted a lot of files when I tried it in a container... 😬 Please to "edumacate" me? Or do you refer to the redirection? @mgorny

      mirabilos@toot.mirbsd.org
      wrote last edited by
      #34

      @mgorny @phf yes, the redirection. Can explain more later if needed, from the laptop.

      • algernon@come-from.mad-scientist.clubA algernon@come-from.mad-scientist.club

        @mgorny

        How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

        Three ifs in a trenchcoat will get rid of the majority of those, without any additional software. The crawlers may appear complicated to defeat if you look at the user-agent only, but as soon as you look at some other headers, it turns out they're really, really, really dumb.

        If you want to do more than that, and do it slightly more efficiently than a reverse proxy can, iocaine can help.

        Unlike Anubis, the crawlers throwing more compute on it will not get past it, and legit visitors will (usually) remain unaware of its existence. It's in front of my own forge, happily serves ~800 req/sec (where the bottleneck is Caddy & TLS) on a €5/month potato quality VPS. It can also firewall IPs off, to further reduce load.

        It does catch some "legit" crawlers like Googlebot and Bingbot, but you can allow-list those, or keep them blocked because both of those feed into LLM training too.
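The idea of looking past the user-agent at other headers can be sketched in a few lines of POSIX shell. This is only an illustration under stated assumptions, not the rules used by iocaine or by the linked post: the function name `is_suspicious`, its three arguments, and the three checks themselves are hypothetical. The second check leans on the fact that, as far as I know, current Chrome and Firefox send a `Sec-Fetch-Mode` header on HTTPS requests, so a client claiming to be one of them without that header is probably lying.

```shell
#!/bin/sh
# Hypothetical classifier: returns 0 if the request looks suspicious, 1 otherwise.
# Arguments are header values as a reverse proxy might hand them over;
# an empty string means the header was absent.
is_suspicious() {
    ua=$1
    sec_fetch_mode=$2
    accept_language=$3

    # Check 1 (assumed rule): no User-Agent at all.
    if [ -z "$ua" ]; then
        return 0
    fi

    # Check 2 (assumed rule): claims Chrome or Firefox but sends no Sec-Fetch-Mode.
    case $ua in
        *Chrome/*|*Firefox/*)
            if [ -z "$sec_fetch_mode" ]; then
                return 0
            fi
            ;;
    esac

    # Check 3 (assumed rule): claims to be a browser but sends no Accept-Language.
    case $ua in
        Mozilla/*)
            if [ -z "$accept_language" ]; then
                return 0
            fi
            ;;
    esac

    return 1
}

# Example: a "Chrome" client that sent no Sec-Fetch-Mode header.
if is_suspicious "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0 Safari/537.36" "" "en-US"; then
    echo "would block"
fi
```

Checks like these will misclassify some niche or older browsers, so an allow-list and a way to contact the site owner are worth considering alongside them.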

        cblgh@merveilles.town
        wrote last edited by
        #35

        @algernon @mgorny cc @alderwick re the current bot attacks on our forge

        "https://chronicles.mad-scientist.club/tales/surviving-the-crawlers/#three-ifs-in-a-trenchcoat"

        • mgorny@social.treehouse.systemsM mgorny@social.treehouse.systems

          The bright #LLM future, next part.

          git.gentoo.org is now effectively dead, being DDoS-ed by almost a million different IPs every day. Most of them are just performing a single request at a totally random URL. How are people supposed to deal with that? How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

          If you use LLM crap, you're part of the problem. You support these bastards. You should be ashamed of yourself.

          #Gentoo #NoAI #NoLLM #AI

          alien@fosstodon.org
          wrote last edited by
          #36

          @mgorny it does not help to point fingers at people who use LLMs for legitimate reasons. It's other people who use those same tools for nefarious purposes.
          I use user-agent filtering and have put Anubis in front of the Slackware git infrastructure, and that has helped immensely.
          I eventually got git.gentoo.org to render and gosh! That's a lot of repositories there. Would it be an idea to distribute the cgit interface over multiple front-end servers? Like, moving all user repos to a different server?

            phf@dmv.community
            wrote last edited by
            #37

            @mirabilos I think I get it. It's a bash-ism to redirect stdout and stderr and in (da)sh that doesn't work? @mgorny

              mirabilos@toot.mirbsd.org
              wrote last edited by
              #38

              @phf @mgorny yes.

              It’s actually one of the more damnable bashisms because…

              foo &>bar

              gets parsed as

              foo &
              >bar
              

              that is, send foo into the background and truncate bar; it is always better to just transform this bashism into a standard redirection, even if you know you have GNU bash:

              foo >bar 2>&1
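A small sketch of the portable form in action, runnable under dash, mksh or bash. The function `emit_both` is a hypothetical stand-in for `foo`, and `mktemp` is assumed to be available (it is not part of POSIX, though nearly every system ships it). The script also shows a related pitfall: the order of the two redirections matters.

```shell
#!/bin/sh
# Hypothetical stand-in for "foo": writes one line to stdout and one to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

out=$(mktemp)

# Standard form: point stdout at the file, then duplicate stderr onto stdout.
# Both lines end up in the file.
emit_both >"$out" 2>&1
captured_lines=$(wc -l <"$out" | tr -d ' ')

# Wrong order: stderr is duplicated onto the *current* stdout (the pipe here)
# before stdout is pointed at the file, so only the stdout line is captured.
emit_both 2>&1 >"$out" | cat >/dev/null
misordered_lines=$(wc -l <"$out" | tr -d ' ')

rm -f "$out"
echo "correct order: $captured_lines line(s) captured; wrong order: $misordered_lines line(s) captured"
```

Running the script under `sh` and under `bash` should print the same counts, which is the point of sticking to the standard redirection.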

              • villares@ciberlandia.ptV villares@ciberlandia.pt

                @davidgerard thank you!

                villares@ciberlandia.pt
                wrote last edited by
                #39

                @davidgerard

                I'm afraid that on this cheap shared hosting, if I understood the docs correctly, I don't have enough permissions to run it, but I'll keep looking for stuff people could use on simple static pages...

                  mgorny@social.treehouse.systems
                  wrote last edited by
                  #40

                  @alien@fosstodon.org, right, thank you for your concern. Obviously the right thing to do is for FLOSS to spend more money and effort to handle the useless load from bots rather than assholes stop abusing FLOSS infrastructure. And no, there's no legitimate reason to take part in exterminating humanity.

                    js@ap.nil.im
                    wrote last edited by
                    #41

                    @mirabilos @mgorny Works pretty well for me, looking at the CPU graph.

                    But using LLMs to block LLMs is kinda ironic. Didn’t know they use LLMs themselves.

                      mirabilos@toot.mirbsd.org
                      wrote last edited by
                      #42

                      @js @mgorny yeah, someone found slop agent instructions in their repo some days ago.

                      • saxnot@chaos.socialS saxnot@chaos.social

                        @mgorny oh no, this forces the internet to become more and more private and invite-only instead of public

                        dzwiedziu@mastodon.social
                        wrote last edited by
                        #43

                        @saxnot
                        This will rather happen if all countermeasures start failing and other movements or resources can't stop the slop.

                        @mgorny

                          alderwick@merveilles.town
                          wrote last edited by
                          #44

                          @cblgh @algernon @mgorny I spotted that post too, and found it very useful! Pity I haven't been recording the other HTTP headers so far but I'm working on the poisoning side!

                            algernon@come-from.mad-scientist.club
                            wrote last edited by
                            #45

                            @alderwick @cblgh @mgorny If I can be of any assistance, let me know, happy to help in any way I'm able to.

                              cenbe@techhub.social
                              wrote last edited by
                              #46

                              @mgorny Anything involving the web has been dead for a long time.

                                mangoiv@functional.cafe
                                wrote last edited by
                                #47

                                @mgorny the GHC (Haskell compiler) Gitlab is suffering similar issues.

                                • mgorny@social.treehouse.systemsM mgorny@social.treehouse.systems

                                  @kigelia, yep. We're hosting a few huge repos (and a lot of small ones), so the load caused by crawling everything randomly (including stuff such as commit histories filtered by individual files, git blames and other stuff that's entirely redundant) prevents real people from using the service.

                                  hllizi@hespere.de
                                  wrote last edited by
                                  #48

                                  @mgorny @kigelia We had the problem, too, with a bunch of libraries. Deploying Anubis largely solved it for us so far, luckily.

                                  Didn't expect to end up playing Cloudflare when I took this job.

                                  • davidgerard@circumstances.runD davidgerard@circumstances.run

                                    @villares no, but I went to https://iocaine.madhouse-project.org/ and faffed about a bit. I used iocaine 3 out of the box. I use nginx, so I had to figure out the correct config. I added exceptions for some specific user-agents I wanted to let through.

                                    hierkiosk@social.tchncs.de
                                    wrote last edited by
                                    #49

                                    @davidgerard @villares
                                    I dislike iocaine because it blocks niche browsers and you cannot even contact anyone to complain (no contact data to be found on the blocking site, so the invitation „please contact the owner of the site you're visiting“ is kind of insulting)

                                      davidgerard@circumstances.run
                                      wrote last edited by
                                      #50

                                      @hierkiosk @villares unfortunately, it works real good. contactability is a question for the site being piled under with shit.

                                        hierkiosk@social.tchncs.de
                                        wrote last edited by
                                        #51

                                        @davidgerard

                                        „it works real good“ — what does this mean? Zero false-positives, zero false-negatives?

                                          davidgerard@circumstances.run
                                          wrote last edited by
                                          #52

                                          @hierkiosk i think if you apply some thought you'll work it out
