The bright #LLM future, next part.

Tags: llm, gentoo, noai, nollm
60 Posts 29 Posters 182 Views
  • davidgerard@circumstances.run wrote:

    @hierkiosk @villares unfortunately, it works real good. contactability is a question for the site being piled under with shit.

    hierkiosk@social.tchncs.de wrote (#51):

    @davidgerard

    „it works real good“ — what does this mean? Zero false-positives, zero false-negatives?

      davidgerard@circumstances.run wrote (#52):

      @hierkiosk i think if you apply some thought you'll work it out

      • davidgerard@circumstances.run wrote:

        @mgorny iocaine 3 works against this ok

        watch out for false positives

        keithzg@fediverse.keithzg.ca wrote (#53):
        @davidgerard @mgorny I keep teetering on the edge of setting up iocaine for my own code/issues/etc self-hosting using Phorge, which keeps getting hammered into OOM-death by LLM scrapers, and I really *should*, but it just feels so darned depressing
        • algernon@come-from.mad-scientist.club wrote:

          @mgorny

          How can we distinguish a legitimate user who hit some URL from a scraper that distributes its operations over thousands of IP addresses?

          Three ifs in a trenchcoat will get rid of the majority of those, without any additional software. The crawlers may appear complicated to defeat if you look at the user-agent only, but as soon as you look at some other headers, it turns out they're really, really, really dumb.

          If you want to do more than that, and do it slightly more efficiently than a reverse proxy can, iocaine can help.

          Unlike Anubis, the crawlers throwing more compute on it will not get past it, and legit visitors will (usually) remain unaware of its existence. It's in front of my own forge, happily serves ~800 req/sec (where the bottleneck is Caddy & TLS) on a €5/month potato quality VPS. It can also firewall IPs off, to further reduce load.

          It does catch some "legit" crawlers like Googlebot and Bingbot, but you can allow-list those, or keep them blocked because both of those feed into LLM training too.
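To make the "three ifs" idea concrete, here is a minimal Python sketch of what header-based filtering could look like. It is not the rule set from the linked post or from iocaine: the function name `looks_like_scraper`, the `allow` parameter, and the three specific checks are all assumptions picked for illustration. The only outside fact it leans on is that current Chrome and Firefox releases send `Sec-Fetch-Mode` on HTTPS requests, as far as I know.

```python
from typing import Iterable, Mapping


def looks_like_scraper(
    headers: Mapping[str, str],
    allow: Iterable[str] = (),
) -> bool:
    """Hypothetical three-check heuristic; the checks are illustrative guesses."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    ua = h.get("user-agent", "")

    # Optional allow-list (e.g. "Googlebot"). A User-Agent is trivially
    # spoofable, so this is a convenience, not an authentication step.
    if any(token in ua for token in allow):
        return False

    # If 1: no User-Agent at all.
    if not ua:
        return True

    claims_browser = "Chrome/" in ua or "Firefox/" in ua

    # If 2: claims to be Chrome or Firefox but sends no Sec-Fetch-Mode.
    if claims_browser and "sec-fetch-mode" not in h:
        return True

    # If 3: claims to be Chrome or Firefox but sends no Accept-Language.
    if claims_browser and "accept-language" not in h:
        return True

    return False
```

Checks like these also hit anything unusual that merely resembles a browser, such as old engines, privacy tools that strip headers, or crawlers that put `Chrome/` in their User-Agent. That is the false-positive risk mentioned above, and the reason for an allow-list escape hatch.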

          keithzg@fediverse.keithzg.ca wrote (#54):

          @algernon@come-from.mad-scientist.club At present for my own personal purposes I haven't yet set up iocaine, despite getting somewhat exasperated with outages caused by LLM scraping, due to

          1. I'm kinda lacklustre with proxy config stuff and you have no guide up for Apache2 (I'm sure I could muddle through some setup eventually though if I put actual effort into it)
          2. Though it's better about this than Anubis, I still seem to get my main current browser of choice (Falkon) caught by iocaine, so much so that I currently cannot read the documentation: pages like https://iocaine.madhouse-project.org/documentation/ sit loading forever even when I begrudgingly switch back to Chrome (so I assume either I've now been blocked, or your server has coincidentally been hammered). From https://chronicles.mad-scientist.club/tales/surviving-the-crawlers/#three-ifs-in-a-trenchcoat I got, for example, x-request-id: 0ia4BzigBrtETmjH0S2nd so, as per the text at the bottom there after the AI-aimed gobbledygook, I am telling you 😉
            keithzg@fediverse.keithzg.ca wrote (#55):

            @algernon@come-from.mad-scientist.club The outright pageload failures look to have been Very User Error on my part, but I can further report for example x-request-id: NLCTLT-0WB7P2fWsXx1bH from my trying to load https://iocaine.madhouse-project.org/documentation/3/reverse-proxies/ with Falkon here still on Kubuntu 24.04 (and thus Falkon 24.01.75 using QtWebEngine 5.15.16, which is admittedly all pretty outdated in our fast-paced world).

              algernon@come-from.mad-scientist.club wrote (#56):

              @keithzg Found it in my logs, will apply a workaround shortly[1] that'll let Falkon pass.

              [1] Probably in an hour or two - a bit out of spoons at the moment, sorry

                keithzg@fediverse.keithzg.ca wrote (#57):
                @algernon Appreciated, and boy howdy do I know that spoonless feeling
                • alien@fosstodon.org wrote:

                  @mgorny it does not help to point at people using LLMs for legitimate reasons. It's other people using those same tools for nefarious purposes.
                  I use user-agent filtering and put Anubis in front of the Slackware git infrastructure, and that has helped immensely.
                  I eventually got git.gentoo.org to render and gosh! That's a lot of repositories there. Would it be an idea to distribute the cgit interface over multiple front-end servers? Like, moving all user repos to a different server?

                  epic_null@infosec.exchange wrote (#58):

                  @alien @mgorny If you are using one for legitimate reasons... maybe turn to them and ask what steps they are taking to prevent their product from being used to DDOS FLOSS projects.

                  • mirabilos@toot.mirbsd.org wrote:

                    @js @mgorny yeah, someone found slop agent instructions in their repo some days ago.

                    epic_null@infosec.exchange wrote (#59):

                    @mirabilos @js @mgorny Real instructions or trap instructions?

                      mirabilos@toot.mirbsd.org wrote (#60):

                      @Epic_Null @js @mgorny you’d have to ask them or look for yourself
