Scaling: ActivityPub over NNTP?

Technical Discussion · 7 Posts · 4 Posters
Tags: fediverse, activitypub, usenet
rimu@piefed.social
    #1

    Here's an interesting thought experiment.

Way back in the 1980s and 90s, Usenet was a sorta-federated discussion forum (using the NNTP protocol) that was very popular. It still exists and distributes around 400 million messages each day (mostly spam and trash as far as I can tell). Hard numbers are difficult to come by, but it seems like Usenet is capable of significantly higher throughput than ActivityPub. Why is that?

    The big thing holding ActivityPub back is the fan-out. You know the story - someone with 50,000 followers causes their instance to send up to 50,000 HTTP POSTs every time they click the little spinny star or reply to something.

It's basically a hub-and-spoke network topology - except everyone takes turns being the hub. Ideally, anyway; in practice, not so much. And in this topology, the hubs are where the strain and the bottlenecks are.

Back in the 1980s they had computers literally 1000 times slower than ours, and network links to match. So how did they do this? With a peer-to-peer network topology! When a new post is made, the originating server doesn't send it to everyone - it just sends it to a handful of other servers. Those servers in turn forward the post on to a handful of other peers, and so on, until the whole network receives the post. No individual server is a single point of failure and none has to bear the full brunt of orchestrating it all.

    Let's do a picture. A creates a post and sends it to B and D.

    A ─ B ─ C  
     \      /  
       ─ D ─  
    

    B sends it on to C.

Meanwhile D sends it on to C as well, but C already has it so does nothing more. IRL this would be a much larger mesh. Who peers with whom can be a mixture of manual selection and random spiciness.
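The flood-with-deduplication idea above can be sketched in a few lines (the server names and peer table are illustrative, not part of any real protocol):

```python
from collections import deque

def flood(origin, peer_table):
    """Return the set of servers that receive a post injected at `origin`,
    given each server's handful of forwarding peers. A server forwards
    only the first copy it sees, so the duplicate D -> C delivery above
    is simply dropped."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for peer in peer_table.get(server, []):
            if peer not in seen:  # C ignores the second copy
                seen.add(peer)
                queue.append(peer)
    return seen

# The tiny mesh from the picture: A sends to B and D; B and D both forward to C.
peers = {"A": ["B", "D"], "B": ["C"], "D": ["C"]}
print(sorted(flood("A", peers)))  # -> ['A', 'B', 'C', 'D']
```

No server ever processed the same post twice, and no single server had to contact everyone.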

    Posts can arrive out of order so each server would need to wait until the dependencies between posts are resolved before making them available to clients. That's a bit tricky.
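One way to handle the out-of-order problem is to park an activity until its parent has arrived. A minimal sketch, assuming an `inReplyTo` dependency (the field names and helper are illustrative):

```python
# Buffer activities whose parent (e.g. inReplyTo) hasn't arrived yet,
# releasing them once their dependencies resolve.

def ingest(activity, store, pending):
    """store: ids already visible to clients; pending: parked activities
    keyed by the id they are waiting for. Returns what became visible."""
    parent = activity.get("inReplyTo")
    if parent and parent not in store:
        pending.setdefault(parent, []).append(activity)  # park it for now
        return []
    released = [activity]
    store.add(activity["id"])
    # Making one activity visible may unblock others waiting on it.
    for waiting in pending.pop(activity["id"], []):
        released += ingest(waiting, store, pending)
    return released

store, pending = set(), {}
ingest({"id": "reply-1", "inReplyTo": "post-1"}, store, pending)  # parked
made_visible = ingest({"id": "post-1"}, store, pending)  # releases both
```

Here the reply arrives first and is held back; when the original post turns up, both become visible in the right order.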

    In the ActivityPub-over-NNTP idea, each NNTP post would be a thin wrapper around a data structure containing the HTTP headers (with signature and digest) and JSON that a normal HTTP POSTed Activity would have. Servers would use NNTP to distribute the activities and upon receiving one they'd POST it to their own /inbox to run the usual ActivityPub processing that their AP instance does.

    {  
      "headers": {  
        "Signature": "...",  
        "Digest": "...",  
        "Date": "..."  
      },  
      "activity": { ... normal ActivityPub JSON ... }  
    }  
    

    In this way there is no need to rewrite ActivityPub semantics as only the transport layer changes. Our existing inbox logic remains intact.
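A sketch of the wrapper round trip, assuming JSON serialization for the NNTP article body (the helper names and example values are illustrative; a real receiver would replay the headers in an HTTP POST to its own /inbox so the usual signature checks still run):

```python
import json

def wrap(headers, activity):
    """Serialize an activity plus its original signing headers
    for transport as an NNTP article body."""
    return json.dumps({"headers": headers, "activity": activity})

def unwrap(article_body):
    """Recover the (headers, activity) pair on the receiving side;
    the server would then POST the activity to its own /inbox with
    these headers attached."""
    wrapper = json.loads(article_body)
    return wrapper["headers"], wrapper["activity"]

body = wrap(
    {"Signature": "keyId=...", "Digest": "SHA-256=...", "Date": "..."},
    {"type": "Like", "actor": "https://example.social/u/alice"},
)
headers, activity = unwrap(body)
```

Because the signature and digest travel with the activity, the receiving instance can verify them exactly as if the POST had come directly from the origin server.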

NNTP comes with a lot of historical baggage so we'd probably need to evolve the protocol a bit. Maybe use HTTP requests (even HTTP/2 streams?) instead of the original line-oriented text protocol over raw TCP sockets. But you get the idea.

    Thoughts?


asudox@lemmy.asudox.dev
      #2

      So how would defederation work in this case?

asudox@lemmy.asudox.dev wrote: "So how would defederation work in this case?"

ineedmana@piefed.zip
        #3

I'm just a tourist here, but here's my take:
If C can recognize that the post it just got from D is the same as the one from B, then there should be a way for it to recognize that it comes from A and ignore it entirely.

Or, even better, follow the federation model: subscribe to D's and B's pass-through from E and F, leaving A out. Then D and B won't send out the unwanted posts at all.

ineedmana@piefed.zip
          #4

There should be some routing in the protocol, so that A sending to the whole alphabet won't take down C with a multiplied burst from all the other letters.

Guest
            #5

rimu@piefed.social wrote: "each NNTP post would be a thin wrapper around a data structure containing the HTTP headers (with signature and digest) and JSON that a normal HTTP POSTed Activity would have"

            Signature can also be embedded within an activity: https://codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.md

asudox@lemmy.asudox.dev wrote: "So how would defederation work in this case?"

rimu@piefed.social
              #6

              Defederation has two parts - blocking the receiving of posts and stopping sending of posts.

              Blocking receiving is easy:

Each NNTP post includes the path it traveled in its headers - each time a server forwards it on, it appends its name to that path. Recipients could check the first element in the path to see the origin. Or, discard the post once it arrives in its /inbox, in the usual way.

              Stopping sending is trickier. I guess you'd need to include a 'do not send to' list in the post and hope that all servers honor that.
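Both checks can be sketched against a Usenet-style path header. Following the convention described above (servers append their name, so the origin is the first element; the "!" separator, header shape, and blocklists here are illustrative):

```python
def should_accept(path_header, blocked_origins):
    """Blocking receiving: the first element of the path is the origin,
    so a server can drop the post before any /inbox processing."""
    origin = path_header.split("!")[0]
    return origin not in blocked_origins

def forward_targets(peers, path_header, do_not_send):
    """Stopping sending: skip peers that already appear in the path,
    plus any servers on the post's 'do not send to' list - which only
    works if forwarding servers actually honor it."""
    visited = set(path_header.split("!"))
    return [p for p in peers if p not in visited and p not in do_not_send]

path = "lemmy.example!piefed.example"
print(should_accept(path, {"spam.example"}))   # -> True
print(should_accept(path, {"lemmy.example"}))  # -> False (origin is blocked)
print(forward_targets(["mastodon.example", "lemmy.example"], path, set()))
```

The receiving check is robust because each server enforces it locally; the sending check is only as good as the honesty of every relay in between.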

ineedmana@piefed.zip wrote: "There should be some routing in the protocol, so that A sending to the whole alphabet won't take down C with a multiplied burst from all the other letters."

rimu@piefed.social
                #7

Yes, I think that's part of NNTP already. Each post carries a list of the servers it has traveled through, so when considering where to forward a post, a server can check whether it has already been there. That would help somewhat, but there would still be quite a few cases where a server receives and discards duplicate posts.

                I haven't gotten deep enough into this yet but I'm sure there have been protocol improvements since NNTP that address this. Gossip protocols have been experimented with since the early 2000s. For example, rather than servers saying to others "I have this post, do you want it?" they might say "the most recent post I have in the fediverse@lemmy.world community is #5" and another server which only has posts #1 and #2 would respond "cool, give me posts #3, #4 and #5".
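That exchange can be sketched as a pull-based sync: each side advertises the highest post number it holds for a group, and the other side requests only what it is missing. The group name, per-group numbering, and helper names are all illustrative:

```python
def summarize(store, group):
    """Advertise the highest post number we hold for a group."""
    return max(store.get(group, [0]))

def missing(store, group, remote_highest):
    """Given a peer's advertisement, list the post numbers to fetch."""
    have = set(store.get(group, []))
    return [n for n in range(1, remote_highest + 1) if n not in have]

server_a = {"fediverse@lemmy.world": [1, 2, 3, 4, 5]}
server_b = {"fediverse@lemmy.world": [1, 2]}

latest = summarize(server_a, "fediverse@lemmy.world")        # "my latest is #5"
wanted = missing(server_b, "fediverse@lemmy.world", latest)  # "give me #3, #4, #5"
```

Compared with blind flooding, each post now crosses the wire only when a peer actually asks for it, at the cost of the occasional extra round trip.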
