Scaling: ActivityPub over NNTP?

Technical Discussion · 7 Posts · 4 Posters
Tags: fediverse, activitypub, usenet
rimu@piefed.social
    #1

    Here's an interesting thought experiment.

Way back in the 1980s and 90s, Usenet was a sorta-federated discussion forum (using the NNTP protocol) that was very popular. It still exists and distributes around 400 million messages each day (mostly spam and trash as far as I can tell). Hard numbers are difficult to come by, but it seems like Usenet is capable of significantly higher throughput than ActivityPub. Why is that?

    The big thing holding ActivityPub back is the fan-out. You know the story - someone with 50,000 followers causes their instance to send up to 50,000 HTTP POSTs every time they click the little spinny star or reply to something.

It's basically a hub-and-spoke network topology - except everyone takes turns being the hub. Ideally, anyway; in practice, not so much. And in this topology, the hubs are where the strain and the bottlenecks are.

Back in the 1980s they had computers literally 1000 times slower than ours, and network links to match. So how did they do this? With a peer-to-peer network topology! When a new post is made, the originating server doesn't send it to everyone - it just sends it to a handful of other servers. Those servers in turn forward the post on to a handful of other peers, and so on, until the whole network receives the post. No individual server is a single point of failure and none has to bear the full brunt of orchestrating it all.

    Let's do a picture. A creates a post and sends it to B and D.

    A ─ B ─ C  
     \      /  
       ─ D ─  
    

    B sends it on to C.

Meanwhile D sends it on to C as well, but C already has it so does nothing more. IRL this would be a much larger mesh. Who peers with whom can be a mixture of manual selection and random spiciness.
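The flood-with-deduplication idea above can be sketched in a few lines (the server names and peer table are illustrative, not part of any real protocol):

```python
from collections import deque

def flood(origin, peer_table):
    """Return the set of servers that receive a post injected at `origin`,
    given each server's handful of forwarding peers. A server forwards
    only the first copy it sees, so the duplicate D -> C delivery above
    is simply dropped."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for peer in peer_table.get(server, []):
            if peer not in seen:  # C ignores the second copy
                seen.add(peer)
                queue.append(peer)
    return seen

# The tiny mesh from the picture: A sends to B and D; B and D both forward to C.
peers = {"A": ["B", "D"], "B": ["C"], "D": ["C"]}
print(sorted(flood("A", peers)))  # -> ['A', 'B', 'C', 'D']
```

No server ever processed the same post twice, and no single server had to contact everyone.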

    Posts can arrive out of order so each server would need to wait until the dependencies between posts are resolved before making them available to clients. That's a bit tricky.
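One way to handle the out-of-order problem is to park an activity until its parent has arrived. A minimal sketch, assuming an `inReplyTo` dependency (the field names and helper are illustrative):

```python
# Buffer activities whose parent (e.g. inReplyTo) hasn't arrived yet,
# releasing them once their dependencies resolve.

def ingest(activity, store, pending):
    """store: ids already visible to clients; pending: parked activities
    keyed by the id they are waiting for. Returns what became visible."""
    parent = activity.get("inReplyTo")
    if parent and parent not in store:
        pending.setdefault(parent, []).append(activity)  # park it for now
        return []
    released = [activity]
    store.add(activity["id"])
    # Making one activity visible may unblock others waiting on it.
    for waiting in pending.pop(activity["id"], []):
        released += ingest(waiting, store, pending)
    return released

store, pending = set(), {}
ingest({"id": "reply-1", "inReplyTo": "post-1"}, store, pending)  # parked
made_visible = ingest({"id": "post-1"}, store, pending)  # releases both
```

Here the reply arrives first and is held back; when the original post turns up, both become visible in the right order.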

    In the ActivityPub-over-NNTP idea, each NNTP post would be a thin wrapper around a data structure containing the HTTP headers (with signature and digest) and JSON that a normal HTTP POSTed Activity would have. Servers would use NNTP to distribute the activities and upon receiving one they'd POST it to their own /inbox to run the usual ActivityPub processing that their AP instance does.

    {  
      "headers": {  
        "Signature": "...",  
        "Digest": "...",  
        "Date": "..."  
      },  
      "activity": { ... normal ActivityPub JSON ... }  
    }  
    

    In this way there is no need to rewrite ActivityPub semantics as only the transport layer changes. Our existing inbox logic remains intact.
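A sketch of the wrapper round trip, assuming JSON serialization for the NNTP article body (the helper names and example values are illustrative; a real receiver would replay the headers in an HTTP POST to its own /inbox so the usual signature checks still run):

```python
import json

def wrap(headers, activity):
    """Serialize an activity plus its original signing headers
    for transport as an NNTP article body."""
    return json.dumps({"headers": headers, "activity": activity})

def unwrap(article_body):
    """Recover the (headers, activity) pair on the receiving side;
    the server would then POST the activity to its own /inbox with
    these headers attached."""
    wrapper = json.loads(article_body)
    return wrapper["headers"], wrapper["activity"]

body = wrap(
    {"Signature": "keyId=...", "Digest": "SHA-256=...", "Date": "..."},
    {"type": "Like", "actor": "https://example.social/u/alice"},
)
headers, activity = unwrap(body)
```

Because the signature and digest travel with the activity, the receiving instance can verify them exactly as if the POST had come directly from the origin server.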

NNTP comes with a lot of historical baggage so we'd probably need to evolve the protocol a bit. Maybe use HTTP requests (even HTTP/2 streams?) instead of the original line-oriented text protocol over raw TCP sockets. But you get the idea.

    Thoughts?


asudox@lemmy.asudox.dev
      #2

      So how would defederation work in this case?

asudox@lemmy.asudox.dev wrote: "So how would defederation work in this case?"

ineedmana@piefed.zip
        #3

I'm just a tourist here, but here's my take:
If C can recognize that the post it just got from D is the same as the one from B, then there should be a way for it to recognize that it comes from A and ignore it entirely.

Or, even better, follow the federation model: subscribe to D's and B's pass-through from E and F, leaving A out. Then D and B won't send out the unwanted posts at all.

ineedmana@piefed.zip
          #4

There should be some routing in the protocol, so that A sending to the whole alphabet won't take down C with a multiplied burst from all the other letters.

Guest
            #5

rimu@piefed.social wrote: "each NNTP post would be a thin wrapper around a data structure containing the HTTP headers (with signature and digest) and JSON that a normal HTTP POSTed Activity would have"

            Signature can also be embedded within an activity: https://codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.md

asudox@lemmy.asudox.dev wrote: "So how would defederation work in this case?"

rimu@piefed.social
              #6

              Defederation has two parts - blocking the receiving of posts and stopping sending of posts.

              Blocking receiving is easy:

Each NNTP post includes the path it traveled in its headers - each time a server forwards it on, it appends its name to that path. Recipients could check the first element in the path to see the origin. Or, discard the post once it arrives in its /inbox, in the usual way.

              Stopping sending is trickier. I guess you'd need to include a 'do not send to' list in the post and hope that all servers honor that.
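Both checks can be sketched against a Usenet-style path header. Following the convention described above (servers append their name, so the origin is the first element; the "!" separator, header shape, and blocklists here are illustrative):

```python
def should_accept(path_header, blocked_origins):
    """Blocking receiving: the first element of the path is the origin,
    so a server can drop the post before any /inbox processing."""
    origin = path_header.split("!")[0]
    return origin not in blocked_origins

def forward_targets(peers, path_header, do_not_send):
    """Stopping sending: skip peers that already appear in the path,
    plus any servers on the post's 'do not send to' list - which only
    works if forwarding servers actually honor it."""
    visited = set(path_header.split("!"))
    return [p for p in peers if p not in visited and p not in do_not_send]

path = "lemmy.example!piefed.example"
print(should_accept(path, {"spam.example"}))   # -> True
print(should_accept(path, {"lemmy.example"}))  # -> False (origin is blocked)
print(forward_targets(["mastodon.example", "lemmy.example"], path, set()))
```

The receiving check is robust because each server enforces it locally; the sending check is only as good as the honesty of every relay in between.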

ineedmana@piefed.zip wrote: "There should be some routing in the protocol, so that A sending to the whole alphabet won't take down C with a multiplied burst from all the other letters."

rimu@piefed.social
                #7

Yes, I think that's part of NNTP already. Each post carries a list of the servers it has traveled through, so when considering where to forward a post, a server can check whether it has already been there. That would help somewhat, but there would still be quite a few cases where a server receives and discards duplicate posts.

                I haven't gotten deep enough into this yet but I'm sure there have been protocol improvements since NNTP that address this. Gossip protocols have been experimented with since the early 2000s. For example, rather than servers saying to others "I have this post, do you want it?" they might say "the most recent post I have in the fediverse@lemmy.world community is #5" and another server which only has posts #1 and #2 would respond "cool, give me posts #3, #4 and #5".
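That exchange can be sketched as a pull-based sync: each side advertises the highest post number it holds for a group, and the other side requests only what it is missing. The group name, per-group numbering, and helper names are all illustrative:

```python
def summarize(store, group):
    """Advertise the highest post number we hold for a group."""
    return max(store.get(group, [0]))

def missing(store, group, remote_highest):
    """Given a peer's advertisement, list the post numbers to fetch."""
    have = set(store.get(group, []))
    return [n for n in range(1, remote_highest + 1) if n not in have]

server_a = {"fediverse@lemmy.world": [1, 2, 3, 4, 5]}
server_b = {"fediverse@lemmy.world": [1, 2]}

latest = summarize(server_a, "fediverse@lemmy.world")        # "my latest is #5"
wanted = missing(server_b, "fediverse@lemmy.world", latest)  # "give me #3, #4, #5"
```

Compared with blind flooding, each post now crosses the wire only when a peer actually asks for it, at the cost of the occasional extra round trip.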
