Sure, this is all very bad for ActivityPub, but this is truly amazing content.
-
@trishalynn If the user is online when the data is received, there may be almost no time between when the data is received and when the user reads it.
However, most users aren't online most of the time. There's a strong chance that there are minutes, hours, or days between when the data is received and when it is read.
@trishalynn Most ActivityPub implementations today lean waaaaaay into the early part of this gap -- verifying the data as soon as it is received.
The problem with this is that sometimes hundreds or even thousands of servers receive the data within a few seconds -- and if they all verify the data with the third-party server immediately, it can swamp that server with requests.
-
@evan I should think that the server should verify first, even if the user is not actively online.
@trishalynn Before it's read by a user, yes.
-
@trishalynn One way to relieve this pressure on the third-party server is to space out all these requests by seconds or even minutes. There are a couple of ways to do this.
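As rough arithmetic for why spacing helps: if N receiving servers each make one verification request, and those requests are spread evenly over a window of W seconds, the average load on the origin server is N/W. The numbers below are hypothetical, not measurements, and an even spread only bounds the average, not short bursts:

```latex
\bar{r} = \frac{N}{W}
\qquad
N = 1000,\; W = 5\,\text{s} \;\Rightarrow\; \bar{r} = 200\ \text{req/s}
\qquad
N = 1000,\; W = 600\,\text{s} \;\Rightarrow\; \bar{r} \approx 1.7\ \text{req/s}
```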
-
@trishalynn One is to wait until the first reader reads the data. That event is going to vary wildly across servers, so it will spread out the requests and lower the load on the third-party server. The downside of this technique is that it introduces some extra time for that first read. Usually not a lot, but some.
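A minimal sketch of verify-on-first-read, assuming verification can be simplified to "fetch the object from its origin server and compare it with the copy we received". `fetch_from_origin`, `SharedObject` and `LazyVerifyingStore` are hypothetical names for illustration, not any real server's API:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SharedObject:
    """A third-party object that arrived inside someone's share."""
    object_id: str
    content: str
    verified: Optional[bool] = None  # None means "not checked yet"


class LazyVerifyingStore:
    """Stores shared objects on receipt; verifies each one only on its first read."""

    def __init__(self, fetch_from_origin: Callable[[str], Optional[str]]) -> None:
        # Hypothetical callback: dereference object_id on its origin server and
        # return the authoritative content, or None if it can't be fetched.
        self._fetch = fetch_from_origin
        self._objects: dict[str, SharedObject] = {}
        self.origin_requests = 0  # how many times we contacted the origin

    def receive(self, object_id: str, content: str) -> None:
        # No network request on receipt: just keep the unverified copy.
        self._objects[object_id] = SharedObject(object_id, content)

    def read(self, object_id: str) -> Optional[str]:
        obj = self._objects[object_id]
        if obj.verified is None:
            # First read: the only moment this server contacts the origin.
            self.origin_requests += 1
            authoritative = self._fetch(object_id)
            obj.verified = authoritative == obj.content
        # Content that failed verification is never shown.
        return obj.content if obj.verified else None
```

A side effect of this design: objects that nobody on the server ever reads never cost the origin a request at all.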
-
@trishalynn Another is for the receiving server to wait a random number of seconds or minutes before doing the verification request. This spaces out the requests, and hopefully avoids the little delay for the user on first read. At worst, if a user tries to read the data before that random delay has elapsed, you can do the verification then -- it's no worse than the previous method, and will usually be better.
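A sketch of the random-delay approach with the fall-back-on-read behaviour, assuming a single-process scheduler driven by a periodic `tick(now)` call. `verify`, `max_delay_s` (600 s here) and `JitteredVerifier` are hypothetical choices for illustration:

```python
import heapq
import random
from typing import Callable, Optional


class JitteredVerifier:
    """Verifies each object after a random delay, or immediately if it's read first."""

    def __init__(self, verify: Callable[[str], bool], max_delay_s: float = 600.0,
                 rng: Optional[random.Random] = None) -> None:
        self._verify = verify          # hypothetical: True if the origin confirms the object
        self._max_delay = max_delay_s  # assumed upper bound on the random wait
        self._rng = rng or random.Random()
        self._queue: list[tuple[float, str]] = []  # min-heap of (due_time, object_id)
        self._result: dict[str, bool] = {}

    def receive(self, object_id: str, now: float) -> float:
        """Schedule verification at a uniformly random time in [now, now + max_delay]."""
        due = now + self._rng.uniform(0, self._max_delay)
        heapq.heappush(self._queue, (due, object_id))
        return due

    def tick(self, now: float) -> None:
        """Run every verification whose random delay has elapsed."""
        while self._queue and self._queue[0][0] <= now:
            _, object_id = heapq.heappop(self._queue)
            self._ensure_verified(object_id)  # no-op if a read already verified it

    def read(self, object_id: str) -> bool:
        """A read before the delay elapses forces verification now (same cost as first-read)."""
        return self._ensure_verified(object_id)

    def _ensure_verified(self, object_id: str) -> bool:
        if object_id not in self._result:
            self._result[object_id] = self._verify(object_id)
        return self._result[object_id]
```

Each object is verified at most once: whichever comes first, the scheduled time or the first read, does the work, and the other path reuses the cached result.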
-
@evan (Could you please let me know when you’re done explaining? I don’t want to jump in with clarifying Qs till you’re done.)
-
@trishalynn So, the last part, which I think is most controversial, is showing the unverified data to the user -- doing the verification *after* the first read.
This requires a lot of trust between the actors. But if a sending actor has sent 10 or 1,000 or 10,000 shares, all of which have previously verified correctly, there's a very good chance that share number 10,001 is also going to verify correctly.
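One way the "trust, then verify" decision could look, as a sketch: show a share before verifying it only when its sender has a long, clean record, and still verify afterwards either way. The threshold of 1,000 prior successes and the rule "any past failure revokes trust" are assumptions for illustration, not something the thread specifies:

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    verified_ok: int = 0   # past shares from this sender that verified correctly
    verified_bad: int = 0  # past shares that failed verification


def display_before_verifying(history: SenderHistory, min_track_record: int = 1000) -> bool:
    """Decide whether the first reader may see a share before it is verified.

    Verification still happens afterwards; this only decides whether the
    reader has to wait for it.
    """
    return history.verified_bad == 0 and history.verified_ok >= min_track_record


def record_outcome(history: SenderHistory, ok: bool) -> None:
    """Update the sender's record once the (possibly delayed) verification finishes."""
    if ok:
        history.verified_ok += 1
    else:
        history.verified_bad += 1
```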
-
@trishalynn This requires a lot more tracking on the receiving server's part. I'm not even sure the performance benefits are that great, compared to waiting for first-read instead of verifying on receipt. But for high-volume servers, it might be a valuable strategy in the future.
-
@trishalynn I think I'm done!
-
@evan What's the effect on a high-volume server versus a lower-volume server when the ethos of "trust, then verify" is used to implement a solution?
-
@evan @anders @promovicz @laurenshof It doesn't need to break backwards compatibility tho
But anyway
Long conversation potentially
@cwebber The original conversation was about removing JSON-LD and potentially using another schema language or making one up, or throwing away extensibility altogether. That would break backwards compatibility.
I agree, we might be able to add digital signatures without removing JSON-LD.
-
@trishalynn OK, so, you're good with the idea that the data doesn't have to be verified until the first user reads it, correct? We're good up until there?
-
@trishalynn Most of the benefits happen there. It would be great to see more ActivityPub implementations take that approach, because it would ease up on smaller servers. (Christine gave the example of when she shares posts by her friend Viv, which kills Viv's server.)
-
@trishalynn I think that maintaining trust metrics has some resource requirements -- you have to track by server and maybe by actor how many times you've received third-party data from them, and how many times it has verified correctly.
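A sketch of the bookkeeping described here, assuming the server is identified by the hostname of the actor's id URL. The resource cost is two counters per server and two per actor; `TrustLedger` and its methods are hypothetical names:

```python
from urllib.parse import urlsplit


class TrustLedger:
    """Counts third-party data received and verified, per server and per actor."""

    def __init__(self) -> None:
        # key -> [times received, times verified correctly]
        self._by_server: dict[str, list[int]] = {}
        self._by_actor: dict[str, list[int]] = {}

    def record(self, actor_id: str, verified_ok: bool) -> None:
        # Assumption: the actor's server is the hostname in its id URL.
        server = urlsplit(actor_id).hostname or ""
        for table, key in ((self._by_server, server), (self._by_actor, actor_id)):
            counts = table.setdefault(key, [0, 0])
            counts[0] += 1
            if verified_ok:
                counts[1] += 1

    def server_stats(self, server: str) -> tuple[int, int]:
        received, ok = self._by_server.get(server, (0, 0))
        return received, ok

    def actor_stats(self, actor_id: str) -> tuple[int, int]:
        received, ok = self._by_actor.get(actor_id, (0, 0))
        return received, ok
```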
-
@trishalynn I think there are limited benefits to using these trust metrics to verify even *after* the first read. So that technique might only pay off on a server with a lot of scale, where those benefits multiply out over thousands or millions of interactions.
-
@trishalynn I hope that answers your question.
-
@trishalynn Oh, I should probably say: trust is what we do when we are not certain. If I receive my 10 millionth share from mastodon.social, and I decide to delay verifying it, there's a non-zero chance that this is the time that mastodon.social takes its heel turn and sends me fake data. Trust is accepting that non-zero chance. For users or developers that can't accept that chance, waiting to verify when the first user reads the data is still a great benefit, and also a lot easier to code for.
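One way to put a number on that "non-zero chance" is Laplace's rule of succession, a simple estimator swapped in here for illustration rather than anything proposed in the thread. After n past shares from a sender, f of which failed verification, it estimates the chance that the next one fails as:

```latex
\hat{p}_{\text{fail}} = \frac{f + 1}{n + 2}
\qquad
n \approx 10^{7},\; f = 0 \;\Rightarrow\; \hat{p}_{\text{fail}} \approx \frac{1}{10^{7} + 2} \approx 10^{-7}
```

However long the clean track record, the estimate never reaches zero; that remaining sliver is the risk that trust accepts.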
-
Nah, I think you had the right to pop off a bit there. I'm no network engineer, but even I thought verifying upon first read was an insane take. In an age where agentic AI is writing goddamn hit pieces on people, and with how dangerous things are getting, security has to be a priority. Dis/misinformation is spreading at unprecedented rates, and I think a place like the decentralized web needs to do whatever it can to limit that spread if it wants to actually be a viable alternative/replacement.
This is a really interesting take!
To me, disinformation becomes dangerous when it is read by a user. Until then, it's just bits on a hard drive.
In your mind, what's the danger of having unverified data in a database that no user has yet read?
-
@cwebber @promovicz @laurenshof I don't feel like things got that bad at all.
I continue to believe that verifying content when it's first read, rather than when it's first received, is a much more performant strategy. It causes a slight hit for the first reader, but it spreads out the stress on the remote server across time much better.
I also think trust metrics are good for networks.
I did promise you a blog post on the topic, though, @cwebber. I'll try to get that done next week!
@evan @cwebber @promovicz @laurenshof How do you handle notifications for the purpose of determining when the content is first read? I receive notifications for my mentions, which include the contents of the message. There's no way for the server to know when I actually read the message in the notification, only when the notification is received by my client (which will likely be within seconds to minutes of it being received by my server).
The options are either to include unverified content in the notification (which I don't consider to be acceptable), or to verify it first, at which point it's almost the same as verifying it as soon as it's received by my server.