Bluesky is down today.

aeris@firefish.imirhil.fr

@mcc@mastodon.social @jeromechoo@masto.ai Each post to a dead server is a dangling request that consumes 3 s of CPU time, 10× the 300 ms needed for a live server, and is then scheduled for a retry. After a while, all the workers are stuck on 3 s waits, and requests to live servers starve.

mcc@mastodon.social

Updates

- Over the last two hours the problem has gone from "I don't see my posts" to "I see my posts 1 hour after I make them" to "17 minutes" to "3 minutes" to "it's fixed". I interpret this as the relay firehose pointer, whatever relay is in use right now, gradually catching up.

- I need to stress the above thread is a mix of fact (ATProto federation is duplicative and often brittle) and conjecture (I can't know what relay is being used internally by Blacksky except if Rudy tells us).

mcc@mastodon.social

@breizh As of this second, Blacksky has resolved the issue. I don't know how.

aeris@firefish.imirhil.fr

@mcc@mastodon.social @jeromechoo@masto.ai After a while, you have 43 minutes of latency for EVERY DELIVERY, even to live servers. I have experienced that on my own Mastodon instance…

scatty_hannah@federation.network

@mcc@mastodon.social @aeris@firefish.imirhil.fr if that's really the case, then if anything, that's an implementation problem. Mail servers have dealt with this problem for ages. That's why they have queues and per-server, exponentially increasing retry intervals. Push is not inherently bad.
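
The per-server, exponentially increasing retry interval mentioned in this post could look roughly like the sketch below. It is only an illustration: the class name `HostBackoff`, the 60-second base delay and the one-day cap are assumptions, not values taken from any mail server or fediverse implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HostBackoff:
    """Tracks, per destination host, when the next delivery attempt is allowed."""

    base: float = 60.0     # hypothetical first retry delay, in seconds
    cap: float = 86400.0   # hypothetical upper bound on the delay, in seconds
    failures: dict = field(default_factory=dict)    # host -> consecutive failures
    not_before: dict = field(default_factory=dict)  # host -> earliest retry time

    def can_try(self, host: str, now: float) -> bool:
        # A host with no recorded failure can always be tried.
        return now >= self.not_before.get(host, 0.0)

    def record_failure(self, host: str, now: float) -> float:
        # Double the delay on each consecutive failure, up to the cap.
        n = self.failures.get(host, 0)
        delay = min(self.base * (2 ** n), self.cap)
        self.failures[host] = n + 1
        self.not_before[host] = now + delay
        return delay

    def record_success(self, host: str) -> None:
        # A successful delivery resets the host's backoff state.
        self.failures.pop(host, None)
        self.not_before.pop(host, None)
```

A worker would call `can_try(host, now)` before picking up a delivery and skip hosts that are still backing off, so one dead server cannot keep occupying workers while deliveries to live servers wait.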

slothrop@chaos.social

@nasser @mcc Thanks indeed! This is a great explanation.

My own takeaway is that Bluesky is a lost cause in terms of decentralization, because its architecture is designed to resist that outcome.

jeromechoo@masto.ai

@aeris @mcc like I have already said: every post you’ve made just now has failed to deliver to several instances. Your instance is running just fine, is it not?

If mastodon.social goes down, it would add ONE more failed delivery to the queue of thousands your instance is already managing.

aeris@firefish.imirhil.fr

@mcc@mastodon.social @jeromechoo@masto.ai In the end, every worker has about a 7% chance (3 out of 42) of hitting a request to a down server, which wastes resources for 2-3 s and leaves no time to schedule live servers. Starvation then leaves 13,000 requests pending, many of them for live servers. Perhaps the 13,000th is for a live one, but it will be delivered only after 43 minutes on average.
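
The arithmetic behind this argument can be sketched as follows, using the figures quoted in the thread (3 of 42 requests hit a down server, about 3 s per dead request, 300 ms per live one, 13,000 pending requests). The function names and the worker count are assumptions for illustration; the thread does not say how many workers the instance runs, so this does not try to reproduce the 43-minute figure.

```python
def mean_service_time(p_dead: float, t_dead: float, t_live: float) -> float:
    """Expected seconds a worker spends per request."""
    return p_dead * t_dead + (1 - p_dead) * t_live


def dead_time_share(p_dead: float, t_dead: float, t_live: float) -> float:
    """Fraction of total worker time spent on requests to dead servers."""
    return p_dead * t_dead / mean_service_time(p_dead, t_dead, t_live)


def drain_time(backlog: int, workers: int, p_dead: float,
               t_dead: float, t_live: float) -> float:
    """Seconds needed to work through a backlog, ignoring new arrivals and retries."""
    return backlog * mean_service_time(p_dead, t_dead, t_live) / workers


p = 3 / 42  # share of requests that hit a down server, as quoted in the thread
share = dead_time_share(p, t_dead=3.0, t_live=0.3)
# workers=5 is a hypothetical value, not a figure from the thread
seconds = drain_time(13_000, workers=5, p_dead=p, t_dead=3.0, t_live=0.3)
print(f"worker time spent on dead servers: {share:.0%}")
print(f"time to drain the backlog: {seconds / 60:.1f} min")
```

With these inputs, dead servers account for only about 7% of requests but a bit over 40% of worker time, which is the disproportion the post describes.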

aeris@firefish.imirhil.fr

@jeromechoo@masto.ai @mcc@mastodon.social No, it is not running fine. I ALREADY reported 43 minutes of latency to deliver ANY MESSAGE on Mastodon. This "bug" (in fact bad design) has been known for ages.
https://github.com/mastodon/mastodon/issues/12445

aeris@firefish.imirhil.fr

@mcc@mastodon.social @jeromechoo@masto.ai And my new instance (I am migrating from Mastodon to Misskey for exactly this reason) is ALREADY filled with 600 dangling requests. At this point it doesn't generate any noticeable delay, but only because the overall death rate is low. If a huge instance goes down, it will not be the same at all.

jeromechoo@masto.ai

@aeris @mcc and yet, despite the 43 min latency you’re reporting, we’ve been having a perfectly synchronous conversation for the last 15 minutes.

ale@social.manalejandro.com

@mcc the worst are distributed P2P attacks like the ones I see in IPFS.
