Bluesky is down today.

Uncategorized · 101 Posts · 35 Posters
In reply to jeromechoo@masto.ai:
    @aeris @mcc failed deliveries happen all the time. mastodon.social is just another instance. Since Mastodon and misskey operate on shared inboxes, the failed deliveries won’t scale sender failures by recipient instance size.

    https://seb.jambor.dev/posts/understanding-activitypub/#:~:text=To%20combat%20this,our%20followers%20internally.
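
A minimal sketch of the shared-inbox behaviour described above, assuming a hypothetical inbox URL shape and leaving out signatures and retries (this is not Mastodon's actual delivery code): one POST per destination instance per post, so a dead host costs one failed request per post no matter how many followers live there.

```python
# Minimal sketch (not Mastodon's actual code): with sharedInbox, a post is
# delivered once per remote *instance*, not once per remote follower, so the
# number of outgoing requests scales with the number of destination servers.
import requests  # any HTTP client would do

def deliver(activity: dict, follower_inboxes: list[str]) -> None:
    # Collapse per-follower inboxes into one sharedInbox endpoint per host
    # (hypothetical URL shape, for illustration only).
    shared_inboxes = {
        inbox.split("/users/")[0] + "/inbox"
        for inbox in follower_inboxes
    }
    for inbox in shared_inboxes:
        try:
            # One POST per destination instance, regardless of how many
            # followers live there; a down host costs one failed request.
            requests.post(inbox, json=activity, timeout=3)
        except requests.RequestException:
            pass  # a real server would enqueue a retry here
```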

aeris@firefish.imirhil.fr · #83

@jeromechoo@masto.ai @mcc@mastodon.social Yes, I know that. The trouble is not one piece of content sent to many, but many pieces of content sent to one.
Each post from an instance is sent only once to mastodon.social, but EACH post is sent.


aeris@firefish.imirhil.fr · #84 (in reply to #83)

@mcc@mastodon.social @jeromechoo@masto.ai So a huge instance sends dozens of posts per second (lots of content generated, each delivered only once) to another huge instance, with one background job per piece of content to deliver.


aeris@firefish.imirhil.fr · #85 (in reply to #84)

@mcc@mastodon.social @jeromechoo@masto.ai The trouble scales not with the size of the down instance, but with the size of the instances that are still alive. The more active they are and the more content they generate, the faster the background job queue fills up with dangling deliveries.

In reply to mcc@mastodon.social:

This appears to be the explanation:

Rudolph Fraser (@rude1.blacksky.team) on Blacksky (blacksky.community):
"Even their relay seems down(?) Trying to switch some things to use atproto.africa https://atproto.africa"

In Bluesky, the PDS talks to the relay, the relay talks to the appview, and the appview goes to the client. Blacksky set up all four last year. But they only deployed their PDS and client, at first. They used Bluesky's relay and appview. This wasn't clearly disclosed. Then there was a censorship scare, and they switched to their own appview. But apparently they're still using Bluesky's relay. This wasn't clearly disclosed. Now relay death kills Blacksky.
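
A rough illustration of that pipeline (hypothetical names, not the atproto SDK): an appview that indexes posts by consuming a relay's firehose has the relay as its single upstream, so when the relay stops, nothing new reaches the appview even though every PDS is still up.

```python
# Illustrative sketch only (hypothetical names, not the atproto SDK).
# The appview's only upstream is the relay firehose; it never contacts the
# individual PDSes, so a relay outage stalls indexing of all new posts.
import time

def run_appview(relay_firehose, index):
    """relay_firehose: callable yielding repo events; index: the appview's DB."""
    while True:
        try:
            for event in relay_firehose():   # single point of dependency
                index.add(event)             # this is where posts become visible
        except ConnectionError:
            # Relay down: keep serving already-indexed data, but nothing new
            # appears until the relay (or a replacement relay) comes back.
            time.sleep(5)
```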

timbray@cosocial.ca · #86

          @mcc When I initially raised my eyebrows at Bluesky's notion of "federation", I was told that anyone can run a relay on a small cheap computer, it's dead easy, etc.…


jeromechoo@masto.ai · #87 (in reply to #85)

            @aeris @mcc this post you made probably just failed to deliver to a few instances. One of them could be mastodon.social if it was down.

            How does that affect delivery to masto.ai?

            mastodon.social being down is no different to your server than any other server being down. One request each.

In reply to mcc@mastodon.social:

              Now, interestingly, this means that Blacksky users can continue talking to Blacksky users. I can read Rudy's posts on Blacksky. Because that bypasses the relay. But¹ to read my *own* posts, *on a self-hosted PDS*, Bluesky is apparently required, because Blacksky relies on Bluesky's "relay" to scrape my PDS before it gets added to the Blacksky appview database.

              ¹ (if I'm interpreting Rudy's posts correctly, hardly a guarantee)

breizh@pleroma.breizh.pm · #88

@mcc@mastodon.social From what I understand of the protocol, they could just stop using a relay altogether, but then it would increase the traffic on all the PDSes that were scraped by the relay until then, since the AppView would have to connect to each of those instead of the relay.

And did switching to another relay solve the issue?


aeris@firefish.imirhil.fr · #89 (in reply to #87)

@jeromechoo@masto.ai @mcc@mastodon.social It affects delivery to masto.ai because EACH of my posts generates a dangling request that hits the timeout. After a while, my workers spend more time on dangling requests taking 2-3s (hitting the timeout) than on actually sending content to masto.ai.


aeris@firefish.imirhil.fr · #90 (in reply to #89)

@mcc@mastodon.social @jeromechoo@masto.ai Each post becomes a dangling request that consumes 3s of worker time, 10× the ~300ms of a delivery to an alive server, and is then rescheduled for retry. After a while, all the workers are stuck on 3s waiting processes, and deliveries to alive servers are starved.
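
Back-of-the-envelope arithmetic for this effect, using the ~300ms and ~3s figures assumed in the post (illustrative only, not measurements):

```python
# Back-of-the-envelope numbers for the starvation effect described above,
# using the thread's assumed figures: ~300ms for a delivery to a live server,
# ~3s for a request that hangs until timeout.
LIVE_S, TIMEOUT_S = 0.3, 3.0

def mean_job_time(dead_fraction: float) -> float:
    """Average worker-seconds per delivery job when a given fraction of
    destinations is down (each dead job burns a full timeout)."""
    return (1 - dead_fraction) * LIVE_S + dead_fraction * TIMEOUT_S

for f in (0.0, 0.1, 0.3, 0.5):
    slowdown = mean_job_time(f) / LIVE_S
    print(f"{f:.0%} dead destinations -> {slowdown:.1f}x the worker time per delivery")
# 10% dead destinations already costs ~1.9x, and 50% costs ~5.5x; retries of
# the same dead jobs re-entering the queue make the real effect worse.
```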

In reply to mcc@mastodon.social:

                    (And *how* does ActivityPub avert these problems? Well, ActivityPub has the "instance" abstraction. The federate-or-defederate relationships serve as a basic web of trust so some work, like moderation, doesn't have to be fully duplicated. Data is shared between instances only when a follow-relationship requires it, reducing work. Instances can still get too big and maintainers overworked, but you can fix that problem with more, smaller instances. As above, *there ARE no small Bluesky instances*)

mcc@mastodon.social · #91

                    Updates

                    - Over the last two hours the problem has gone from "I don't see my posts" to "I see my posts 1 hour after I make them" to "17 minutes" to "3 minutes" to "it's fixed". I interpret this as the relay firehose pointer, whatever relay is in use right now, gradually catching up.

                    - I need to stress the above thread is a mix of fact (ATProto federation is duplicative and often brittle) and conjecture (I can't know what relay is being used internally by Blacksky except if Rudy tells us).


mcc@mastodon.social · #92 (in reply to #88)

                      @breizh As of this second, Blacksky has resolved the issue. I don't know how.


aeris@firefish.imirhil.fr · #93 (in reply to #90)

@mcc@mastodon.social @jeromechoo@masto.ai After a while, you have 43 minutes of latency for EVERY DELIVERY, even to alive servers. I experienced that on my own Mastodon instance…

In reply to aeris@firefish.imirhil.fr:

                          @mcc@mastodon.social No, it's the trouble with the push design of ActivityPub.

scatty_hannah@federation.network · #94

                          @mcc@mastodon.social @aeris@firefish.imirhil.fr if that's really the case, if anything, that's an implementation problem. Mail servers have dealt with this problem for ages. That's why they have queues and per server exponentially increasing retry intervals. Push is not inherently bad.
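
A minimal sketch of that mail-server-style mitigation, per-destination exponential backoff, with arbitrary illustrative delays (not any particular server's implementation): a host that keeps failing is retried at growing intervals instead of on every single post.

```python
# Minimal sketch of per-destination exponential backoff: failures are tracked
# per host, and a host in its backoff window is skipped entirely, so one dead
# instance cannot monopolise the delivery workers.
import time

class DestinationState:
    def __init__(self) -> None:
        self.failures = 0
        self.next_attempt = 0.0        # unix timestamp

    def ready(self) -> bool:
        return time.time() >= self.next_attempt

    def record_failure(self) -> None:
        self.failures += 1
        # 30s, 60s, 120s, ... capped at 1h (arbitrary illustrative values)
        delay = min(30 * 2 ** (self.failures - 1), 3600)
        self.next_attempt = time.time() + delay

    def record_success(self) -> None:
        self.failures = 0
        self.next_attempt = 0.0

def should_try(host: str, states: dict[str, DestinationState]) -> bool:
    # Skip hosts that are still backing off; their queued jobs wait without
    # consuming a worker slot or a timeout.
    return states.setdefault(host, DestinationState()).ready()
```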

In reply to nasser@merveilles.town:

                            @mcc this is a really good breakdown, thank you for this thread

slothrop@chaos.social · #95

                            @nasser @mcc Thanks indeed! This is a great explanation.

                            My own takeaway is that Bluesky is a lost cause in terms of decentralization, because its architecture is designed to resist that outcome.


jeromechoo@masto.ai · #96 (in reply to #89)

@aeris @mcc like I have already said — every post you’ve made just now has failed to deliver to several instances. Your instance is running just fine, is it not?

If mastodon.social goes down, it would add ONE more failed delivery to the queue of thousands your instance is already managing.


aeris@firefish.imirhil.fr · #97 (in reply to #93)

@mcc@mastodon.social @jeromechoo@masto.ai In the end, any worker has a 7% "chance" (3 out of 42) of hitting a down request, consuming resources for nothing for 2-3s and leaving no time to schedule alive servers, with 13,000 pending requests due to starvation, and many, many alive requests among those 13,000. Perhaps the 13,000th is an alive one, but on average it will only be delivered after 43 minutes.
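
A rough model of the backlog latency being described, using the thread's 42 workers and 13,000 pending jobs plus assumed per-job times (illustrative, not measured): the last job in the backlog waits roughly backlog × mean_job_time / workers.

```python
# Rough queueing arithmetic for the situation described above. Figures are
# the thread's examples (42 workers, 13,000 pending jobs) plus assumed job
# times: ~300ms for a live delivery, ~3s for a timeout.
WORKERS = 42
BACKLOG = 13_000
LIVE_S, TIMEOUT_S = 0.3, 3.0

def drain_minutes(dead_fraction: float) -> float:
    """Approximate wait for the last queued job: backlog * mean job time / workers."""
    mean_job = (1 - dead_fraction) * LIVE_S + dead_fraction * TIMEOUT_S
    return BACKLOG * mean_job / WORKERS / 60

for f in (0.07, 0.5, 0.9):
    print(f"{f:.0%} of queued jobs dead -> ~{drain_minutes(f):.0f} min to reach the last queued job")
# This simple model ignores retries re-entering the queue, which keep
# re-filling the backlog and push real-world latency well past these figures.
```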


aeris@firefish.imirhil.fr · #98 (in reply to #96)

@jeromechoo@masto.ai @mcc@mastodon.social No, it is not running fine. I ALREADY reported 43 minutes of latency to deliver ANY MESSAGE on Mastodon. This "bug" (in fact, bad design) has been known for ages.
https://github.com/mastodon/mastodon/issues/12445


aeris@firefish.imirhil.fr · #99 (in reply to #98)

@mcc@mastodon.social @jeromechoo@masto.ai And my new instance (I migrated from Mastodon to Misskey for exactly this reason) is ALREADY filled with 600 dangling requests. At this point it doesn't cause any noticeable delay, but only because the overall death rate is low. If a huge instance goes down, it will not be the same at all.


jeromechoo@masto.ai · #100 (in reply to #97)

                                      @aeris @mcc and yet, despite the 43 min latency you’re reporting, we’ve been having a perfectly synchronous conversation for the last 15 minutes.

In reply to mcc@mastodon.social:

                                        P2P is a world where naturally the more people use it, the faster and more resilient the network becomes. Load gets distributed. Working nodes talk to each other and ignore nonworking nodes. That's how the primitive, BitTorrent era systems worked.

                                        Bluesky somehow applied superfancy alien future technology to invent P2P traffic jams. When one node goes down, the others go down because they depended on it. Because it's a mesh of interoperating microservices by different providers, not federation.

ale@social.manalejandro.com · #101

@mcc the worst are distributed P2P attacks, like the ones I see on IPFS.
