Bluesky is down today.

101 Posts 35 Posters 1 Views
  • esm@wetdry.worldE esm@wetdry.world

    @thisismissem @mcc probably worth noting that atproto.africa also appears to be down right now, and some microcosm services also appear to be going up and down

    firehose.network and the microcosm relays look to be unaffected for now

    esm@wetdry.worldE This user is from outside of this forum
    esm@wetdry.world
    wrote last edited by
    #73

    @thisismissem @mcc rose also said a few hours ago that they were fighting a DoS attack; i'd assume whoever is doing the attack is targeting multiple notable services in the ecosystem

    • kunev@blewsky.socialK kunev@blewsky.social

      @mcc@mastodon.social they're allowed to succeed so they can be paraded around as "see, it's all super distributed and decentralized".

      The moment VCs realize they need RoI, a bunch of "improvements", likely mostly "for security", probably "for safety", definitely "for the children", will add to the already insane architectural costs a bunch of operational burden that makes it impossible for other "instances" to exist.

      khm@hj.9fs.net
      wrote last edited by
      #74
      that's the Signal playbook. "sure we can federate, but we won't, for reasons"

      CC: @mcc@mastodon.social
      • aeris@firefish.imirhil.frA aeris@firefish.imirhil.fr

        @mcc@mastodon.social No, it's a design problem. ActivityPub uses push, whereas ATProto uses pull.

        aeris@firefish.imirhil.fr
        wrote last edited by
        #75

        @mcc@mastodon.social So by design, a down instance pollutes everything. You can mitigate that in software, yes, but background task scheduling is a hard field.

        Pull problems are simpler to mitigate, because they only require throttling outgoing requests on the instance that was down, once it restarts, to avoid hammering other instances while it fills the gap.
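
A minimal sketch of the throttling idea in the post above, assuming a hypothetical `fetch(peer)` callable that pulls whatever a peer published during the downtime. The name `catch_up`, the rate limit, and the peer list are illustrative assumptions; none of this is an ATProto or client-library API.

```python
import time
from typing import Callable, Iterable, List


def catch_up(
    peers: Iterable[str],
    fetch: Callable[[str], None],
    max_per_second: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Pull from each peer once, never faster than max_per_second requests."""
    min_interval = 1.0 / max_per_second
    fetched: List[str] = []
    next_allowed = clock()
    for peer in peers:
        now = clock()
        if now < next_allowed:
            # Too early for the next request: wait out the remaining interval.
            sleep(next_allowed - now)
        fetch(peer)  # hypothetical pull of the peer's missed content
        fetched.append(peer)
        next_allowed = max(now, next_allowed) + min_interval
    return fetched
```

The `sleep` and `clock` parameters exist only so the pacing can be exercised without real waiting; a restarted instance would call `catch_up(peers, fetch)` with the defaults.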

          thisismissem@hachyderm.io
          wrote last edited by
          #76

          @esm @mcc yeah, that'd be my guess. It'll be interesting to see if anyone takes responsibility for the attack, if it is an attack as suspected

          Tangentially, Russia tried to block bluesky the other day: https://netcrook.com/russia-blocks-bluesky-social-media-crackdown/

            aeris@firefish.imirhil.fr
            wrote last edited by
            #77

            @mcc@mastodon.social A down ATP instance is really down. No more pulls, and no effect on the network.
            A down AP instance is not really down: all the other instances keep trying to communicate with it.

              mcc@mastodon.social
              wrote last edited by
              #78

              @esm @thisismissem That's interesting, but

              1. If it's true, why would the DDOS differentially impact third-party PDSes on Blacksky while Blacksky PDS runs at normal speed?

              2. Did atproto go down because of a DOS or because of some side-effect of an attempt to move over to it as the primary relay?

              One possibility is the failures I saw were *because* we switched from bluesky to atproto.africa, causing a short netlag period while atproto.africa caught up to the present? I don't know?

                mcc@mastodon.social
                wrote last edited by
                #79

                @esm @thisismissem I mean it's certainly possible that I am simply misinterpreting Rudy's comments about relays!… but all we ever get from Rudy are these vague gnomic comments, so this is about the best I can do. I'd rather he be spending his time sysadmining and writing Rust than writing up incident reports for public consumption, but it does mean trying to figure out wtf is happening to my feed as a blacksky user is constant detective work

                  thisismissem@hachyderm.io
                  wrote last edited by
                  #80

                  @mcc @esm I think we'll need to wait for the analysis and blog posts that follow.

                    aeris@firefish.imirhil.fr
                    wrote last edited by
                    #81

                    @mcc@mastodon.social And it's an attack vector, in theory. You can bootstrap thousands of instances, subscribe to as many accounts as possible, and then just shut down your instances.
                    Any content from a subscribed account will generate a background job to your down instance, hitting the timeout each time.
                    You can flood an instance like that, continually overflowing its queue with dangling content.

                    • aeris@firefish.imirhil.frA aeris@firefish.imirhil.fr

                      @mcc@mastodon.social No. Or with a huge delay, because each of my messages will generate a background job to mastodon.social, leading to queue overflow over time and more and more lag, even for digipres.club delivery.

                      jeromechoo@masto.ai
                      wrote last edited by
                      #82

                      @aeris @mcc failed deliveries happen all the time. mastodon.social is just another instance. Since Mastodon and misskey operate on shared inboxes, the failed deliveries won’t scale sender failures by recipient instance size.

                      https://seb.jambor.dev/posts/understanding-activitypub/#:~:text=To%20combat%20this,our%20followers%20internally.
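
A minimal sketch of the shared-inbox point above: a post goes out once per distinct inbox URL, so followers on the same server collapse into one delivery. ActivityPub does define a `sharedInbox` endpoint; the dictionary keys, the function name `delivery_targets`, and the `.example` URLs here are illustrative assumptions, not Mastodon's or Misskey's actual code.

```python
from typing import Dict, List, Set


def delivery_targets(followers: List[Dict[str, str]]) -> Set[str]:
    """Return the distinct inbox URLs that one post must be delivered to."""
    targets: Set[str] = set()
    for follower in followers:
        # Prefer the server-wide shared inbox; fall back to the personal inbox.
        targets.add(follower.get("shared_inbox") or follower["inbox"])
    return targets


followers = [
    {"inbox": "https://big.example/users/a/inbox", "shared_inbox": "https://big.example/inbox"},
    {"inbox": "https://big.example/users/b/inbox", "shared_inbox": "https://big.example/inbox"},
    {"inbox": "https://big.example/users/c/inbox", "shared_inbox": "https://big.example/inbox"},
    {"inbox": "https://small.example/users/d/inbox"},
]
targets = delivery_targets(followers)
```

Under these assumptions, three followers on the large server and one on the small server yield two deliveries per post, so one unreachable server costs the sender one failed request per post regardless of how many followers live there.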

                        aeris@firefish.imirhil.fr
                        wrote last edited by
                        #83

                        @jeromechoo@masto.ai @mcc@mastodon.social Yes, I know that. The trouble is not one piece of content sent to many, but many pieces of content sent to one.
                        Each post from an instance is sent only once to mastodon.social, but EACH post is sent.

                          aeris@firefish.imirhil.fr
                          wrote last edited by
                          #84

                          @mcc@mastodon.social @jeromechoo@masto.ai So a huge instance sends dozens of posts per second (many posts generated, each delivered only once) to another huge instance, with one background job per post to deliver.

                            aeris@firefish.imirhil.fr
                            wrote last edited by
                            #85

                            @mcc@mastodon.social @jeromechoo@masto.ai The trouble scales not with the size of the down instance, but with the size of the live instance. The more active it is and the more content it generates, the faster the background job queue fills with dangling content.

                            • mcc@mastodon.socialM mcc@mastodon.social

                              This appears to be the explanation:

                              Rudolph Fraser. (@rude1.blacksky.team)

                              Even their relay seems down(?) Trying to switch some things to use atproto.africa https://atproto.africa

                              Blacksky (blacksky.community)

                              In Bluesky, the PDS talks to the relay talks to the appview goes to the client. Blacksky set up all four last year. But they only deployed their PDS and client, at first. They used Bluesky's relay and appview. This wasn't clearly disclosed. Then there was a censorship scare, and they switched to their own appview. But apparently they're still using Bluesky's relay. This wasn't clearly disclosed. Now relay death kills Blacksky.

                              timbray@cosocial.ca
                              wrote last edited by
                              #86

                              @mcc When I initially raised my eyebrows at Bluesky's notion of "federation", I was told that anyone can run a relay on a small cheap computer, it's dead easy, etc.…

                                jeromechoo@masto.ai
                                wrote last edited by
                                #87

                                @aeris @mcc this post you made probably just failed to deliver to a few instances. One of them could be mastodon.social if it was down.

                                How does that affect delivery to masto.ai?

                                mastodon.social being down is no different to your server than any other server being down. One request each.

                                • mcc@mastodon.socialM mcc@mastodon.social

                                  Now, interestingly, this means that Blacksky users can continue talking to Blacksky users. I can read Rudy's posts on Blacksky. Because that bypasses the relay. But¹ to read my *own* posts, *on a self-hosted PDS*, Bluesky is apparently required, because Blacksky relies on Bluesky's "relay" to scrape my PDS before it gets added to the Blacksky appview database.

                                  ¹ (if I'm interpreting Rudy's posts correctly, hardly a guarantee)

                                  breizh@pleroma.breizh.pm
                                  wrote last edited by
                                  #88

                                  @mcc@mastodon.social From what I understand of the protocol, they could just stop using a relay at all, but then it would increase the traffic on all the PDSes that were scraped by the relay until then, since the AppView would have to connect to each of those instead of the relay.

                                  And did switching to another relay solve the issue?

                                    aeris@firefish.imirhil.fr
                                    wrote last edited by
                                    #89

                                    @jeromechoo@masto.ai @mcc@mastodon.social It affects delivery to masto.ai because EACH of my posts generates a dangling request that hits the timeout. After a while, my workers spend more time on dangling requests taking 2-3s (hitting the timeout) than on sending content to masto.ai.

                                      aeris@firefish.imirhil.fr
                                      wrote last edited by
                                      #90

                                      @mcc@mastodon.social @jeromechoo@masto.ai Each post is a dangling request that consumes 3s of CPU time, 10× the 300ms needed for a live server, and is then rescheduled for retry. After a while, all the workers are stuck on 3s waits, starving requests to live servers.
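
A back-of-the-envelope sketch of the starvation argument above, using the 3s timeout and 300ms success figures from the post. It assumes those figures are wall-clock time during which a worker is occupied, and it ignores retries, backoff, and any per-host circuit breaker a real server may apply; `dead_time_fraction` and `dead_share` are hypothetical names.

```python
def dead_time_fraction(dead_share: float, timeout_s: float = 3.0, ok_s: float = 0.3) -> float:
    """Fraction of worker time spent waiting on deliveries that time out.

    dead_share is the fraction of delivery jobs aimed at unreachable servers.
    """
    dead = dead_share * timeout_s          # time lost to timeouts per job, on average
    alive = (1.0 - dead_share) * ok_s      # time spent on successful deliveries
    return dead / (dead + alive)


# With 10% of jobs aimed at a down server, timeouts take roughly half the worker time.
share_at_10_percent = dead_time_fraction(0.10)
```

Under these assumptions the cost is driven by how many jobs the live instance generates toward the down one, which is the point the post is making; whether real deployments hit this depends on retry and backoff policy that the sketch leaves out.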

                                      • mcc@mastodon.socialM mcc@mastodon.social

                                        (And *how* does ActivityPub avert these problems? Well, ActivityPub has the "instance" abstraction. The federate-or-defederate relationships serve as a basic web of trust so some work, like moderation, doesn't have to be fully duplicated. Data is shared between instances only when a follow-relationship requires it, reducing work. Instances can still get too big and maintainers overworked, but you can fix that problem with more, smaller instances. As above, *there ARE no small Bluesky instances*)

                                        mcc@mastodon.social
                                        wrote last edited by
                                        #91

                                        Updates

                                        - Over the last two hours the problem has gone from "I don't see my posts" to "I see my posts 1 hour after I make them" to "17 minutes" to "3 minutes" to "it's fixed". I interpret this as the relay firehose pointer, whatever relay is in use right now, gradually catching up.

                                        - I need to stress the above thread is a mix of fact (ATProto federation is duplicative and often brittle) and conjecture (I can't know what relay is being used internally by Blacksky except if Rudy tells us).

                                          mcc@mastodon.social
                                          wrote last edited by
                                          #92

                                          @breizh As of this second, Blacksky has resolved the issue. I don't know how.
