Bluesky is down today.

Uncategorized · 101 Posts · 35 Posters
• aeris@firefish.imirhil.fr

  @mcc@mastodon.social Currently I have around 600 "delayed" jobs because down instances pollute all delivery. This was reported to Mastodon years ago. Nothing has changed.

  #65

  @mcc@mastodon.social For a tiny instance it's not really a problem: there are few messages, so the queue doesn't fill. For a huge instance, pretty much every message from every instance generates a dangling request in the queue. Once the queue fills, every message to every other instance is delayed, even to the ones that are alive.
• esm@wetdry.world

  @thisismissem @mcc probably worth noting that atproto.africa also appears to be down right now, and some microcosm services also appear to be going up and down.

  firehose.network and the microcosm relays look to be unaffected for now.

  thisismissem@hachyderm.io

  #66

  @esm @mcc I'm sure there'll be a full write-up soon. They usually do pretty good postmortems.
• eestileib@tech.lgbt

  @nasser @mcc

  I have a skywalking friend and he says that if blacksky users had configured something in their app to make blacksky primary (which, to be fair, had never mattered before), their timelines would have remained synced with other blacksky users.

  And also that blacksky was getting pulled down by bluesky repeatedly coming up, demanding to know the status of every lily in the field, then crashing.

  Sounds like they need to come up with a more graceful recovery process and get bluesky to agree with it.

  mcc@mastodon.social

  #67

  @eestileib @nasser Posts hosted on the Blacksky PDS are appearing on the Blacksky AppView immediately. That's definitely true.
• aeris@firefish.imirhil.fr

  #68

  @mcc@mastodon.social And it's worse for a huge instance that is still alive. Hundreds of messages per second, so hundreds of delivery jobs per second aimed at down instances. Hundreds of dead jobs fill the queue because of timeouts, competing for resources with the live jobs. At some point, all workers are processing only dead jobs…
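The starvation aeris describes can be sketched numerically. A toy model (all numbers invented, not Mastodon's or any real server's tuning): a fixed pool of delivery workers drains a queue; a job to a live host completes in ~0.1 s, while a job to a dead host blocks a worker until a 10 s timeout.

```python
# Toy model of a shared delivery queue: jobs to a dead instance each
# burn a full timeout, starving jobs to live instances.
# All numbers are illustrative assumptions.

def simulate(jobs, workers=4, live_cost=0.1, dead_timeout=10.0):
    """Return total wall-clock time to drain `jobs`.

    jobs: list of "live"/"dead" destination labels. A dead job
    occupies a worker for `dead_timeout` seconds; a live one for
    `live_cost` seconds.
    """
    worker_free_at = [0.0] * workers
    for dest in jobs:
        cost = dead_timeout if dest == "dead" else live_cost
        # Each job goes to whichever worker frees up first.
        w = min(range(workers), key=lambda i: worker_free_at[i])
        worker_free_at[w] += cost
    return max(worker_free_at)

healthy = simulate(["live"] * 400)
mixed = simulate(["live", "dead"] * 200)  # same 400 jobs, half to a down instance

print(f"all-live drain time:  {healthy:.1f}s")   # ~10s
print(f"half-dead drain time: {mixed:.1f}s")     # ~505s
```

Even though only half the jobs target the dead instance, drain time blows up by roughly 50x, which is the "all workers process only dead jobs" effect in miniature.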
• mcc@mastodon.social

  TL;DR

  1. My definition of "P2P" or "federated" is that if server A goes down, servers B and C can still talk to each other.

  2. Bluesky/"Atmosphere" fails at this because Blacksky (B) requires Bluesky (A) to talk to me (C).

  3. In order for Blacksky to avert this, they have to do something unreasonable and expensive.

  4. Blacksky someday *will* do this, but it will depend heavily on massively overworking Rudy and a few other people. This may someday fail.

  5. ActivityPub has problems, but not these.

  wikisteff@mastodon.social

  #69

  @mcc This is a good take, mcc.
• aeris@firefish.imirhil.fr

  #70

  @mcc@mastodon.social I don't know exactly what the effect of a 10-hour outage like Bluesky's would be if it were, say, mastodon.social that went down. I'd expect at least delays growing over time, even for traffic that never touches mastodon.social.
• mcc@mastodon.social

  #71

  @aeris If this problem is real I can imagine multiple ways to mitigate it. This is a software engineering problem.
• aeris@firefish.imirhil.fr

  #72

  @mcc@mastodon.social No, it's a design problem. ActivityPub uses push where ATProto uses pull.
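One standard software-level mitigation on the push side, of the kind mcc alludes to, is a per-destination circuit breaker: after N consecutive failures to a host, stop dispatching to it for a cooldown period, so a dead destination stops consuming delivery workers. This is a generic sketch of the pattern, not Mastodon's actual mechanism (Mastodon has its own delivery-failure tracking with different details).

```python
# Hypothetical per-destination circuit breaker for a delivery queue.
# Thresholds and cooldowns are illustrative assumptions.
import time

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=600.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = {}    # host -> consecutive failure count
        self.open_until = {}  # host -> timestamp when we may retry

    def allow(self, host):
        """May we attempt delivery to this host right now?"""
        return self.clock() >= self.open_until.get(host, 0.0)

    def record_success(self, host):
        self.failures.pop(host, None)
        self.open_until.pop(host, None)

    def record_failure(self, host):
        n = self.failures.get(host, 0) + 1
        self.failures[host] = n
        if n >= self.threshold:
            # Trip: skip this host entirely until the cooldown passes.
            self.open_until[host] = self.clock() + self.cooldown

cb = CircuitBreaker(threshold=3, cooldown=60.0)
for _ in range(3):
    cb.record_failure("dead.example")
print(cb.allow("dead.example"))   # False — circuit open, deliveries skipped
print(cb.allow("alive.example"))  # True — unaffected hosts still delivered to
```

The key property is that failures to one host cost at most `threshold` timeouts per cooldown window, instead of one timeout per queued job; it reduces the damage but, as aeris argues next, it patches a cost the push design creates in the first place.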
• esm@wetdry.world

  #73

  @thisismissem @mcc rose also said a few hours ago that they were fighting a DoS attack; I'd assume whoever is doing the attack is targeting multiple notable services in the ecosystem.
• kunev@blewsky.social

  @mcc@mastodon.social they're allowed to succeed so they can be paraded around: "see, it's all super distributed and decentralized".

  The moment VCs realize they need RoI, a bunch of "improvements", likely mostly "for security", probably "for safety", definitely "for the children", will add to the already insane architectural costs a bunch of operational burden that makes it impossible for other "instances" to exist.

  khm@hj.9fs.net

  #74

  That's the Signal playbook: "sure we can federate, but we won't, for reasons."

  CC: @mcc@mastodon.social
• aeris@firefish.imirhil.fr

  #75

  @mcc@mastodon.social So by design, a down instance pollutes everything. You can mitigate that in software, yes, but background task scheduling is a hard field.

  Pull's problems are simpler to mitigate: after a restart, an instance only needs to throttle its own outgoing requests while it fills the gap, to avoid hammering the other instances.
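The pull-side throttling aeris describes is essentially a rate limiter on the recovering instance's own backfill requests. A minimal token-bucket sketch (parameters invented, host name hypothetical, not taken from any real ATProto implementation):

```python
# Sketch of pull-side backfill throttling: a recovering instance
# rate-limits its outgoing catch-up requests per upstream host.
# Rates and burst sizes are illustrative assumptions.
import time

class BackfillThrottle:
    def __init__(self, rate=2.0, burst=10.0, clock=time.monotonic):
        self.rate = rate    # tokens (requests) refilled per second
        self.burst = burst  # bucket capacity
        self.clock = clock
        self.state = {}     # host -> (tokens, last_refill_time)

    def try_request(self, host):
        """Consume one token for `host`; False means back off."""
        now = self.clock()
        tokens, last = self.state.get(host, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[host] = (tokens - 1.0, now)
            return True
        self.state[host] = (tokens, now)
        return False

# Fake clock so the demo is deterministic.
t = [0.0]
throttle = BackfillThrottle(rate=2.0, burst=10.0, clock=lambda: t[0])
allowed_now = sum(throttle.try_request("pds.example") for _ in range(15))
print(allowed_now)    # 10 — the initial burst, then denials
t[0] = 1.0            # one second later: rate * 1 = 2 tokens refilled
allowed_later = sum(throttle.try_request("pds.example") for _ in range(5))
print(allowed_later)  # 2
```

The asymmetry with push is that here the recovering node pays its own catch-up cost at a pace it controls, rather than every other node burning workers on it.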
• thisismissem@hachyderm.io

  #76

  @esm @mcc yeah, that'd be my guess. It'll be interesting to see whether anyone takes responsibility for the attack, if it is an attack as suspected.

  Tangentially, Russia tried to block Bluesky the other day: https://netcrook.com/russia-blocks-bluesky-social-media-crackdown/
• aeris@firefish.imirhil.fr

  #77

  @mcc@mastodon.social A down ATProto instance is really down: no more pulls, no effect on the network.
  A down ActivityPub instance is not really down: all the other instances keep trying to communicate with it.
• mcc@mastodon.social

  #78

  @esm @thisismissem That's interesting, but

  1. If it's true, why would the DDoS differentially impact third-party PDSes on Blacksky while the Blacksky PDS runs at normal speed?

  2. Did atproto go down because of a DoS, or because of some side effect of an attempt to move over to it as the primary relay?

  One possibility is that the failures I saw were *because* we switched from bluesky to atproto.africa, causing a short netlag period while atproto.africa caught up to the present? I don't know?
• mcc@mastodon.social

  #79

  @esm @thisismissem I mean, it's certainly possible that I am simply misinterpreting Rudy's comments about relays!… but all we ever get from Rudy are these vague gnomic comments, so this is about the best I can do. I'd rather he spend his time sysadmining and writing Rust than writing up incident reports for public consumption, but it does mean that trying to figure out wtf is happening to my feed as a Blacksky user is constant detective work.
• thisismissem@hachyderm.io

  #80

  @mcc @esm I think we'll need to wait for the analysis and blog posts that follow.
• aeris@firefish.imirhil.fr

  #81

  @mcc@mastodon.social And in theory it's an attack vector. You can bootstrap thousands of instances, subscribe to as many accounts as possible, and then just shut your instances down.
  Any content from a subscribed account will generate a background job to your down instances, hitting a timeout each time.
  You can flood instances like that, continually overflowing their queues with dangling deliveries.
• aeris@firefish.imirhil.fr

  @mcc@mastodon.social No. Or with huge delay. Because each of my messages will generate a background job to mastodon.social, leading to queue overflow over time and more and more lag, even for delivery to digipres.club.

  jeromechoo@masto.ai

  #82

  @aeris @mcc failed deliveries happen all the time; mastodon.social is just another instance. Since Mastodon and Misskey operate on shared inboxes, failed deliveries won't scale sender failures by recipient instance size.

  https://seb.jambor.dev/posts/understanding-activitypub/#:~:text=To%20combat%20this,our%20followers%20internally.
• aeris@firefish.imirhil.fr

  #83

  @jeromechoo@masto.ai @mcc@mastodon.social Yes, I know that. The trouble is not one piece of content sent to many, but many pieces of content sent to one.
  Each post from an instance is sent only once to mastodon.social, but EACH post.
• aeris@firefish.imirhil.fr

  #84

  @mcc@mastodon.social @jeromechoo@masto.ai So a huge instance sends dozens of posts per second (many pieces of content generated, but each delivered only once per instance) to another huge instance, with one background job per piece of content to deliver.
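The arithmetic behind the last few posts, with invented numbers: shared inboxes mean one delivery job per post per destination instance (not per follower), so a down destination ties up the sender's workers in proportion to the sender's posting rate times the delivery timeout.

```python
# Back-of-envelope for the thread's claim. All numbers are invented.
posts_per_sec = 50  # posting rate on a huge sender instance (assumption)
timeout_s = 10      # per-delivery timeout on a dead destination (assumption)

# One shared-inbox job per post per destination: if a big destination
# is down, each job burns a full timeout before failing, so the
# worker-seconds consumed per wall-clock second are:
dead_worker_seconds = posts_per_sec * timeout_s
print(dead_worker_seconds)  # 500 — ~500 workers tied up on one dead host
```

Which is why a single large dead peer can dominate a sender's delivery pool even though each individual post is only delivered to it once.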