🌐 The C10K Problem — The challenge that changed the internet

  • marcelschmall@infosec.exchange wrote:
    #1

    🌐 The C10K Problem — The challenge that changed the internet

    1999. The web is booming. Servers are struggling. Dan Kegel asks one simple question:

    “Why can’t a web server handle 10,000 simultaneous connections?”

    Not a bandwidth problem. Not a CPU problem. A design problem.

    ⚙️ The old model was simple but deadly:
    → 1 connection = 1 thread
    → 10,000 connections = 10,000 threads
    → ~80 GB reserved for thread stacks alone (10,000 × the typical 8 MB default stack)
    → The kernel spends more time scheduling than actually working

    💀 The server wasn’t busy doing work. It was busy managing the chaos.

    🔧 C10K forced a complete rethink:
    → 1 thread handling thousands of connections
    → Non-blocking sockets
    → Event loops instead of thread pools

    ⚡ The ripple effects were massive:
    → epoll landed in Linux in 2002 (kernel 2.5.44)
    → nginx was born with this model in mind
    → Node.js made it mainstream
    → Redis, HAProxy — all children of C10K

    🐧 One blog post in 1999 rewired how the entire industry thinks about network servers. We’re still building on those ideas today.

    #Linux #Networking #SystemsProgramming #WebDev

    • ammarfaizi2@social.gnuweeb.org wrote:
      #2

      @Suiseiseki @marcelschmall

      > Is it really that impressive [...]

      Yes. Because spawning a thread is much heavier than waiting for `POLLIN | POLLOUT`.

      Spawning a thread just to handle a single fd is definitely not resource-efficient: each thread needs a dedicated stack, a TID, scheduling parameters, an I/O priority, a dedicated PCB (task_struct on Linux), and many other scheduling-related resources.

      • ammarfaizi2@social.gnuweeb.org wrote:
        #3

        @Suiseiseki @marcelschmall

        Before epoll was invented, there were already select() and poll() to handle multiple fds with a single process, but poll()'s interface wasn't very efficient because you had to traverse the entire `struct pollfd` array after every syscall return.

        If you have 10K connections and one connection sends you data, with poll(), you need to traverse up to 10K `struct pollfds` to find which connection sends you data.

        select() is even worse; it can only handle up to `FD_SETSIZE` number of fds which is defined as 1024. The traversal is pretty much the same as poll().

        epoll solves the problem by returning only the ready fds, so the caller never wastes iterations scanning idle connections.

        • ammarfaizi2@social.gnuweeb.org wrote:
          #4

          @Suiseiseki @marcelschmall

          Additionally, since Linux 5.1 we have io_uring, which offers advantages over epoll, such as better performance and support for more I/O operations (it queues the actual reads, writes, accepts, and so on, rather than just reporting readiness).

          #io_uring

          • ammarfaizi2@social.gnuweeb.org wrote:
            #5

            @Suiseiseki @marcelschmall

            But I think you should use io_uring with kernel 6.x+ to actually see how useful it is for various workloads. I'm not confident using io_uring on 5.x kernels.

            • yuka@s.umeyashiki.org wrote:
              #6

              @ammarfaizi2@gnuweeb.org @Suiseiseki@freesoftwareextremist.com @marcelschmall@infosec.exchange The return value of poll() can help optimize the traversal, as it indicates how many fds have a nonzero revents.

              I know it is still not as efficient as epoll, but if you find all ready fds earlier, you can break the iteration earlier too:

    ready_fds = poll(fds, 10000, -1);
    if (ready_fds < 0) {
        handle_error(ready_fds);
        return;
    }

    for (i = 0; i < 10000; i++) {
        if (ready_fds == 0) {
            // All ready fds have been handled.
            // Break out of the loop early to
            // avoid unnecessary iterations.
            break;
        }

        // Any nonzero revents (POLLERR/POLLHUP included)
        // counts toward poll()'s return value, so test for
        // any event at all to keep the counter in sync.
        if (fds[i].revents != 0) {
            handle_events(i, fds[i]);
            ready_fds--;
        }
    }
              

              If you are lucky and the one ready client is not at the last index, you can break out earlier.
