Hey everyone, looks like having actual "standards" and sticking to them works!

    codinghorror@infosec.exchange wrote (#1):

    Hey everyone, looks like having actual "standards" and sticking to them works! Or we are just total jerks who will not blindly accept whatever a random person types into a textbox. One or the other. Who can say, really?

    The Importance of High-Signal Data: Modern AI research has definitively proven that data quality is vastly more important than data quantity. Microsoft's landmark 2023 paper "Textbooks Are All You Need" (which introduced the phi-1 coding model) demonstrated that aggressively filtering out low-quality "noise" from training data leads to dramatically better coding models.

    The Subsidy of Human Labor: Stack Overflow's notorious moderation policies—closing duplicates, downvoting broken code, and demanding minimal reproducible examples—did exactly this human-labor-intensive data filtering for over 15 years. Without this rigorous gatekeeping, LLMs would have ingested vast amounts of broken, insecure, or poorly formatted code, which would have severely degraded their baseline performance. The assertion that AI companies are "subsidized" by the unpaid labor of diligent forum moderators is a widely accepted critique in the fields of AI ethics and data provenance.

      kkarhan@jorts.horse wrote (#2):

      @codinghorror and that's why said #AIslop should be #banned as #WastefulComputing and any research and data #OpenSourced!
