Bleg for help: How might I get an equivalent of Google Books ngram viewer to tell me how frequent given terms might be in an LLM training data set?

  • adamshostack@infosec.exchange wrote (#1):

    Bleg for help: How might I get an equivalent of Google Books ngram viewer to tell me how frequent given terms might be in an LLM training data set?

    I'd be happiest with a tool like the ngram viewer, even if it's constrained to a single open-weight model.

      tarah@infosec.exchange wrote (#2, in reply to #1):

      @adamshostack dude, you know I do that, right?

        adamshostack@infosec.exchange wrote (#3, in reply to #2):

        @Tarah Yes, but you're busy finishing $thing, and so I put it on wide scan.

        (Not sure how public that thing is right now.)

          adamshostack@infosec.exchange wrote (#4, in reply to #1):

          I think the answer I wanted is "cargo install inspector-gguf"

            tarah@infosec.exchange wrote (#5, in reply to #3):

            @adamshostack not very $public, but I need additional cases for the discussion anyway. Let’s see what you have, and I’ll tell you if the method will work.
