CIRCLE WITH A DOT


Had a lot of fun with my stats students today.

112 Posts 62 Posters 17 Views
  • In reply to jedbrown@hachyderm.io:

    @dpiponi Even with a raw model, I don't see how you would sample from the distribution of numbers in the corpus. Perhaps provide no context and sample one or more tokens (using an independent pseudo-random number generator) from the distribution, and if the returned token parses as a number, return it to the user, otherwise try again. Providing any context/prompt would bias what is returned. This seems too contrived/circular.
    @futurebird

    dpiponi@mathstodon.xyz wrote (#43):

    @jedbrown @futurebird You described exactly what I would do. Obviously it would depend on an external PRNG and, yes, no prompt. One natural way to use an LLM is to transform draws from a PRNG into draws from a distribution intended to represent some corpus. Picking numbers out of these draws would be expected to have a similar distribution to picking numbers from the original corpus. IIRC I may already have tested to see if the results conform to Benford's law; I did a lot of stuff like that when llama.cpp first became available. You have to select the right parameters to have llama.cpp use the distribution "correctly".
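A minimal sketch of the rejection-sampling idea jedbrown and dpiponi describe, assuming a toy next-token distribution: the `TOY_VOCAB` tokens and weights below are invented for illustration (a real run would take probabilities from the model with no prompt), and each draw is a single token, which is a simplification. The independent PRNG supplies the randomness; the distribution only shapes it, and any draw that does not parse as a number is rejected.

```python
import random

# Hypothetical next-token probabilities for an unprompted model.
# These tokens and weights are made up for illustration only.
TOY_VOCAB = {"the": 0.30, "and": 0.25, "cat": 0.18,
             "7": 0.10, "42": 0.08, "1914": 0.05, "3.5": 0.04}


def sample_number(rng, vocab=TOY_VOCAB, max_tries=10_000):
    """Draw tokens with an external PRNG until one parses as a number."""
    tokens = list(vocab)
    weights = list(vocab.values())
    for _ in range(max_tries):
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        try:
            return float(tok)  # accept: the token parses as a number
        except ValueError:
            continue           # reject: not a number, draw again
    raise RuntimeError("no numeric token drawn")


rng = random.Random(1234)  # independent, seeded PRNG
draws = [sample_number(rng) for _ in range(1000)]
print(sorted(set(draws)))
```

Accepted draws follow the toy probabilities renormalized over the numeric tokens, so here "7" is returned about 0.10/0.27, or roughly 37%, of the time.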

      futurebird@sauropods.win wrote (#44):

      @ricko

      This is the epistemological issue I have with the interface. It's ... well, not to be harsh but it's deceptive.

      If you ask a "computer" for random numbers, that has a kind of meaning and an expected process. If you ask a computer "how did you generate those random numbers?", that also has a set of expectations... and an LLM isn't meeting ANY of them.

      • In reply to futurebird@sauropods.win:

        @Bumblefish

        Which one is random?
        (data sets are 100 numbers 1 to 6)

        listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

        listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

        alienghic@timeloop.cafe wrote (#45):

        @futurebird

        The mean and standard deviations for both lists are about the same.

        3.46 mean 1.7 stddev for listA
        3.42 mean 1.69 stddev for listB

        However, in listA every value appears either 16 or 17 times, so it looks like a uniform distribution, while in listB 3 shows up 24 times and 4 and 5 are less frequent, at 12 and 14 times respectively.

        My conclusion is listA was generated from a uniform random distribution and listB was not.

        I can't tell if listB was made by some other more advanced random distribution, but honestly it looks like someone took a uniform distribution and turned some of the 4s and 5s into 3s.
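A short sketch reproducing the summary statistics above with Python's standard `statistics` module. The quoted figures match the population standard deviation (`pstdev`, dividing by n); the sample version (`statistics.stdev`, dividing by n - 1) would give about 1.71 for listA.

```python
from statistics import mean, pstdev

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

for name, data in (("listA", listA), ("listB", listB)):
    # pstdev divides by n (population SD); statistics.stdev would divide by n - 1
    print(f"{name}: mean={mean(data):.2f} stddev={pstdev(data):.2f}")
# listA: mean=3.46 stddev=1.70
# listB: mean=3.42 stddev=1.69
```

Nearly identical means and spreads are why these two summaries alone cannot tell the lists apart.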

        • In reply to dlakelan@mastodon.sdf.org:

          @futurebird
          Things I would check: first, the frequency of each number. They should be somewhat uniform but not TOO close to equal, since all exactly equal is unlikely. Next, I'd look at the length of repeat sequences and compare them to expected values.

          The actual definition of a random sequence (per Martin-Löf) is in terms of passing tests.
          @Bumblefish

          alienghic@timeloop.cafe wrote (#46):

          @dlakelan @futurebird

          The Counter() objects map each integer to the number of times it appears.

          In [18]: Counter(listA)
          Out[18]: Counter(
          {2: 17, 3: 17, 5: 16, 1: 17, 4: 17, 6: 16}
          )

          In [19]: Counter(listB)
          Out[19]: Counter(
          {4: 12, 2: 17, 5: 14, 6: 17, 3: 24, 1: 16}
          )
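One way to turn these counts into a single number is Pearson's chi-square statistic against a fair die. This is a sketch, not something proposed in the thread: with six faces its expected value under a fair die is exactly 5 (the degrees of freedom), so values far above 5 suggest lumpy counts and values near 0 suggest counts more even than chance usually produces.

```python
from collections import Counter

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]


def chi_square_fair_die(data, faces=range(1, 7)):
    """Pearson chi-square statistic of face counts vs. equal expected counts."""
    counts = Counter(data)
    expected = len(data) / len(faces)
    return sum((counts[f] - expected) ** 2 / expected for f in faces)


for name, data in (("listA", listA), ("listB", listB)):
    print(name, round(chi_square_fair_die(data), 3))
# listA 0.08
# listB 5.0
```

listA sits near the bottom of the possible range and listB right at the average, which is a quantitative version of the "too uniform" reading.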

            danpmoore@mathstodon.xyz wrote (#47):

            @dlakelan @futurebird @Bumblefish Based on this description, A looks too uniform. B could be random.

            • In reply to futurebird@sauropods.win:

              @zalasur @Bumblefish

              You *can* make an argument for one of these lists being random like a dice roll and the other being much less likely to be generated in that way.

              zalasur@mastodon.surazal.net wrote (#48):

              @futurebird @Bumblefish Yes, you can estimate how likely a list is. But given any list of items, it is impossible to prove or disprove that it is random.

              • In reply to madjohnroberts@mastodon.social:

                @futurebird @Bumblefish listA has 17 occurrences of 1-4 and 16 of 5-6, where listB has different frequencies for each. I would guess that listB is actually random, listA is too nice.

                sabrina@fedi01.unicornsparkle.club wrote (#49):

                @madjohnroberts @futurebird @Bumblefish

                If List A has nearly equal occurrences of each number then that’s the one most likely to have been produced by the equivalent of rolling a die 100 times.

                  dlakelan@mastodon.sdf.org wrote (#50):

                  @alienghic
                  I'm on my phone at a volleyball game, but what's the likelihood for each? (The probability of seeing that vector of counts given a multinomial distribution with probability 1/6 for each value.)

                  It should be pretty easy in R or Julia or Python, though offhand I would need to look at the docs for any of them. Julia would be something like
                  using Distributions
                  pdf(Multinomial(100, fill(1/6, 6)), [17, 17, 17, 17, 16, 16])
                  @futurebird
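A Python version of dlakelan's suggestion, as a sketch: the exact probability of a given vector of face counts from n fair rolls is n! / (c1! c2! ... c6!) times (1/6)^n, computed here with integers and `Fraction` to avoid rounding. The flattest vector is the single most probable one (sabrina's point), yet the chance of landing on any of the 15 flattest vectors is still well under one in a thousand (madjohnroberts' point).

```python
from fractions import Fraction
from math import comb, factorial


def fair_die_count_prob(counts):
    """Exact multinomial probability of these face counts for fair-die rolls."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # builds n! / (c1! c2! ... c6!) exactly
    return Fraction(coef, 6 ** n)


p_a = fair_die_count_prob([17, 17, 17, 17, 16, 16])  # listA's counts
p_b = fair_die_count_prob([16, 17, 24, 12, 14, 17])  # listB's counts (faces 1..6)
p_flattest = comb(6, 2) * p_a  # any 2 of the 6 faces may be the ones with 16

print(f"{float(p_a):.2e} {float(p_b):.2e} {float(p_flattest):.2e}")
```

Every individual count vector is unlikely, so the informative comparison is how an observed summary ranks among all possible outcomes, which is what a chi-square test or a simulation measures.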

                    koushiniku@hachyderm.io wrote (#51):

                    @futurebird @Bumblefish
                    16 🤷 17

                      dlakelan@mastodon.sdf.org wrote (#52):

                      @danpmoore
                      Agreed, intuitively the frequencies in the first seem too uniform.
                      @futurebird @Bumblefish

                        charette@mstdn.ca wrote (#53):

                        @futurebird Can you settle the question?

                        (My vote is that listA is not random because of its many three-in-a-row repeats, but I'm not dedicated enough to pull out a die and record 100 rolls to see whether that is likely to happen so often.)
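Rather than rolling a die by hand, a short simulation can estimate how often 100 fair rolls contain at least as many three-in-a-row windows as listA. This is a sketch using Python's `random` module as a stand-in for a die; the seed and trial count are arbitrary choices, and windows overlap (so 4,4,4,4 counts as two triples).

```python
import random

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]


def count_triples(seq):
    """Count overlapping windows of three equal consecutive values."""
    return sum(1 for a, b, c in zip(seq, seq[1:], seq[2:]) if a == b == c)


observed = count_triples(listA)   # 5 for listA

rng = random.Random(2024)         # arbitrary seed, for reproducibility
trials = 10_000
hits = sum(
    count_triples(rng.choices(range(1, 7), k=100)) >= observed
    for _ in range(trials)
)
print(f"listA: {observed} triples; fraction of fair runs with >= {observed}: {hits / trials:.3f}")
```

If the printed fraction is not small, five triples is not by itself strong evidence against listA.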

                          madjohnroberts@mastodon.social wrote (#54):

                          @sabrina I think the frequencies all being within floor/ceil of 100/6, with the first four at ceil(100/6) and the last two at floor(100/6), shows intentionality. I agree the frequencies should be close but not exact! It's harder to say for certain, though: 100 samples isn't much, and I think with a larger N the difference would be more apparent, with listB showing less volatility
                          @futurebird @Bumblefish

                            futurebird@sauropods.win wrote (#55):

                            ListA was created by making a list of 16 or 17 of each number. The Stdev **of the frequencies** is much lower than what you will find on random lists of similar size.

                            ListB was made by rolling dice.
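futurebird's criterion can be checked by simulation, as a sketch: compute the standard deviation of the six face counts for each list, then see what fraction of simulated fair-die lists of 100 rolls have a spread that small. The seed and trial count are arbitrary, and `pstdev` (population SD) is used for the spread of the counts.

```python
import random
from collections import Counter
from statistics import pstdev

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]


def freq_spread(data):
    """Population standard deviation of the six face counts."""
    counts = Counter(data)
    return pstdev([counts[f] for f in range(1, 7)])


spread_a = freq_spread(listA)  # counts 17,17,17,17,16,16 -> about 0.47
spread_b = freq_spread(listB)  # about 3.73

rng = random.Random(7)          # arbitrary seed
trials = 10_000
sims = [freq_spread(rng.choices(range(1, 7), k=100)) for _ in range(trials)]
frac_a = sum(s <= spread_a + 1e-9 for s in sims) / trials
frac_b = sum(s <= spread_b + 1e-9 for s in sims) / trials
print(f"spread A={spread_a:.2f} (fraction at least as flat: {frac_a:.4f})")
print(f"spread B={spread_b:.2f} (fraction at least as flat: {frac_b:.2f})")
```

For reference, under a fair die the average squared spread of the counts is 100 * (1/6) * (5/6), about 13.9, i.e. a spread near 3.7, which is where listB sits.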

                              • In reply to apophis@yourwalls.today:
                              @futurebird i'm guessing the second one is made up because there aren't enough triples?


                              @Bumblefish
                              futurebird@sauropods.win wrote (#56):

                              @apophis @Bumblefish

                              I don't think the order should matter. The "problem" isn't related to the order of the list.

                              • In reply to futurebird@sauropods.win:

                                There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers"

                                But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?

                                rubinlinux@mastodon.sdf.org wrote (#57):

                                @futurebird Think of a chat with an LLM as similar to a chat with a fellow (but maybe not so great) improv performer doing a skit. It is trying to play along with anything you give it. Always.

                                • In reply to futurebird@sauropods.win:

                                  "Why don't you just load a library to find the mean and SD?"

                                  Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.

                                  koushiniku@hachyderm.io wrote (#58):

                                  @futurebird I found out quickly that the entropy tools from NIST and Fourmilab don’t work well with a data set that’s log2(6) bits per element.
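For a rough feel of what "log2(6) bits per element" means, here is a sketch of the plug-in Shannon entropy of each list's symbol frequencies. This is an illustrative check, not what the NIST or Fourmilab tools compute: it ignores order entirely, and because it peaks when frequencies are flat, the too-even listA scores closer to the log2(6) maximum (about 2.585 bits) than the dice-rolled listB.

```python
from collections import Counter
from math import log2

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]


def plug_in_entropy_bits(data):
    """Shannon entropy (bits/symbol) of the empirical symbol frequencies."""
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())


h_max = log2(6)  # a fair six-sided die: about 2.585 bits per roll
for name, data in (("listA", listA), ("listB", listB)):
    print(f"{name}: {plug_in_entropy_bits(data):.4f} of {h_max:.4f} bits")
```

High frequency entropy is therefore not evidence of randomness on its own; it only says the symbols are used about equally often.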

                                    moira@mastodon.murkworks.net wrote (#59):

                                    @futurebird @Bumblefish Heh, this reminds me of something from school where... Evan? Somebody. made a plot of outputs from the system's (pseudo-)random number generator, and it turned out there were some _very visible_ patterns. Like, obvious visible stripes in the number selection density plot.

                                    #maths
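We don't know which generator Moira's classmate plotted, but a textbook example of visible structure is RANDU, a 1960s linear congruential generator with multiplier 65539 and modulus 2^31. A sketch of why its output is patterned: because 65539 = 2^16 + 3, every three consecutive outputs satisfy x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31), so points built from consecutive triples lie on a small number of parallel planes.

```python
MOD = 2 ** 31
MULT = 65539  # RANDU multiplier, 2**16 + 3


def randu(seed, n):
    """Return n successive RANDU outputs x_{k+1} = 65539 * x_k mod 2**31."""
    x = seed  # RANDU expects an odd seed
    out = []
    for _ in range(n):
        x = (MULT * x) % MOD
        out.append(x)
    return out


xs = randu(1, 10_000)
# Check the hidden linear relation on every consecutive triple.
relation_holds = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % MOD == 0
    for k in range(len(xs) - 2)
)
print(relation_holds)  # True
```

The relation follows from (2^16 + 3)^2 = 2^32 + 6 * 2^16 + 9, where the 2^32 term vanishes modulo 2^31.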

                                      dragonfrog@mastodon.sdf.org wrote (#60):

                                      @futurebird @Bumblefish
                                      I think list B is random.

                                      As others have noted, A has 17 each of 1, 2, 3, 4 and 16 each of 5, 6, while B is "lumpier". Also, looking at the difference between consecutive numbers, list A has 23 0s (number N = number N+1) and 21 +1s (number N+1 is 1 greater than number N): very clustered around repeating numbers or increments by 1. In list B the difference between consecutive numbers is much more evenly distributed, suggesting number N+1 really was independent of number N.
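dragonfrog's check can be sketched as a histogram of the step x[i+1] - x[i] across the 99 consecutive pairs. For independent fair rolls a step of d occurs with probability (6 - |d|)/36, so about 16.5 zeros and 13.75 steps of +1 would be expected; the comparison against those expectations is an addition here, not part of the post.

```python
from collections import Counter

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]


def step_histogram(seq):
    """Count each difference x[i+1] - x[i] over consecutive pairs."""
    return Counter(b - a for a, b in zip(seq, seq[1:]))


steps = len(listA) - 1  # 99 consecutive pairs
expected = {d: steps * (6 - abs(d)) / 36 for d in range(-5, 6)}
hist_a, hist_b = step_histogram(listA), step_histogram(listB)

for d in range(-5, 6):
    print(f"{d:+d}: expected {expected[d]:5.2f}  listA {hist_a[d]:2d}  listB {hist_b[d]:2d}")
```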

                                        dhobern@scicomm.xyz wrote (#61):

                                        @futurebird @Bumblefish

                                        Replacing a bad analysis where I forgot we are dealing with dice, not decimal digits.

                                        The first has 23/99 runs of two matching digits and 5/98 runs of three.

                                        The second has 12/99 and 1/98.

                                        The expected mean fractions would be 1/6 and 1/36.

                                        The latter series is a little closer to the expected values, but each of the two series is some distance from the mean, on opposite sides.

                                        These are only a couple of the possible information signals that could be checked, but they seem prima facie to suggest the second is a slightly more plausibly random-adjacent series.
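dhobern's run counts can be reproduced with a short sketch that counts overlapping windows of two and three equal consecutive values and sets them beside the 1/6 and 1/36 expected fractions.

```python
listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]


def equal_run_counts(seq):
    """(pairs, pair windows, triples, triple windows) of equal consecutive values."""
    pairs = sum(a == b for a, b in zip(seq, seq[1:]))
    triples = sum(a == b == c for a, b, c in zip(seq, seq[1:], seq[2:]))
    return pairs, len(seq) - 1, triples, len(seq) - 2


for name, data in (("listA", listA), ("listB", listB)):
    p, pw, t, tw = equal_run_counts(data)
    print(f"{name}: pairs {p}/{pw} = {p / pw:.3f} (expect {1/6:.3f}), "
          f"triples {t}/{tw} = {t / tw:.3f} (expect {1/36:.3f})")
```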

                                          abyssalrook@mstdn.social wrote (#62):

                                          @futurebird Before I look at where the answer shows up, my guess would be that List A is random.

                                          The odds of both dice being the same number when you roll 2 dice are 1/6 (36 possibilities, 6 desired results). For 3, that becomes 1/36 (6*6*6 possibilities, 6 desired).

                                          What we have here is 98 consecutive possible places for a 3-of-a-kind to start. The odds that you would only draw the 1/36 chance ONCE (The 3 2's near the beginning of B) is something like....8%?
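abyssalrook's estimate can be made concrete with a binomial approximation, sketched below. It treats the 98 overlapping three-roll windows as independent trials with success probability 1/36, which they are not (neighbouring windows share rolls), so the result is approximate; under that approximation the chance of exactly one triple comes out at roughly 0.18.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


windows = 98   # start positions for three-in-a-row among 100 rolls
p = 1 / 36     # chance a given window shows three equal faces

p_exactly_one = binom_pmf(1, windows, p)
p_at_most_one = binom_pmf(0, windows, p) + p_exactly_one
print(f"P(exactly 1) ~ {p_exactly_one:.3f}, P(at most 1) ~ {p_at_most_one:.3f}")
```

The expected number of triple windows is 98/36, about 2.7, so a single triple is on the low side but not rare.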
