Had a lot of fun with my stats students today.

  • alienghic@timeloop.cafe

    @dlakelan @futurebird

    Each Counter() object below maps each integer to the number of times it appears in the list.

    In [18]: Counter(listA)
    Out[18]: Counter(
    {2: 17, 3: 17, 5: 16, 1: 17, 4: 17, 6: 16}
    )

    In [19]: Counter(listB)
    Out[19]: Counter(
    {4: 12, 2: 17, 5: 14, 6: 17, 3: 24, 1: 16}
    )

    dlakelan@mastodon.sdf.org wrote (#50):

    @alienghic
    I'm on my phone at a volleyball game, but what's the likelihood for each (probability of seeing that vector of counts given a multinomial distribution with 1/6 as the probability for each value)?

    Should be pretty easy in R or Julia or Python, though offhand I would need to look at docs for any of them. Julia would be something like
    using Distributions
    pdf(Multinomial(100, fill(1/6, 6)), [17, 17, 17, 17, 16, 16])
    @futurebird
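
A minimal Python sketch of the same likelihood question, using only the standard library. The helper name `multinomial_pmf` is made up for illustration, and the count vectors are read off the Counter() output quoted above (faces 1 through 6 in order).

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of seeing exactly `counts` in sum(counts) independent draws."""
    n = sum(counts)
    # Number of orderings that produce these counts: n! / (k1! * k2! * ...)
    coeff = factorial(n)
    for k in counts:
        coeff //= factorial(k)
    # Probability of any single ordering with these counts
    p_one_ordering = prod(p ** k for p, k in zip(probs, counts))
    return coeff * p_one_ordering

fair_die = [1 / 6] * 6
counts_a = [17, 17, 17, 17, 16, 16]  # listA, faces 1..6
counts_b = [16, 17, 24, 12, 14, 17]  # listB, faces 1..6

pmf_a = multinomial_pmf(counts_a, fair_die)
pmf_b = multinomial_pmf(counts_b, fair_die)
print(f"P(counts of A | fair die) = {pmf_a:.3e}")
print(f"P(counts of B | fair die) = {pmf_b:.3e}")
```

One caveat on reading the result: the most even count vector is the single most probable outcome for a fair die, so A scores higher here. Any one exact vector is still rare, which is why the rest of the thread looks at how spread out the counts are instead.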

    • futurebird@sauropods.win

      @Bumblefish

      Which one is random?
      (data sets are 100 numbers 1 to 6)

      listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

      listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

      koushiniku@hachyderm.io wrote (#51):

      @futurebird @Bumblefish
      16 🤷 17

      • danpmoore@mathstodon.xyz

        @dlakelan @futurebird @Bumblefish Based on this description, A looks too uniform. B could be random.

        dlakelan@mastodon.sdf.org wrote (#52):

        @danpmoore
        Agreed, intuitively the frequencies in the first list seem too uniform.
        @futurebird @Bumblefish

          charette@mstdn.ca wrote (#53):

          @futurebird Can you settle the question?

          (My vote is that listA, with its many 3x repeated sequences, is not random, but I'm not dedicated enough to pull out a die and record 100 rolls to see if that is likely to happen a bunch of times.)

          • sabrina@fedi01.unicornsparkle.club

            @madjohnroberts @futurebird @Bumblefish

            If List A has nearly equal occurrences of each number then that’s the one most likely to have been produced by the equivalent of rolling a die 100 times.

            madjohnroberts@mastodon.social wrote (#54):

            @sabrina I think the frequencies all being within floor/ceil of 100/6, with the first four at ceil(100/6) and the last two at floor(100/6), shows intentionality. I agree the frequencies should be close, but not exact! It's harder to say for certain though; 100 samples isn't so much, and I think with a larger N the difference would be more apparent, with listB showing less volatility.
            @futurebird @Bumblefish

              futurebird@sauropods.win wrote (#55):

              ListA was created by making a list with 16 or 17 of each number. The standard deviation **of the frequencies** is much lower than what you will find in random lists of similar size.

              ListB was made by rolling dice.
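
A rough way to check the claim about the spread of the frequencies is to simulate fair dice. This is a sketch only: the helper name `freq_stdev`, the seed, and the number of trials are arbitrary choices, and the two count vectors come from the Counter() output earlier in the thread.

```python
import random
from collections import Counter
from statistics import pstdev

def freq_stdev(rolls):
    """Population standard deviation of the six face counts."""
    counts = Counter(rolls)
    return pstdev([counts.get(face, 0) for face in range(1, 7)])

sd_a = pstdev([17, 17, 17, 17, 16, 16])  # listA face counts
sd_b = pstdev([16, 17, 24, 12, 14, 17])  # listB face counts

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
simulated = [
    freq_stdev([rng.randint(1, 6) for _ in range(100)])
    for _ in range(trials)
]

# Share of simulated lists that are at least as even as A, or as lumpy as B
share_as_even_as_a = sum(s <= sd_a + 1e-9 for s in simulated) / trials
share_as_lumpy_as_b = sum(s >= sd_b - 1e-9 for s in simulated) / trials

print(f"stdev of counts: A = {sd_a:.3f}, B = {sd_b:.3f}")
print(f"simulated lists as even as A:  {share_as_even_as_a:.4f}")
print(f"simulated lists as lumpy as B: {share_as_lumpy_as_b:.4f}")
```

Only the face counts matter here, not the order of the rolls.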

              • apophis@yourwalls.today
                @futurebird i'm guessing the second one is made up because there aren't enough triples?


                @Bumblefish
                futurebird@sauropods.win wrote (#56):

                @apophis @Bumblefish

                I don't think the order should matter. The "problem" isn't related to the order of the list.

                • futurebird@sauropods.win

                  There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers."

                  But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?

                  rubinlinux@mastodon.sdf.org wrote (#57):

                  @futurebird Think of a chat with an LLM as similar to doing a skit with a fellow (but maybe not so great) improv performer. It is trying to play along with anything you give it. Always.

                  • futurebird@sauropods.win

                    "Why don't you just load a library to find the mean and SD?"

                    Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.

                    koushiniku@hachyderm.io wrote (#58):

                    @futurebird I found out quickly that the entropy tools from NIST and Fourmilab don’t work well with a data set that’s log2(6) bits per element.

                      moira@mastodon.murkworks.net wrote (#59):

                      @futurebird @Bumblefish Heh, this reminds me of something from school where... Evan? Somebody made a plot of outputs from the system's (pseudo-)random number generator, and it turns out there were some _very visible_ patterns. Like, obvious visible stripes in the number selection density plot.

                      #maths

                        dragonfrog@mastodon.sdf.org wrote (#60):

                        @futurebird @Bumblefish
                        I think list B is random.

                        As others have noted, A has 17 each of 1, 2, 3, 4 and 16 each of 5, 6, while B is "lumpier". Also, looking at the difference between consecutive numbers, list A has 23 0s (number N+1 equals number N) and 21 +1s (number N+1 is 1 greater than number N): very clustered around repeats or increments by 1. In list B the difference between consecutive numbers is much more evenly distributed, suggesting number N+1 really was independent of number N.
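
The consecutive-difference tally can be reproduced with a few lines of Python. This is a sketch: `step_counts` is a name invented here, and the two lists are copied from the original question.

```python
from collections import Counter

list_a = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
list_b = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def step_counts(rolls):
    """Tally each difference (next roll minus current roll) over adjacent pairs."""
    return Counter(nxt - cur for cur, nxt in zip(rolls, rolls[1:]))

for name, rolls in (("A", list_a), ("B", list_b)):
    steps = step_counts(rolls)
    # Differences between two die faces can only range from -5 to +5
    print(name, [(d, steps[d]) for d in range(-5, 6)])
```

For comparison, with independent fair dice a difference of d has probability (6 - |d|)/36, so over 99 adjacent pairs one would expect about 16.5 zeros and about 13.75 steps of +1.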

                          dhobern@scicomm.xyz wrote (#61):

                          @futurebird @Bumblefish

                          Replacing a bad analysis where I forgot we are dealing with dice, not decimal digits.

                          The first has 23/99 runs of two matching digits and 5/98 runs of three.

                          The second has 12/99 and 1/98.

                          The expected mean fractions would be 1/6 and 1/36.

                          The latter series is a little closer to the expected values, but each of the two series is at some distance from the mean, on opposite sides.

                          These are only a couple of the possible information signals that could be checked, but they seem prima facie to suggest the second is a slightly more plausibly random-adjacent series.
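
The run counts above can be checked mechanically. A sketch under the same reading as the post, where a "run" is counted at every start position whose next k rolls are all equal; `count_runs` is a hypothetical helper and the lists are copied from the original question.

```python
list_a = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
list_b = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def count_runs(rolls, length):
    """Return (hits, windows): start positions where `length` rolls in a row match."""
    windows = len(rolls) - length + 1
    hits = sum(len(set(rolls[i:i + length])) == 1 for i in range(windows))
    return hits, windows

for name, rolls in (("A", list_a), ("B", list_b)):
    for length, expected in ((2, 1 / 6), (3, 1 / 36)):
        hits, windows = count_runs(rolls, length)
        print(f"list{name} runs of {length}: {hits}/{windows} = {hits / windows:.3f} "
              f"(expected fraction {expected:.3f})")
```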

                            abyssalrook@mstdn.social wrote (#62):

                            @futurebird Before I look at where the answer shows up, my guess would be that List A is random.

                            The odds of both dice being the same number when you roll 2 dice is 1/6 (36 possibilities, 6 desired results). For 3, that becomes 1/36. (6*6*6 possibilities, 6 desired).

                            What we have here is 98 consecutive possible places for a 3-of-a-kind to start. The odds that you would only draw the 1/36 chance ONCE (The 3 2's near the beginning of B) is something like....8%?

                              abyssalrook@mstdn.social wrote (#63):

                              @futurebird The point is, having it appear once is something like a 94% chance. Seeing a 3-of-a-kind appear more than once is very much expected in a random distribution.

                              But it's NOT what we EXPECT a random distribution to look like, from a human perspective. When people see things like that appear, they get nervous. If they're making a list to LOOK random, having 3 of the same number in a row starts to feel NOT random, like it's some kind of pattern, and so they won't do it much.
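
The triple estimates can be sanity-checked by simulation. A sketch with an arbitrary seed and trial count; `count_triples` is a name chosen here. It also prints the shortcut that treats the 98 start positions as independent, which is only approximate because neighbouring windows overlap.

```python
import random

def count_triples(rolls):
    """Number of start positions where three consecutive rolls are equal."""
    return sum(rolls[i] == rolls[i + 1] == rolls[i + 2]
               for i in range(len(rolls) - 2))

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 10000
tallies = [
    count_triples([rng.randint(1, 6) for _ in range(100)])
    for _ in range(trials)
]

p_none = sum(t == 0 for t in tallies) / trials
p_at_most_one = sum(t <= 1 for t in tallies) / trials

# Shortcut: pretend each of the 98 windows is an independent 1/36 event
approx_none = (35 / 36) ** 98

print(f"simulated P(no triple)          = {p_none:.3f}")
print(f"simulated P(at most one triple) = {p_at_most_one:.3f}")
print(f"simulated P(two or more)        = {1 - p_at_most_one:.3f}")
print(f"independence shortcut P(none)   = {approx_none:.3f}")
```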

                              • futurebird@sauropods.win

                                The LLM is like a little box of computer horrors that we peer into from time to time.

                                I'm sorry but the whole interface is just so silly.

                                You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

                                leorjorge@mastodon.social wrote (#64):

                                @futurebird the first time I had to go nuclear about LLM use in my department was when my boss was showing me her design for a major experiment where they were planting actual trees of different species in long term plots, and when I asked how did they randomise the distribution of species she said the post doc responsible for setting up the experiment had asked chatgpt to randomise it! (1/2)

                                  leorjorge@mastodon.social wrote (#65):

                                  @futurebird And that was about 2 years ago, when this kind of thing would probably be even worse. It took me half an hour to write code to generate the plots and some nice figures with the positions of every tree... I wonder how long they were fighting the chat box to get any kind of answer. Let alone the fact this experiment will be running for years to come. How can people be so careless? (2/2)

                                    abyssalrook@mstdn.social wrote (#66):

                                    @futurebird Also somehow I was wrong. Either I did my calculation wrong or that 8% chance really slipped through and I picked the absolutely wrong metric to judge this.

                                    Alternately, I didn't consider HOW the non-random list was made and just assumed it was just someone with a pencil picking numbers based purely on vibes, when there was just a different, non-random methodology.

                                      orionkidder@mas.to wrote (#67):

                                      @LeoRJorge @futurebird Over and over again, if you know what you're doing, the LLM-generated version of it is so bad that doing it from scratch is easier and faster. Only people who don't know what they're doing, and usually people who sneer at learning to do something, really want to use LLMs. They think it's a cheat-code against acquiring skills, but it just makes them look lazy and uncaring. That's the owner-class dream, of course.

                                        doctormo@floss.social wrote (#68):

                                        @futurebird

                                        Heh, it's xkcd 221 with more steps.

                                          • ldpm@wandering.shop

                                          @futurebird I know how to find the SD and I will use the php-stats library every day of the week and twice on Sunday. I would much rather be able to depend on well supported community code. (At least until it is all replaced by ai slop)

                                          sabik@rants.au wrote (#69):

                                          @ldpm @futurebird
                                          AIUI, there's also the issue that the formulas for mean and especially stdev that we learn in school don't work great with floating-point numbers and the way rounding works with them. Hopefully the stats library uses more obscure formulas that take care of that, which is what they call "numerical stability".
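
To illustrate the numerical stability point, here is a sketch comparing the one-pass textbook formula with Welford's running update, one well-known stable alternative. Whether any particular stats library uses Welford's method is not something this thread establishes; the function names and the test data are made up for the example.

```python
def naive_pvariance(xs):
    """Textbook one-pass formula: mean of squares minus square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def welford_pvariance(xs):
    """Welford's running update; avoids subtracting two huge, nearly equal sums."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / len(xs)

small = [4.0, 7.0, 13.0, 16.0]          # population variance is 22.5
shifted = [1e9 + v for v in small]      # same spread, huge offset

print("small data:  ", naive_pvariance(small), welford_pvariance(small))
print("shifted data:", naive_pvariance(shifted), welford_pvariance(shifted))
```

Shifting every value by the same amount should not change the variance, so any gap between the two results on the shifted data comes from rounding in the naive formula.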
