Had a lot of fun with my stats students today.

112 Posts 62 Posters 18 Views
  • futurebird@sauropods.win

    The LLM is like a little box of computer horrors that we peer into from time to time.

    I'm sorry but the whole interface is just so silly.

    You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

    leorjorge@mastodon.social
    wrote last edited by
    #64

    @futurebird The first time I had to go nuclear about LLM use in my department was when my boss was showing me her design for a major experiment where they were planting actual trees of different species in long-term plots. When I asked how they had randomised the distribution of species, she said the postdoc responsible for setting up the experiment had asked ChatGPT to randomise it! (1/2)

      leorjorge@mastodon.social
      wrote last edited by
      #65

      @futurebird And that was about 2 years ago, when this kind of thing would probably be even worse. It took me half an hour to write code to generate the plots and some nice figures with the positions of every tree... I wonder how long they were fighting the chat box to get any kind of answer. Let alone the fact this experiment will be running for years to come. How can people be so careless? (2/2)
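
For readers wondering what "code to generate the plots" might look like, here is a minimal sketch, not the poster's actual script. The species names, grid size, and seed are hypothetical; the point is only that a seeded shuffle gives a layout that is both random and reproducible for the lifetime of the experiment.

```python
import random

def randomise_plot(species, rows, cols, seed):
    """Return a rows x cols grid with species assigned at random.

    Species are repeated as evenly as possible across the grid, and the
    seed is kept so the same layout can be regenerated later.
    """
    rng = random.Random(seed)
    n = rows * cols
    # Repeat the species list until it covers every position, then trim.
    pool = (species * (n // len(species) + 1))[:n]
    rng.shuffle(pool)
    return [pool[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical species and layout, for illustration only.
SPECIES = ["oak", "beech", "birch", "pine"]
layout = randomise_plot(SPECIES, rows=4, cols=6, seed=20240101)
for row in layout:
    print(" ".join(f"{name:<6}" for name in row))
```

Recording the seed alongside the printed layout is what lets someone audit the randomisation years later.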

      • abyssalrook@mstdn.social

        @futurebird The point is, having it appear once is something like a 94% chance. Seeing a 3-of-a-kind appear more than once is very much expected in a random distribution.

        But it's NOT what we EXPECT a random distribution to look like, from a human perspective. When people see things like that appear, they get nervous. If they're making a list to LOOK random, having 3 of the same number in a row starts to feel NOT random, like it's some kind of pattern, and so they won't do it much.

        abyssalrook@mstdn.social
        wrote last edited by
        #66

        @futurebird Also somehow I was wrong. Either I did my calculation wrong or that 8% chance really slipped through and I picked the absolutely wrong metric to judge this.

        Alternately, I didn't consider HOW the non-random list was made and just assumed it was just someone with a pencil picking numbers based purely on vibes, when there was just a different, non-random methodology.

          orionkidder@mas.to
          wrote last edited by
          #67

          @LeoRJorge @futurebird Over and over again, if you know what you're doing, the LLM-generated version of it is so bad that doing it from scratch is easier and faster. Only people who don't know what they're doing, and usually people who sneer at learning to do something, really want to use LLMs. They think it's a cheat-code against acquiring skills, but it just makes them look lazy and uncaring. That's the owner-class dream, of course.

          • futurebird@sauropods.win

            There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers."

            But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?

            doctormo@floss.social
            wrote last edited by
            #68

            @futurebird

            Heh, it's xkcd 221 with more steps.

            • ldpm@wandering.shop

              @futurebird I know how to find the SD and I will use the php-stats library every day of the week and twice on Sunday. I would much rather be able to depend on well supported community code. (At least until it is all replaced by ai slop)

              sabik@rants.au
              wrote last edited by
              #69

              @ldpm @futurebird
              AIUI, there's also the fact that the formulas for mean and especially stdev that we learn in school don't work well with the way computers represent floating-point numbers and round them. Hopefully the stats library uses more obscure formulas that take care of that, which is what they call "numerical stability".
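
A small sketch of the numerical stability point, under the assumption that the "school formula" meant here is the mean-of-squares-minus-square-of-mean shortcut. It is compared with Welford's one-pass update, which is one well-known stable alternative (not necessarily what any particular stats library uses), and with `statistics.pvariance` from the Python standard library. The data values are made up for illustration.

```python
import statistics

def naive_variance(xs):
    """Population variance via the shortcut mean(x^2) - mean(x)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean

def welford_variance(xs):
    """Population variance via Welford's one-pass running update."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)

# A small spread on top of a large offset: the true variance is 22.5.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print("naive:     ", naive_variance(data))
print("welford:   ", welford_variance(data))
print("statistics:", statistics.pvariance(data))
```

The shortcut subtracts two numbers near 1e18, where double-precision floats can no longer resolve a difference as small as 22.5, so its answer is dominated by rounding error.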

                mastokarl@mastodon.social
                wrote last edited by
                #70

                @futurebird @benroyce Noticed that a while ago, too. Found it interesting that they work a lot like humans (who are totally unable to create random numbers - there are even fraud detection systems based on that human flaw).

                • futurebird@sauropods.win

                  @Bumblefish

                  Which one is random?
                  (data sets are 100 numbers 1 to 6)

                  listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

                  listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

                  ricosuave@mastodon.online
                  wrote last edited by
                  #71

                  @futurebird @Bumblefish Without any careful analysis, just winging it here, but the double occurrence of 1 6 1 4 in List A makes it sus to me. Especially since John Napier published his outline of logarithms in 1614. Coincidence? I think not!!

                  • futurebird@sauropods.win

                    ListA was created by making a list of 16 or 17 of each number. The Stdev **of the frequencies** is much lower than what you will find on random lists of similar size.

                    ListB was made by rolling dice.
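
The "Stdev of the frequencies" check can be sketched as follows. This is an illustration, not the code used in class: the function names are made up, the balanced list is rebuilt from the description (16 or 17 of each face, then shuffled) rather than copied from ListA, and the baseline comes from Python's pseudo-random generator rather than physical dice.

```python
import random
import statistics
from collections import Counter

def frequency_sd(rolls, faces=6):
    """Population SD of the per-face counts in a list of die rolls."""
    counts = Counter(rolls)
    return statistics.pstdev([counts[face] for face in range(1, faces + 1)])

def simulated_frequency_sds(n_rolls, trials, seed, faces=6):
    """frequency_sd for `trials` pseudo-random lists of n_rolls rolls each."""
    rng = random.Random(seed)
    return [
        frequency_sd([rng.randint(1, faces) for _ in range(n_rolls)], faces)
        for _ in range(trials)
    ]

# A list built the way ListA is described: 16 or 17 of each face, shuffled.
balanced = ([1, 2, 3, 4, 5, 6] * 17)[:100]
random.Random(0).shuffle(balanced)

baseline = simulated_frequency_sds(n_rolls=100, trials=1000, seed=0)
print("balanced list:   ", round(frequency_sd(balanced), 3))
print("simulated median:", round(statistics.median(baseline), 3))
```

Passing the two lists from the post above to `frequency_sd` gives the same comparison for the real data.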

                    2something@transfem.social
                    wrote last edited by
                    #72

                    @futurebird@sauropods.win @charette@mstdn.ca I hope you are proud of your students for getting it:)

                    • moira@mastodon.murkworks.net

                      @futurebird @Bumblefish Heh, this reminds me of something from school where... Evan? Somebody. made a plot of outputs from the system's (pseudo-)random number generator and it turns out there were some _very visible_ patterns. Like, obvious visible stripes in the number selection density plot.

                      #maths

                      dpnash@c.im
                      wrote last edited by
                      #73

                      @moira @futurebird @Bumblefish RANDU!

                      That's a blast from the past (already obsolete by the time I started fiddling with computers many years ago).

                      RANDU - Wikipedia (en.wikipedia.org)

                      I never used a system with RANDU installed, but I did discover that the PRNGs in old BASICs from the 1980s had the same basic flaw, and I found it in the nerdiest way possible: trying to draw artificial star charts with plausible distributions of star brightnesses, noticing there were some *really funky* patterns in the resulting "constellations", and eventually discovering they had the same mathematical properties that RANDU had (in some cases, worse).
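
The mathematical property behind those patterns can be checked in a few lines. RANDU is the generator x[k+1] = 65539 * x[k] mod 2^31, and because 65539 = 2^16 + 3, every output satisfies x[k+2] = 6*x[k+1] - 9*x[k] mod 2^31. That is why consecutive triples fall on a small number of planes. The sketch below only verifies the relation; the seed and sample size are arbitrary choices.

```python
def randu(seed, n):
    """Return n successive RANDU outputs starting from an odd seed."""
    values = []
    x = seed
    for _ in range(n):
        x = (65539 * x) % 2**31
        values.append(x)
    return values

xs = randu(seed=1, n=1000)

# Each output is a fixed linear combination of the previous two (mod 2^31),
# so points (x[k], x[k+1], x[k+2]) cannot fill 3D space evenly.
planar = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
    for k in range(len(xs) - 2)
)
print("relation holds for every triple:", planar)
```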

                        moira@mastodon.murkworks.net
                        wrote last edited by
                        #74

                        @dpnash @futurebird @Bumblefish omg

                        that's it

                        tilted to the right instead of the left

                        that's what he found 😄

                          moira@mastodon.murkworks.net
                          wrote last edited by
                          #75

                          @dpnash @futurebird @Bumblefish (and this is also when we all got into rolling our own random() implementations. based on proper principles, of course, we weren't inventing any. but!)

                            dpnash@c.im
                            wrote last edited by
                            #76

                            @moira @futurebird @Bumblefish

                            Some months before I found the RNG patterns in the fake star charts (I was around 15 or so), I had the really bright idea of “hey, let’s take the RNG output for a chosen seed as a key stream for a cipher! That’ll be really hard to break, and it’ll only be about 10 lines of code!”

                            That was the first time I rolled my own crypto, and thanks to serendipitously strange-looking artificial star maps, it was also the last.

                              moira@mastodon.murkworks.net
                              wrote last edited by
                              #77

                              @dpnash @futurebird @Bumblefish o noes xD

                              S'funny, none of us ever got into cryptography, at least not that I remember. Way more interested in _finding_ things than _hiding_ things, I think

                              • dlakelan@mastodon.sdf.org

                                @futurebird
                                things I would check are first the frequency of each number... they should be somewhat uniform but not TOO close to equal as all exactly equal is unlikely... next I'd look at the length of repeat sequences and compare to expected values.

                                the actual definition of random sequences (Per Martin-Löf) is in terms of passing tests actually
                                @Bumblefish
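
The second check suggested here, the length of repeat sequences, might be sketched like this. The helper names are made up, and the sample comes from Python's pseudo-random generator as a stand-in for real dice. The only theoretical value used is that, for n independent fair rolls, the expected number of positions matching the previous roll is (n - 1) / 6.

```python
import random
from collections import Counter
from itertools import groupby

def run_lengths(rolls):
    """Lengths of maximal runs of identical consecutive values."""
    return [len(list(group)) for _, group in groupby(rolls)]

def expected_adjacent_repeats(n_rolls, faces=6):
    """Expected count of rolls that equal the roll just before them."""
    return (n_rolls - 1) / faces

sample_rng = random.Random(1)
sample = [sample_rng.randint(1, 6) for _ in range(100)]
lengths = run_lengths(sample)

print("run-length counts:", sorted(Counter(lengths).items()))
# A run of length n contains n - 1 adjacent repeats.
print("adjacent repeats: ", sum(n - 1 for n in lengths))
print("expected repeats: ", expected_adjacent_repeats(len(sample)))
```

A hand-made "random-looking" list tends to show far fewer adjacent repeats than the expected value, which matches the point made earlier in the thread about people avoiding runs.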

                                vgarzareyna@mstdn.mx
                                wrote last edited by
                                #78

                                @dlakelan @futurebird @Bumblefish another thing to look for could be the frequency of pairs of numbers. for an unbiased, independent die, each ordered pair of consecutive numbers should appear with a chance of about 1/36.

                                unfortunately you'd need quite a large number of randomly generated samples to get close to this chance, but i guess you could do some fancy statistics to analyze these distributions and try to guess which one is "more random looking"
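
A rough sketch of the pair check, with made-up helper names. It tallies overlapping consecutive pairs and computes a Pearson chi-square statistic against a uniform 1/36 per pair. Two caveats, both assumptions of this sketch: overlapping pairs are not independent, so the statistic is only an informal score, and with 100 rolls there are just 99 pairs, about 2.75 expected per cell, which is the sample-size problem mentioned above.

```python
from collections import Counter

def pair_counts(rolls):
    """Count each ordered pair of consecutive rolls (overlapping windows)."""
    return Counter(zip(rolls, rolls[1:]))

def pair_chi_square(rolls, faces=6):
    """Pearson chi-square of pair counts against a uniform distribution."""
    counts = pair_counts(rolls)
    n_pairs = len(rolls) - 1
    expected = n_pairs / faces**2
    return sum(
        (counts[(a, b)] - expected) ** 2 / expected
        for a in range(1, faces + 1)
        for b in range(1, faces + 1)
    )

# Tiny two-faced example where every ordered pair appears exactly once.
toy = [1, 1, 2, 2, 1]
print(dict(pair_counts(toy)))
print(pair_chi_square(toy, faces=2))
```

A larger statistic means the pair counts are further from uniform; comparing it across two lists of the same length is safer than reading it off a chi-square table.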

                                  vgarzareyna@mstdn.mx
                                  wrote last edited by
                                  #79

                                  @Bumblefish @futurebird (cryptographically-secure) hash functions are a textbook example of something that is not random (given the same input, a hash function should always give the same output) but is designed to look random (there should not be any way to get any amount of information about the input just from looking at/analyzing the output, even if you know how the function works)
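
A short illustration of both halves of that statement, using SHA-256 from Python's `hashlib`. The input strings are arbitrary examples, and counting differing bits is only an informal way to show that near-identical inputs give unrelated-looking outputs.

```python
import hashlib

def digest(text):
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a, hex_b):
    """Number of bit positions where two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = digest("roll 1")
b = digest("roll 2")

# Deterministic: hashing the same input again gives the same digest.
print("same input, same output:", a == digest("roll 1"))
# Random-looking: a one-character change alters many of the 256 bits.
print("bits that differ:", differing_bits(a, b), "of 256")
```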

                                    mastokarl@mastodon.social
                                    wrote last edited by
                                    #80

                                    @futurebird Well, LLMs are tools. Know their limitations. Know their power.

                                    In your case:

                                    "create 20 random numbers between 1 and 100 by developing a little python app and running it"

                                    Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

                                    Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM. 🤦‍♂️
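
For reference, the "little python app" that prompt asks for is only a few lines. This is a sketch of what such a script could look like, not the output of any model; the function name and the optional seed parameter are additions for illustration. Note that it returns the numbers in the order drawn, unsorted.

```python
import random

def random_numbers(count=20, low=1, high=100, seed=None):
    """Return `count` pseudo-random integers in [low, high], in draw order."""
    rng = random.Random(seed)  # seed=None uses system entropy
    return [rng.randint(low, high) for _ in range(count)]

numbers = random_numbers()
print(numbers)
```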

                                      poleguy@mastodon.social
                                      wrote last edited by
                                      #81

                                      @futurebird The trouble is that people can accept that "factual" output from an LLM may be statistically generated until they hit words that are generated that sound like "reasoning." Then even the most aware humans can get lulled into thinking that the words can be trusted.

                                      • futurebird@sauropods.win

                                        "Why don't you just load a library to find the mean and SD?"

                                        Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.

                                        gkrnours@mastodon.gamedev.place
                                        wrote last edited by
                                        #82

                                        @futurebird I assume from this post someone already mentioned the statistics module from the Python standard library?

                                          seachaint@masto.hackers.town
                                          wrote last edited by
                                          #83

                                          @futurebird there was a study that found that if you give an LLM some prompting to push it into a particular sampling-space (say, "bleeding heart leftie") and then ask it for some random numbers, you can then feed those numbers into another fresh instance and it'll drift towards the same sampling space.

                                          In other words, even the numerical distributions they sample from can be connected to the broader "noosphere" they're trained on, and that relation is a fucked sort of bijection
