Had a lot of fun with my stats students today.

Uncategorized · 112 Posts · 62 Posters · 20 Views

This topic has been deleted. Only users with topic management privileges can see it.
  • flockofcats@famichiki.jp · #98

    @Bumblefish @futurebird
    That was an interesting thread. Our brains are wired to think certain things are “random” when they’re not, so when people try to create something that looks random, they often avoid repeated numbers, even though there’d be repeats, if truly random, with some expected frequency. Also, odd numbers are often overrepresented cuz they feel more random, e.g., 5973 vs 6084. This “ looks random, but isn’t” often comes up when people fabricate scientific data 🤓

    • abyssalrook@mstdn.social · #99

      @IngaLovinde I'm not following the first problem in the logic. The situation you're describing might matter if we were looking at more and more instances of it happening, but the probability of it happening at least once (~94%) doesn't change at all, and it happening ONLY once might jiggle the ~8% estimate I had, but not significantly move it.

      @IngaLovinde As for the latter, that is entirely true from a research perspective, but I picked the 3-of-a-kind pattern because I assumed the non-random list was entirely human constructed, and that particular pattern is one that sticks out to us the most. Someone making a list by hand is more likely to see "6-6-6" as less random than "6-1-2" or "3-4-5".

      I did not clock 'Which is random?' as one being a dice roll and the other being a shuffled deck of prescribed cards.

      • futurebird@sauropods.win wrote:

        ListA was created by making a list of 16 or 17 of each number. The standard deviation **of the frequencies** is much lower than what you will find in random lists of similar size.

        ListB was made by rolling dice.
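
        That frequency check can be sketched in Python. This is an illustrative stand-in, not the thread's actual lists: the crafted list is rebuilt the same way (16 or 17 of each face), and the dice list is simulated with a fixed seed.

```python
import random
import statistics

def freq_stdev(rolls):
    """Standard deviation of the counts of the faces 1..6."""
    counts = [rolls.count(face) for face in range(1, 7)]
    return statistics.stdev(counts)

random.seed(1)  # fixed seed so the sketch is reproducible

# "ListA"-style: 17 of each face, trimmed to 100 entries, then shuffled.
crafted = [face for face in range(1, 7) for _ in range(17)][:100]
random.shuffle(crafted)

# "ListB"-style: 100 genuine (pseudo-)random dice rolls.
rolled = [random.randint(1, 6) for _ in range(100)]

# The crafted list's face counts barely vary (stdev ~0.82);
# a rolled list's counts typically vary several times as much.
print(freq_stdev(crafted), freq_stdev(rolled))
```

        Shuffling doesn't change the counts, so a crafted list can pass an eyeball test while failing this frequency test.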

        fsologureng@chilemasto.casa · #100

        @futurebird listA has the subsequence 1,1,1,6,1,4 appearing twice a very short distance apart, which, while possible, is extremely improbable. That's how I found out it was crafted.

        • futurebird@sauropods.win wrote:

          There is something very creepy about the way LLMs will cheerfully give you lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers"

          But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?

          demfighter@mas.to · #101

          @futurebird In essence, an LLM is nothing more than a glorified, dumbed-down search engine.

          Instead of producing a set of hyperlinks like a normal search engine would, the algorithm takes excerpts from the sources with the highest "relevance" value. The output is formatted to look like pseudo-speech for no apparent reason.

          The end result is never better than the traditional search results, which may or may not be useful. The only thing the LLMs are good at is wasting electricity.

          • ingalovinde@embracing.space · #102

            @AbyssalRook Okay, let's calculate it.
            Let a_n be the probability that a sequence of length n contains no triple of identical numbers and does not end with two equal numbers; b_n, the same, but ending with two equal numbers.
            Then a_1 = 1, a_2 = 5/6, b_2 = 1/6; a_(n+1) = a_n * 5/6 + b_n * 5/6; b_(n+1) = a_n * 1/6.
            Expanding b_n, we get a_(n+2) = a_(n+1) * 5/6 + a_n * 5/36.
            Plugging this into Wolfram Alpha (`LinearRecurrence[{5/6, 5/36}, {1, 5/6}, 100]`), we obtain a_100 ≈ 0.0762866 and a_99 ≈ 0.0781878, so the probability that a sequence of 100 random numbers contains no triple of the same number is a_100 + a_99/6 ≈ 0.0893 = 8.93%.

            By contrast, the probability that out of 98 random (and independent) triplets none will consist of three same numbers is (35/36)^98 ~= 6.32%.

            That's a pretty large difference, and not just a jiggle.

            (I understand that this is not the number you were looking at, but it's the easiest way to show that there is a significant difference between asking about triples of repeated numbers among 98 independent random triples and among the 98 overlapping sub-triples of a sequence of 100 independent random numbers.)
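
            The recurrence is easy to check numerically; this sketch uses exact rational arithmetic and reproduces both figures quoted in the post:

```python
from fractions import Fraction

def p_no_triple(n):
    """P(no run of three equal values in n fair die rolls), using the
    a/b recurrence from the post: a = P(no triple yet, last two rolls
    differ), b = P(no triple yet, last two rolls equal)."""
    a, b = Fraction(5, 6), Fraction(1, 6)  # state after the first 2 rolls
    for _ in range(n - 2):
        a, b = (a + b) * Fraction(5, 6), a * Fraction(1, 6)
    return a + b

print(float(p_no_triple(100)))  # ≈ 0.0893, matching a_100 + a_99/6
print((35 / 36) ** 98)          # ≈ 0.0632, the 98-independent-triples figure
```

            As a sanity check, p_no_triple(3) comes out to exactly 35/36, since only 6 of the 216 triples consist of three equal values.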

            • meuwese@mastodon.social wrote:

              @ai6yr @ohmu @futurebird wait so... is that the ultimate question? "What number will an LLM always include when generating random numbers?"

              ai6yr@m.ai6yr.org · #103

              @meuwese @ohmu @futurebird Apparently humans have willed that into existence, yes. LOL. (err... Douglas Adams, precisely)

              • futurebird@sauropods.win wrote:

                @ramsey @Bumblefish

                Only one of these lists could *plausibly* be from rolling dice.

                ldpm@wandering.shop · #104

                @futurebird @ramsey @Bumblefish this is not remotely my area of expertise but I am interested in the answer. My guess would be that the list that looks more evenly distributed is the fake one, and therefore List A is the "actually random" one because it has more seemingly outlying subsets, like a whole bunch of 1s in rapid succession.

                There are tons of ways to distribute unevenly but relatively few ways to distribute evenly, so the list that seems less even is more likely to be the truly random one.

                • ldpm@wandering.shop · #105

                  @futurebird @ramsey @Bumblefish also I suspect maybe a Monty Hall kind of thing where you generated a bunch of random lists, and then selected the one that looked least random to you to trick your students.

                  I'd love to know what the actual answer is and what you were hoping to teach your students!

                  • futurebird@sauropods.win wrote:

                    The LLM is like a little box of computer horrors that we peer into from time to time.

                    I'm sorry but the whole interface is just so silly.

                    You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?

                    raffzahn@mastodon.bayern · #106

                    @futurebird

                    "What are we doooooing?"

                    Well, we've taken the sound-making algorithm of a babbling baby, supercharged it with a huge library of words annotated with sequence probabilities, and now management is jumping around like parents bragging about what a genius their 11-month-old is. All because WE try to find meaning in the perceived word sequence.

                    The same management that brags about 1400% lower prices :))

                    • dpiponi@mathstodon.xyz wrote:

                      @futurebird It's very weird.

                      In principle, if you take an LLM, you should be able to get it to generate random numbers in a way that reflects the numbers that appear in the corpus it was trained on. If you have the raw model you can probably do that.

                      But if you ask ChatGPT (or at least if I do) it starts talking about how numbers taken from around us typically follow Benford's law so their first digits have a logarithmic distribution. When it then spits out some random numbers it's no longer sampling random numbers from the entire corpus but a sample that's probably heavily biased towards numbers that appear in articles about Benford's law. I.e. what people have previously said about these numbers, rather than the actual numbers.
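
                      For reference, Benford's law gives the leading digit d a probability of log10(1 + 1/d); a minimal sketch of that distribution:

```python
import math

# Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Leading digit 1 is by far the most common (~30.1%), 9 the rarest (~4.6%),
# and the nine probabilities sum to exactly 1.
print(round(benford[1], 4), round(benford[9], 4))
```

                      Note how far this is from the uniform first-digit distribution that "pick a random number" would suggest.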

                      raffzahn@mastodon.bayern · #107

                      @dpiponi @futurebird

                      Which in turn is what LLMs do. They give an averaged output, not a reasoned one.

                      In addition, the inherent laws of measurement and control mean that the achieved output will never match the intended one. Thus LLM output will never increase knowledge; it will migrate toward zero.

                      • petabites@mastodon.world · #108

                        @futurebird

                        and how about those "random" passwords generated by AI 😬

                        https://zeroes.ca/@kimcrawley/116099905667994600

                        * over and over, again. #PasswordReuse #VibeSlop

                        • futurebird@sauropods.win · #109

                          @petabites

                          This is what inspired the whole lesson. I had to show them this.

                          • futurebird@sauropods.win · #110

                            @ldpm @ramsey @Bumblefish

                            I put the answer in the original thread with a CW. This was about frequency.

                            • futurebird@sauropods.win wrote:

                              @lamecarlate @Bumblefish

                              I've got some bad news. I've posted the solution with a CW on the original thread.

                              lamecarlate@pouet.it · #111

                              @futurebird @Bumblefish Yep, I read it… My bad. I went on instinct and gut feeling, not mathematics like the other answers. I should have 😅

                              • ldpm@wandering.shop wrote:

                                @futurebird I know how to find the SD and I will use the php-stats library every day of the week and twice on Sunday. I would much rather be able to depend on well supported community code. (At least until it is all replaced by ai slop)

                                futurebird@sauropods.win · #112

                                @ldpm

                                I don't mind using libraries, but it's fun to write my own versions of things just so I know how they work.

                                When we do projects where we share code, I encourage them to use libraries more. I'm just a grumpy old lady about it sometimes.

                                • em0nm4stodon@infosec.exchange shared this topic