Had a lot of fun with my stats students today.

moira@mastodon.murkworks.net

@dpnash @futurebird @Bumblefish (and this is also when we all got into rolling our own random() implementations. based on proper principles, of course, we weren't inventing any. but!)
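A classic "roll your own random()" built on proper principles is a linear congruential generator. The sketch below is only an illustration. The constants are the widely quoted Numerical Recipes choice, which is an assumption here and not necessarily what anyone in the thread used. The class and method names are hypothetical.

```python
# A toy linear congruential generator (LCG): x_{n+1} = (a * x_n + c) mod m.
# The default constants are the widely quoted "Numerical Recipes" choice,
# used here only as an illustrative assumption.


class LCG:
    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223,
                 m: int = 2**32) -> None:
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self) -> int:
        """Advance the state once and return it as the raw output."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self) -> float:
        """Return a float in [0, 1), in the spirit of random.random()."""
        return self.next_int() / self.m

    def roll_die(self, sides: int = 6) -> int:
        """Return an integer in 1..sides.

        Scaling the float uses the high-order bits. With a power-of-two modulus
        the low-order bits of an LCG have very short periods (the lowest bit
        simply alternates), so `raw % 6` would be a poor choice.
        """
        return int(self.random() * sides) + 1


if __name__ == "__main__":
    gen = LCG(seed=2024)
    print([gen.roll_die() for _ in range(20)])
```

The same seed always reproduces the same rolls, which is handy for debugging and is exactly why a generator like this must never be used where unpredictability matters.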

dpnash@c.im

@moira @futurebird @Bumblefish

Some months before I found the RNG patterns in the fake star charts (I was around 15 or so), I had the really bright idea of “hey, let’s take the RNG output for a chosen seed as a key stream for a cipher! That’ll be really hard to break, and it’ll only be about 10 lines of code!”

That was the first time I rolled my own crypto, and thanks to serendipitously strange-looking artificial star maps, it was also the last.
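For the curious, here is a minimal sketch of that "seeded RNG output as key stream" idea, along with why it falls over. It assumes Python's `random.Random` (a Mersenne Twister, not a cryptographic generator) and a 16-bit seed, which is small enough to search exhaustively. All function names are hypothetical. This is deliberately insecure and only meant to show the weakness.

```python
import random
from typing import Optional

# A deliberately INSECURE sketch of the "seeded RNG as key stream" idea.
# random.Random (a Mersenne Twister) is not a cryptographic generator, and the
# 16-bit seed space assumed below can be searched exhaustively in moments.


def keystream(seed: int, length: int) -> bytes:
    """Key-stream bytes drawn from a seeded, non-cryptographic PRNG."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def xor_crypt(data: bytes, seed: int) -> bytes:
    """XOR data with the key stream; running it again with the same seed decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))


def recover_seed(ciphertext: bytes, known_prefix: bytes,
                 seed_bits: int = 16) -> Optional[int]:
    """Known-plaintext attack: try every seed until the decryption starts with known_prefix."""
    head = ciphertext[:len(known_prefix)]
    for seed in range(2 ** seed_bits):
        if xor_crypt(head, seed) == known_prefix:
            return seed
    return None


if __name__ == "__main__":
    secret = xor_crypt(b"Meet me at the observatory", seed=31337)
    found = recover_seed(secret, b"Meet me")
    print(found, xor_crypt(secret, found))
```

Guessing just a few bytes of the plaintext is enough to recover the seed and then the whole message, which is about 10 lines of attack for about 10 lines of cipher.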

moira@mastodon.murkworks.net

@dpnash @futurebird @Bumblefish o noes xD

S'funny, none of us ever got into cryptography, at least not that I remember. Way more interested in _finding_ things than _hiding_ things, I think.

vgarzareyna@mstdn.mx

@dlakelan @futurebird @Bumblefish another thing to look for could be the frequency of pairs of numbers. for an unbiased, independent die, each of the 36 possible ordered pairs should appear with a probability of about 1/36.

unfortunately you'd need quite a large number of randomly generated samples to estimate these probabilities accurately, but i guess you could do some fancy statistics to analyze these distributions and try to guess which one is "more random looking"
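One standard bit of "fancy statistics" for this is a chi-square test on pair counts. The sketch below is an assumption about how one might do it, not something from the thread. It uses non-overlapping pairs (rolls 1-2, 3-4, ...) because overlapping pairs share a roll and are not independent of each other. The function name is hypothetical.

```python
import random
from collections import Counter
from itertools import product


def pair_chi_square(rolls: list[int], sides: int = 6) -> float:
    """Chi-square statistic for non-overlapping pairs (r[0], r[1]), (r[2], r[3]), ...

    Under a fair, independent die each of the sides**2 ordered pairs has
    probability 1/sides**2. A trailing unpaired roll is ignored.
    """
    pairs = list(zip(rolls[0::2], rolls[1::2]))
    counts = Counter(pairs)
    expected = len(pairs) / sides**2
    return sum((counts[p] - expected) ** 2 / expected
               for p in product(range(1, sides + 1), repeat=2))


if __name__ == "__main__":
    rolls = [random.randint(1, 6) for _ in range(2000)]
    # With 36 categories there are 35 degrees of freedom, so values near 35 are
    # typical for random data; values far above that hint at non-random pairs.
    print(round(pair_chi_square(rolls), 1))
```

As a rule of thumb, each of the 36 cells should have an expected count of at least about 5, so this needs a few hundred rolls at minimum, which is the "quite a large number of samples" problem in concrete form.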

vgarzareyna@mstdn.mx

@Bumblefish @futurebird (cryptographically secure) hash functions are a textbook example of something that is not random (given the same input, they should always give the same output), but they're designed to look random (there should be no way to get any information about the input just from looking at or analyzing the output, even if you know how the function works)
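A quick way to see both properties is Python's standard `hashlib` with SHA-256. The sketch shows that the same input always gives the same digest, while a one-character change flips roughly half of the 256 output bits. The helper names and example strings are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string: same input, same output, every time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the digest bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


if __name__ == "__main__":
    a = sha256_hex("list A")
    b = sha256_hex("list B")
    print(a)
    print(b)
    # Roughly half of the 256 bits are expected to flip for a one-character change.
    print(differing_bits(a, b), "of 256 bits differ")
```

Nothing here is random: anyone who runs it gets the same digests. That is the sense in which a hash is "designed to look random" without being random.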

mastokarl@mastodon.social

@futurebird Well, LLMs are tools. Know their limitations. Know their power.

In your case:

"create 20 random numbers between 1 and 100 by developing a little python app and running it"

Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.

Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM.
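For reference, the "little python app" that the prompt asks for can be very short. This is a sketch using the standard `random` module. The function name is hypothetical, and leaving the list unsorted is a deliberate choice, since the order of the draws is part of what makes them look random.

```python
import random


def twenty_random_numbers(low: int = 1, high: int = 100, count: int = 20) -> list[int]:
    """Draw `count` integers from low..high inclusive (random.randint includes both ends).

    Numbers may repeat, and the list is deliberately left unsorted.
    """
    return [random.randint(low, high) for _ in range(count)]


if __name__ == "__main__":
    print(twenty_random_numbers())
```

`random` is fine for a classroom demo. For anything security-related, the standard `secrets` module is the appropriate choice.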

poleguy@mastodon.social

@futurebird The trouble is that people can accept that "factual" output from an LLM may be statistically generated until they hit words that are generated that sound like "reasoning." Then even the most aware humans can get lulled into thinking that the words can be trusted.

gkrnours@mastodon.gamedev.place

@futurebird I assume from this post someone already mentioned the statistics module from the Python standard library?
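As an illustration of what the standard `statistics` module offers here, the sketch below compares sample mean and population variance against a fair six-sided die, which has mean 3.5 and variance 35/12. The helper name is hypothetical. One caveat applies: these summaries ignore order, so on their own they cannot tell a shuffled list from a sorted one.

```python
import random
import statistics


def summarize(rolls: list[int]) -> dict[str, float]:
    """Compare sample mean/variance with a fair six-sided die (3.5 and 35/12)."""
    return {
        "mean": statistics.mean(rolls),
        "expected_mean": 3.5,
        "pvariance": statistics.pvariance(rolls),
        "expected_pvariance": 35 / 12,
    }


if __name__ == "__main__":
    rolls = [random.randint(1, 6) for _ in range(600)]
    print(summarize(rolls))
    # The sorted list gives the same mean and variance, even though a sorted
    # list of "random" rolls is obviously not random.
    print(summarize(sorted(rolls)))
```

This is why order-sensitive checks, such as counting repeats, pairs, or runs, are needed in addition to summary statistics.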

seachaint@masto.hackers.town

@futurebird there was a study that found that if you give an LLM some prompting to push it into a particular sampling-space (say, "bleeding heart leftie") and then ask it for some random numbers, you can then feed those numbers into another fresh instance and it'll drift towards the same sampling space.

In other words, even the numerical distributions they sample from can be connected to the broader "noosphere" they're trained on, and that relation is a fucked sort of bijection

seachaint@masto.hackers.town

@futurebird if you prompt it into "stats prof" or "crypto nerd" sampling space does it improve the quality of the fake RNG output?

david_chisnall@infosec.exchange

@futurebird @Bumblefish

It’s a trick question. Neither list is random because 7 is the most random number and does not appear in either list. A six-sided die is not able to produce a 7 and cannot therefore produce a random number.

- ChatGPT, probably.

tschfflr@fediscience.org

@futurebird @Bumblefish I vote for listB: I counted the times that two consecutive numbers are equal (1,1 or 4,4). In listA this occurs ~23 times, so almost 1/4 of the time, which seems too often (it should be around 1/6). In listB it is ~9 times, unless I missed some. That seems fewer than expected, but anyway. If I spent more time, I'd look at higher-order n-grams.
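The count described above is easy to automate. For n rolls there are n - 1 adjacent pairs, each equal with probability 1/6 for a fair, independent die, so about (n - 1)/6 repeats are expected. The sketch below also includes a general n-gram counter for the "higher-order n-grams" follow-up. All function names are hypothetical.

```python
import random
from collections import Counter


def adjacent_repeats(rolls: list[int]) -> int:
    """Number of positions i where rolls[i] == rolls[i + 1] (overlapping pairs)."""
    return sum(a == b for a, b in zip(rolls, rolls[1:]))


def expected_repeats(n: int, sides: int = 6) -> float:
    """n rolls give n - 1 adjacent pairs, each equal with probability 1/sides."""
    return max(n - 1, 0) / sides


def ngram_counts(rolls: list[int], k: int) -> Counter:
    """Counts of every overlapping k-tuple of consecutive rolls."""
    return Counter(zip(*(rolls[i:] for i in range(k))))


if __name__ == "__main__":
    rolls = [random.randint(1, 6) for _ in range(100)]
    print(adjacent_repeats(rolls), "repeats observed vs",
          round(expected_repeats(len(rolls)), 1), "expected")
    print(ngram_counts(rolls, 3).most_common(5))
```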

cstross@wandering.shop

@okohll @futurebird I was about to suggest Benford's Law too!

meuwese@mastodon.social

@ai6yr @ohmu @futurebird wait so... is that the ultimate question? "What number will an LLM always include when generating random numbers?"

life_is@no-pony.farm

@burnitdown@beige.party @futurebird@sauropods.win raNDOm. A play on words.

okohll@hachyderm.io

@cstross @futurebird God does play dice, but there’s a big lead weight in one side

thisalex@hachyderm.io

@futurebird
> what are we doing?

I think the best description is that we are taking part in a play. The LLM makes its best effort to continue this dialogue in a way that looks plausible to the reader. Choose your own adventure.

mildouze@mamot.fr

@futurebird @Bumblefish
B
(Random answer)

lamecarlate@pouet.it

@futurebird @Bumblefish I'm no stats student, so maybe I don't have the basics (for lack of a better term; English is not my main language), but I think listA is the random one. The fact that listB has almost no triplets seems too good to be true.
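To put "too good to be true" on firmer footing, one could count triplet positions and compare them with what a fair die predicts. Each of the n - 2 windows of three consecutive rolls is a triplet with probability 1/36. The sketch below also computes exactly how likely a fair sequence is to contain no triplet at all, using a small dynamic program. The function names are hypothetical, and "triplet" is taken to mean three equal consecutive rolls.

```python
from fractions import Fraction


def triple_positions(rolls: list[int]) -> int:
    """Count positions N where rolls N, N+1 and N+2 are all equal (overlapping)."""
    return sum(a == b == c for a, b, c in zip(rolls, rolls[1:], rolls[2:]))


def expected_triple_positions(n: int, sides: int = 6) -> Fraction:
    """Each of the n - 2 windows is a triple with probability 1/sides**2."""
    return Fraction(max(n - 2, 0), sides**2)


def prob_no_triple(n: int, sides: int = 6) -> Fraction:
    """Exact probability that n fair rolls contain no run of three equal values.

    Dynamic programming over triple-free sequences, tracked by the length
    (1 or 2) of the run they currently end with.
    """
    if n <= 0:
        return Fraction(1)
    end1, end2 = sides, 0  # triple-free sequences of length 1
    for _ in range(n - 1):
        # A different next value starts a new run of 1 (sides - 1 choices);
        # repeating the last value turns a run of 1 into a run of 2.
        end1, end2 = (end1 + end2) * (sides - 1), end1
    return Fraction(end1 + end2, sides**n)


if __name__ == "__main__":
    n = 100
    print("expected triple positions:", float(expected_triple_positions(n)))
    print("P(no triple at all):", float(prob_no_triple(n)))
```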

ingalovinde@embracing.space

@AbyssalRook @futurebird I see two mistakes in your reasoning.
One is technical: the events "the numbers at positions N, N+1 and N+2 are the same" for different values of N are _not_ independent of each other. (For example, if we know that this statement is true for N=10, then the likelihood of it being true for N=11 is 1/6, not 1/36.)
Another symbolizes a deeper problem with a lot of modern research that relies heavily on p-values: consider how many statements of this kind, each containing the same amount of information, one could make. Unless you commit to a specific statement beforehand, before seeing the data, "this statement would only be true in 8% of cases for truly random data" does not really mean anything if it's just one of 20 equally "interesting" statements one could make about the data (e.g. "how many triplets of incrementing numbers (modulo six) are there", "how many decrementing triplets are there", etc.), each only 8% likely. After all, for most random sequences, a few of these individually unlikely statements are expected to be true.
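Both points can be checked with a few lines of exact arithmetic. The sketch below enumerates all 6^4 sequences of four rolls to confirm the 1/36 and 1/6 figures. It then applies the 20-statements, 8%-each scenario: linearity of expectation gives 1.6 true statements on average whether or not they are dependent, while the chance of at least one being true is computed only under an explicit (and usually unrealistic) independence assumption. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def triple_probabilities(sides: int = 6) -> tuple[Fraction, Fraction]:
    """Return (P(triple at N), P(triple at N+1 | triple at N)) by enumerating 4 rolls."""
    given = both = 0
    for a, b, c, d in product(range(1, sides + 1), repeat=4):
        if a == b == c:
            given += 1
            if c == d:  # b == c already holds, so this completes a triple at N+1
                both += 1
    return Fraction(given, sides**4), Fraction(both, given)


def expected_true_statements(k: int = 20, p: Fraction = Fraction(8, 100)) -> Fraction:
    """Linearity of expectation: valid even if the k statements are dependent."""
    return k * p


def prob_at_least_one(k: int = 20, p: float = 0.08) -> float:
    """ASSUMES the k statements are independent, which real ones usually are not."""
    return 1 - (1 - p) ** k


if __name__ == "__main__":
    print(triple_probabilities())
    print(expected_true_statements(), round(prob_at_least_one(), 3))
```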
