Had a lot of fun with my stats students today.
-
@moira @futurebird @Bumblefish RANDU!
That's a blast from the past (already obsolete by the time I started fiddling with computers many years ago).
I never used a system with RANDU installed, but I did discover that the PRNGs in old BASICs from the 1980s had the same basic flaw, and I found it in the nerdiest way possible: trying to draw artificial star charts with plausible distributions of star brightnesses, noticing there were some *really funky* patterns in the resulting "constellations", and eventually discovering they had the same mathematical properties that RANDU had (in some cases, worse).
@dpnash @futurebird @Bumblefish omg
that's it
tilted to the right instead of the left
that's what he found
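For anyone who hasn't met RANDU: it is the generator x[k+1] = 65539 * x[k] mod 2^31. The sketch below (function and variable names are my own, not from the thread) checks the well-known flaw: since 65539 = 2^16 + 3, every output is fixed by the two before it, which is why triples of outputs plotted in 3D line up on a small number of parallel planes.

```python
# RANDU: x[k+1] = 65539 * x[k] mod 2**31, seeded with an odd integer.
M = 2**31
A = 65539  # = 2**16 + 3

def randu(seed, n):
    """Return the next n RANDU outputs after `seed` (seed should be odd)."""
    out = []
    x = seed
    for _ in range(n):
        x = (A * x) % M
        out.append(x)
    return out

xs = randu(1, 1000)

# Because A**2 = 6*A - 9 (mod 2**31), consecutive outputs satisfy
#   x[k+2] = 6*x[k+1] - 9*x[k]  (mod 2**31)
violations = sum(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % M != 0
    for k in range(len(xs) - 2)
)
print(violations)  # 0
```

Whether the old BASIC generators mentioned above satisfy a relation this short depends on their constants, which the thread doesn't give.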

-
@dpnash @futurebird @Bumblefish (and this is also when we all got into rolling our own random() implementations. based on proper principles, of course, we weren't inventing any. but!)
-
@moira @futurebird @Bumblefish
Some months before I found the RNG patterns in the fake star charts (I was around 15 or so), I had the really bright idea of “hey, let’s take the RNG output for a chosen seed as a key stream for a cipher! That’ll be really hard to break, and it’ll only be about 10 lines of code!”
That was the first time I rolled my own crypto, and thanks to serendipitously strange-looking artificial star maps, it was also the last.
-
@dpnash @futurebird @Bumblefish o noes xD
S'funny, none of us ever got into cryptography, at least not that I remember. Way more interested in _finding_ things than _hiding_ things, I think
-
@futurebird
things I would check are first the frequency of each number... they should be somewhat uniform but not TOO close to equal, as all exactly equal is unlikely... next I'd look at the length of repeat sequences and compare to expected values. the actual definition of random sequences (Per Martin-Löf) is in terms of passing tests, actually
@dlakelan @futurebird @Bumblefish another thing to look for could be the frequency of pairs of numbers. for an unbiased die with independent rolls, each ordered pair of consecutive numbers should appear with probability about 1/36.
unfortunately you'd need quite a large number of randomly generated samples to see the observed frequencies settle near 1/36, but i guess you could do some fancy statistics to analyze these distributions and try to guess which one is "more random looking"
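A rough sketch of the two checks suggested above, using a plain chi-square statistic on single faces and on consecutive pairs. The function names are made up for illustration. Overlapping pairs are not independent, so the pair statistic is only approximately chi-square, and with 100 rolls the expected count per pair is under 3, which is the "you'd need a lot of samples" problem in practice.

```python
from collections import Counter
from itertools import product
import random

def chi_square(observed_counts, expected_each):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - expected_each) ** 2 / expected_each for o in observed_counts)

def face_chi_square(rolls, faces=6):
    """Chi-square statistic for single-face frequencies (faces - 1 degrees of freedom)."""
    counts = Counter(rolls)
    expected = len(rolls) / faces
    return chi_square([counts.get(f, 0) for f in range(1, faces + 1)], expected)

def pair_chi_square(rolls, faces=6):
    """Chi-square statistic for consecutive (overlapping) pairs; approximate only."""
    pairs = Counter(zip(rolls, rolls[1:]))
    expected = (len(rolls) - 1) / faces**2
    cells = product(range(1, faces + 1), repeat=2)
    return chi_square([pairs.get(c, 0) for c in cells], expected)

# Example on pseudo-random rolls; swap in listA or listB from the thread.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100)]
print(face_chi_square(rolls), pair_chi_square(rolls))
```

A value far above the degrees of freedom (5 for faces, 35 for pairs) hints at non-uniformity; a value suspiciously close to zero hints at a list that is "too even".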
-
@Bumblefish @futurebird (cryptographically-secure) hash functions are a textbook example of something that is not random (given the same input, it should always give the same output), but it's designed to look random (there should not be any way to get any amount of information about the input just from looking at/analyzing the output, even if you know how the function works)
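A small illustration of that point with SHA-256 from Python's standard `hashlib` (the input strings are arbitrary): the same input always gives the same digest, while a one-character change gives a digest that looks unrelated.

```python
import hashlib

def digest(text):
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = digest("listA")
b = digest("listA")
c = digest("listB")

print(a == b)  # True: deterministic, so not random
print(a == c)  # False: a one-character change gives a different digest

# Number of output bits (out of 256) that differ between the two digests;
# for a good hash this should be somewhere around half.
differing_bits = bin(int(a, 16) ^ int(c, 16)).count("1")
print(differing_bits)
```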
-
The LLM is like a little box of computer horrors that we peer into from time to time.
I'm sorry but the whole interface is just so silly.
You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?
@futurebird Well, LLMs are tools. Know their limitations. Know their power.
In your case:
"create 20 random numbers between 1 and 100 by developing a little python app and running it"
Some day, AIs will respond to any prompt in a perfect way and we humans will be in deep shit.
Edit: LOL mistral.ai answers this prompt by generating the random numbers and THEN SORTING THEM.
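For reference, the "little python app" that prompt asks for is only a few lines. This is one possible sketch (the function name and the optional `seed` parameter are my own choices), using the standard `random` module, which is a Mersenne Twister PRNG and not meant for cryptographic use.

```python
import random

def random_numbers(count=20, low=1, high=100, seed=None):
    """Return `count` pseudo-random integers in [low, high], in generation order."""
    rng = random.Random(seed)  # seed=None seeds from the OS / current time
    return [rng.randint(low, high) for _ in range(count)]

numbers = random_numbers()
print(numbers)  # unsorted, repeats allowed
```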
-
@futurebird The trouble is that people can accept that "factual" output from an LLM may be statistically generated until they hit words that are generated that sound like "reasoning." Then even the most aware humans can get lulled into thinking that the words can be trusted.
-
"Why don't you just load a library to find the mean and SD?"
Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.
@futurebird I assume from this post someone already mentioned statistics from the python standard library?
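Both approaches fit in a few lines. The sketch below hand-rolls the mean and sample standard deviation and compares them with `statistics.mean` and `statistics.stdev` from the standard library; the data values are made up.

```python
import math
import statistics

def mean(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), statistics.mean(data))
print(sample_sd(data), statistics.stdev(data))
```

For the population version (dividing by n), the library counterpart is `statistics.pstdev`.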
-
@futurebird there was a study that found that if you give an LLM some prompting to push it into a particular sampling-space (say, "bleeding heart leftie") and then ask it for some random numbers, you can then feed those numbers into another fresh instance and it'll drift towards the same sampling space.
In other words, even the numerical distributions they sample from can be connected to the broader "noosphere" they're trained on, and that relation is a fucked sort of bijection
-
@futurebird
if you prompt it into "stats prof" or "crypto nerd" sampling space does it improve the quality of the fake RNG output?
-
Which one is random?
(data sets are 100 numbers 1 to 6)
listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]
It’s a trick question. Neither list is random because 7 is the most random number and does not appear in either list. A six-sided die is not able to produce a 7 and cannot therefore produce a random number.
- ChatGPT, probably.
-
@futurebird @Bumblefish I vote for listB: I counted the times that two subsequent numbers are equal (1,1 or 4,4). In listA this occurs ~23 times so almost 1/4 of times, which seems too many (should be around 1/6). In listB it is ~9 times unless I missed some. Seems fewer than expected but anyway. If I’d spend more time I’d go for higher order ngrams
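The hand count above is easy to automate. A sketch with hypothetical helper names: for n independent fair rolls, each of the n - 1 adjacent positions matches with probability 1/6, and the match count is binomial, so for 100 rolls the expectation is 16.5 with a standard deviation of about 3.7. Taking the ~23 and ~9 counts from the post at face value, they sit roughly 1.8 standard deviations above and 2 below that.

```python
import math

def adjacent_repeats(rolls):
    """Count positions where a roll equals the one right after it."""
    return sum(a == b for a, b in zip(rolls, rolls[1:]))

def expected_adjacent_repeats(n, faces=6):
    """Expected number of adjacent repeats in n independent fair rolls."""
    return (n - 1) / faces

def sd_adjacent_repeats(n, faces=6):
    """Standard deviation of that count (binomial with p = 1/faces)."""
    p = 1 / faces
    return math.sqrt((n - 1) * p * (1 - p))

print(adjacent_repeats([1, 1, 2, 2, 2, 3]))  # 3
print(expected_adjacent_repeats(100))        # 16.5
print(sd_adjacent_repeats(100))
```

Paste listA and listB from the original post into `adjacent_repeats` to check the counts.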
-
@futurebird haven't tried it but maybe it's also all mixed up with non-random numbers in training content e.g. the next number after '20' is likely one of 0, 1 or 2, the start of a 21st century year so far. Or Benford's law https://en.wikipedia.org/wiki/Benford%27s_law
@okohll @futurebird I was about to suggest Benford's Law too!
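For reference, Benford's law says the leading digit d of many real-world quantities occurs with probability log10(1 + 1/d). The sketch below (helper names are mine) tabulates that and compares it with the leading digits of powers of 2, a standard example that follows the law. Note it concerns data spanning several orders of magnitude; fair die rolls or uniform picks from 1 to 100 would not be expected to follow it.

```python
import math
from collections import Counter

def benford_expected():
    """Benford probability for each leading digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_freq(values):
    """Observed leading-digit frequencies for positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in sorted(counts)}

expected = benford_expected()
observed = leading_digit_freq([2**k for k in range(1, 201)])
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed.get(d, 0.0), 3))
```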
-
@ohmu @futurebird LOL 42 and 73 are my picks for "random" numbers out of the LLMs, for now.
@ai6yr @ohmu @futurebird wait so... is that the ultimate question? "What number will an LLM always include when generating random numbers?"
-
@Life_is @futurebird that's still the contents of RAM, whatever an NDO is.
@burnitdown@beige.party @futurebird@sauropods.win raNDOm. A play on words.
-
@cstross @futurebird God does play dice, but there’s a big lead weight in one side
-
@futurebird
> what are we doing?
I think the best description is that we are taking part in a play. The LLM makes its best effort to write how this dialogue could plausibly continue for the reader. Choose your own adventure.
-
@futurebird @Bumblefish
B
(Random answer)
-
@futurebird @Bumblefish I'm no stats student, so maybe I haven't the bases (for lack of a better term, English is not my main language), but I think listA is the random one. The fact that there are nearly no triplets in listB seems too good to be true.
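The triplet intuition can be checked the same way. A sketch, with made-up function names: in n independent fair rolls, each of the n - 2 starting positions begins three equal values in a row with probability 1/36, so 100 rolls should contain about 2.7 such triplets on average. Human-invented "random" lists tend to have too few.

```python
from itertools import groupby

def run_lengths(rolls):
    """Lengths of maximal runs of equal consecutive values."""
    return [len(list(group)) for _, group in groupby(rolls)]

def count_triplets(rolls):
    """Number of positions k with rolls[k] == rolls[k+1] == rolls[k+2]."""
    return sum(a == b == c for a, b, c in zip(rolls, rolls[1:], rolls[2:]))

def expected_triplets(n, faces=6):
    """Expected triplet count in n independent fair rolls."""
    return (n - 2) / faces**2

print(run_lengths([1, 1, 2, 3, 3, 3]))  # [2, 1, 3]
print(count_triplets([1, 1, 1, 1, 2]))  # 2
print(expected_triplets(100))
```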