Had a lot of fun with my stats students today.

dpiponi@mathstodon.xyz

@futurebird It's very weird.

In principle, if you take an LLM, you should be able to get it to generate random numbers in a way that reflects the numbers that appear in the corpus it was trained on. If you have the raw model you can probably do that.

But if you ask ChatGPT (or at least if I do) it starts talking about how numbers taken from around us typically follow Benford's law so their first digits have a logarithmic distribution. When it then spits out some random numbers it's no longer sampling random numbers from the entire corpus but a sample that's probably heavily biased towards numbers that appear in articles about Benford's law. I.e. what people have previously said about these numbers, rather than the actual numbers.
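For reference, the logarithmic first-digit distribution mentioned above can be written down directly. This is only a minimal sketch of the standard Benford formula, P(d) = log10(1 + 1/d); the name `benford` is illustrative and says nothing about what any model actually outputs.

```python
import math

# Benford's law: probability that the leading digit equals d, for d = 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

Observed first-digit frequencies from any sample (model output or real data) could be compared against this table.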

perigee@rage.love

@futurebird as others here have said or implied, I think LLMs are trained not to be random. Like as a structural part of the statistical models they're based on, so the input corpus will inform the "random" output.

Speaking as a long-time, not mathematically rigorous enough amateur cryptographer: most humans don't understand (not talking about you or your students, to be clear) that actually random data can contain sequences and patterns, or parts of them. So when an uninformed human evaluates "randomness", they don't recognize sequences with patterns as random, even if those patterns are accidental coincidences.

Related, there's also the old cryptography parable about a low-ranking person in the security organization who draws random numbers from a hat or drum for, say, a one-time pad. If that volunteer looks into the hat or drum while picking, the results won't really be random, because they will subconsciously bias toward patterns like letter and number frequency from their experience and expectations, which might help an attacker decrypt the pad. Maybe.

Since the LLM is supposed to emulate human output it makes sense it might mess with "randomness".

futurebird@sauropods.win

@Bumblefish

Which one is random?
(data sets are 100 numbers 1 to 6)

listA=[2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]

listB=[4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

noplasticshower@infosec.exchange

@futurebird @Bumblefish that question makes no sense

gatesvp@mstdn.ca

@futurebird I am reminded of a Doctor Who episode, where they realize they are in a simulation because they are incapable of generating truly random numbers. One scene has a whole bunch of scientists sitting at a table and they all keep yelling the same number at the same time.

digitalcalibrator@hol.ogra.ph

@dpiponi@mathstodon.xyz @burnitdown@beige.party @futurebird@sauropods.win and Cloudflare famously uses a camera pointed at a wall of lava lamps because the motion is random

zalasur@mastodon.surazal.net

@futurebird @Bumblefish There's literally no way to say whether a list of numbers is random or not (1, 1, 1, 1, etc can plausibly be a random sequence for all we know), though you can establish likelihoods by looking at the distribution.

ramsey@phpc.social

@futurebird @Bumblefish The only way you could determine that something’s not random is if a pattern emerges in the data set. Even then, statistically, it is possible for a CSPRNG with good entropy to produce a random data set that looks like it’s not random: unlikely, but possible.

jedbrown@hachyderm.io

@dpiponi Even with a raw model, I don't see how you would sample from the distribution of numbers in the corpus. Perhaps provide no context and sample one or more tokens (using an independent pseudo-random number generator) from the distribution, and if the returned token parses as a number, return it to the user, otherwise try again. Providing any context/prompt would bias what is returned. This seems too contrived/circular.
@futurebird
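The procedure described above (sample a token with no context, keep it only if it parses as a number, otherwise retry) is a form of rejection sampling. Here is a rough sketch under loud assumptions: `TOY_VOCAB` and `TOY_WEIGHTS` are made-up stand-ins for a raw model's no-context token distribution, and `sample_number` is a hypothetical helper, not any real model API.

```python
import random

# ASSUMPTION: a toy "next-token distribution" standing in for a raw model
# queried with an empty prompt. The tokens and weights are invented.
TOY_VOCAB = ["the", "7", "of", "42", "and", "3", "1999", "a"]
TOY_WEIGHTS = [30, 5, 20, 1, 15, 6, 2, 21]

def sample_number(rng, max_tries=10_000):
    """Draw tokens until one parses as an integer, then return it."""
    for _ in range(max_tries):
        token = rng.choices(TOY_VOCAB, weights=TOY_WEIGHTS, k=1)[0]
        if token.isdigit():  # accept numeric tokens only
            return int(token)
        # non-numeric token: reject and try again
    raise RuntimeError("no numeric token drawn")

rng = random.Random(0)  # the independent pseudo-random number generator
draws = [sample_number(rng) for _ in range(1000)]
print(sorted(set(draws)))
```

The accepted draws follow the toy distribution restricted to numeric tokens, which is the sense in which the sample "reflects the corpus". Any prompt would change the weights, as the post points out.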

futurebird@sauropods.win

@zalasur @Bumblefish

You *can* make an argument for one of these lists being random like a dice roll and the other being much less likely to be generated in that way.

futurebird@sauropods.win

@ramsey @Bumblefish

Only one of these lists could *plausibly* be from rolling dice.

ramsey@phpc.social

@futurebird @Bumblefish I have a UUID-generating library that, under certain conditions, could generate identical UUIDs because the CSPRNG it used ended up reusing the same entropy seed until the server was restarted. That was a *fun* bug to investigate and fix.
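The failure mode is easy to reproduce in miniature. The sketch below is not the library in question: it assumes Python's non-cryptographic `random.Random` as a stand-in for the CSPRNG, and `uuid4_from` and `worker` are hypothetical names used only for illustration.

```python
import random
import uuid

def uuid4_from(rng):
    """Build a version-4 UUID from 128 bits drawn from the given generator."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

def worker(seed, n=3):
    """Simulate a process that seeds its generator once and then issues n UUIDs."""
    rng = random.Random(seed)  # ASSUMPTION: stand-in for a CSPRNG with a fixed seed
    return [uuid4_from(rng) for _ in range(n)]

same_a = worker(seed=1234)
same_b = worker(seed=1234)  # seed reused: identical stream
fresh = worker(seed=5678)   # different seed: different stream

print(same_a == same_b)
print(same_a == fresh)
```

Two workers that start from the same seed issue the same UUIDs in the same order, which is why seed reuse defeats the uniqueness the format is meant to provide.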

ramsey@phpc.social

@futurebird @Bumblefish Based on the statistical distribution of the dice rolls?

raederle@masto.nu

@futurebird @Bumblefish I like list A for random and list B for “planned random”.

dlakelan@mastodon.sdf.org

@futurebird
Just to clarify: what she means is "as if from random, unbiased 6-sided die rolls."

@Bumblefish

f_dion@mastodon.online

@futurebird the first episode of Numb3rs covered the appearance of randomness vs true randomness. I would not have remembered that, but I watched a bunch of episodes to serve as math concept inspiration for the 31 music pieces I wrote and performed (on actual hardware synths) the whole month of January for #jamuary2026 #math #music #synths

Jamuary 2026

Listen to Jamuary 2026, a playlist curated by Francois Dion on desktop and mobile.

SoundCloud (soundcloud.com)

dlakelan@mastodon.sdf.org

@futurebird
things I would check are first the frequency of each number... they should be somewhat uniform but not TOO close to equal as all exactly equal is unlikely... next I'd look at the length of repeat sequences and compare to expected values.

the actual definition of random sequences (Per Martin-Löf) is in terms of passing tests actually
@Bumblefish
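The two checks suggested above (face frequencies and lengths of repeat runs) can be sketched as follows, using the two lists from the thread. The helper names `run_lengths` and `expected_run_count` are illustrative, and the expected run count assumes independent rolls of a fair six-sided die.

```python
from collections import Counter
from itertools import groupby

listA = [2,3,5,1,2,2,4,2,4,5,2,3,3,4,5,6,4,2,6,2,2,1,3,4,5,5,6,3,3,6,1,4,2,1,4,5,2,2,3,3,3,5,6,3,2,4,5,5,1,1,1,6,1,4,3,5,5,3,1,1,1,6,1,4,6,6,3,6,6,2,4,4,4,5,1,5,6,2,6,1,1,2,4,2,2,3,4,4,5,6,1,3,3,3,5,4,6,5,1,6]
listB = [4,2,5,6,3,5,3,1,3,4,2,3,4,3,4,5,5,1,3,3,2,1,1,6,1,3,2,2,2,6,1,5,6,3,6,3,2,3,2,4,6,1,1,6,3,2,4,1,6,1,3,1,5,6,2,3,3,5,1,6,4,5,2,5,1,1,5,3,6,2,3,3,6,5,2,3,3,1,6,3,2,3,2,1,6,6,4,4,6,2,4,5,4,5,3,4,6,5,3,2]

def run_lengths(rolls):
    """Lengths of maximal runs of equal consecutive values."""
    return [len(list(group)) for _, group in groupby(rolls)]

def expected_run_count(n, sides=6):
    """Expected number of maximal runs in n fair, independent rolls.

    The first roll always starts a run; each later roll starts a new run
    with probability (sides - 1) / sides.
    """
    return 1 + (n - 1) * (sides - 1) / sides

for name, rolls in (("listA", listA), ("listB", listB)):
    runs = run_lengths(rolls)
    print(name, "face counts:", sorted(Counter(rolls).items()))
    print(name, "runs:", len(runs), "expected:", round(expected_run_count(len(rolls)), 1))
    print(name, "run-length histogram:", sorted(Counter(runs).items()))
```

Neither check proves anything on its own; each only says how typical a list looks compared with what fair dice produce on average.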

madjohnroberts@mastodon.social

@futurebird @Bumblefish listA has 17 occurrences each of 1 through 4 and 16 each of 5 and 6, whereas listB's frequencies vary much more from value to value. I would guess that listB is actually random; listA is too nice.
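One way to put a number on "too nice" is to ask how often 100 fair rolls come out at least this evenly spread. The sketch below takes the listA counts quoted in the reply above and uses a small Monte Carlo simulation as a stand-in for a chi-square table lookup; `chi_square_uniform` and `simulate_lower_tail` are illustrative names, and the trial count and seed are arbitrary choices.

```python
import random

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Counts for faces 1..6 of listA, as quoted in the reply above.
observed_a = [17, 17, 17, 17, 16, 16]
stat_a = chi_square_uniform(observed_a)

def simulate_lower_tail(stat, n=100, sides=6, trials=5000, seed=0):
    """Estimate how often n fair rolls give a statistic at or below `stat`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = [0] * sides
        for _ in range(n):
            counts[rng.randrange(sides)] += 1
        if chi_square_uniform(counts) <= stat + 1e-9:  # tolerance for float rounding
            hits += 1
    return hits / trials

share = simulate_lower_tail(stat_a)
print("chi-square for listA:", round(stat_a, 4))
print("share of simulated fair lists at least this even:", share)
```

A very small share means fair dice rarely produce counts this balanced, which is the quantitative version of "too nice". It is evidence about plausibility, not proof of how the list was made.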

apophis@yourwalls.today

@futurebird now i'm morbidly curious about what output it gave

...and, relatedly, whether asking it for random words would net a very high frequency of ninjas, monkeys and sporks...

apophis@yourwalls.today

@futurebird i'm guessing the second one is made up because there aren't enough triples?

@Bumblefish
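The "not enough triples" intuition can be checked against what fair dice do on average. In this sketch `count_triples` and `expected_triples` are illustrative helpers, overlapping triples are counted separately (so four equal rolls in a row count as two), and the simulation assumes independent fair six-sided rolls.

```python
import random

def count_triples(rolls):
    """Count positions where three consecutive rolls are equal (overlaps included)."""
    return sum(
        1
        for i in range(len(rolls) - 2)
        if rolls[i] == rolls[i + 1] == rolls[i + 2]
    )

def expected_triples(n, sides=6):
    """Expected count for n fair rolls: (n - 2) windows, each matching with probability 1/sides**2."""
    return (n - 2) / sides ** 2

rng = random.Random(0)
simulated = [
    count_triples([rng.randint(1, 6) for _ in range(100)])
    for _ in range(2000)
]
mean_triples = sum(simulated) / len(simulated)

print("expected triples in 100 rolls:", round(expected_triples(100), 2))
print("simulated mean:", mean_triples)
print("share of simulated lists with no triple:", simulated.count(0) / len(simulated))
```

Calling `count_triples` on either list from the thread and comparing the result with the simulated spread shows whether its triple count is unusual for real dice.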
