"Why don't you just load a library to find the mean and SD?"
Because I'M OLD. I like to write my own function. I do it for integration sometimes... kids these days.
@futurebird Faster than finding a library and RTFM too.
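For anyone who wants to roll their own, it really is only a few lines; a minimal sketch in Python (using the sample SD, with an n - 1 denominator):

```python
import math

def mean(xs):
    # Arithmetic mean: total divided by count.
    return sum(xs) / len(xs)

def sample_sd(xs):
    # Sample standard deviation (n - 1 in the denominator).
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))       # 5.0
print(sample_sd(data))
```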
-
@futurebird
When I was a kid, we solved integrals in the snow and rain uphill in both directions.
-
@ohmu @futurebird LOL 42 and 73 are my picks for "random" numbers out of the LLMs, for now.
-
@futurebird I know how to find the SD and I will use the php-stats library every day of the week and twice on Sunday. I would much rather be able to depend on well supported community code. (At least until it is all replaced by ai slop)
-
@futurebird early in my physical chemistry research fellowship I had to write a Levenberg-Marquardt least-squares curve-fitting routine for an 18-to-28-parameter optical curve (using double-precision complex numbers). I did the first-pass implementation in FORTRAN and then needed my postdoc's help to recast the algorithm as matrix algebra for a Matlab implementation. It was fascinating.
-
Had a lot of fun with my stats students today. I gave them two data sets. One from a random number generator, the other was one I made up that was not random, but designed to look random. They were able to figure out which one was fake.
Then we had ChatGPT make the same kind of data set (100 random numbers from 1 to 6) and it had the same problems as my fake set, but in a different way.
We talked about the study about AI generated passwords.
There is something very creepy about the way LLMs will cheerfully give lists of "random" numbers. But they aren't random in frequency, and as my students pointed out, "it's probably from some webpage about how to generate random numbers"
But even then, why is the frequency so unnaturally regular? Is that an artifact from mixing lists of real random numbers together?
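A sketch of what that comparison can look like in code (hypothetical, not the actual classroom exercise): tally the frequencies and find the longest streak, since human-faked "random" data tends to be too even and too streak-free.

```python
import random
from collections import Counter

def longest_run(rolls):
    # Length of the longest streak of identical consecutive rolls.
    best = cur = 1
    for a, b in zip(rolls, rolls[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

rolls = [random.randint(1, 6) for _ in range(100)]
print(Counter(rolls))      # genuinely random counts are usually lumpy
print(longest_run(rolls))  # streaks of 3+ are common in 100 real rolls
```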
-
@futurebird I think I've got a printed book of random numbers upstairs somewhere.
-
@futurebird
Related: the subconscious power of human brains is amazing.
-
FUN FACT: random ain't random. especially in computers.
if you ask for "random" output from a computer, there is no guarantee that what comes out isn't actually from the contents of RAM.
-
@flipper @futurebird I definitely have some of those. Several, in fact, at various levels of precision and different sets of functions.
-
@burnitdown@beige.party You need to put some NDO into the ram. @futurebird@sauropods.win
-
@futurebird haven't tried it, but maybe it's also all mixed up with non-random numbers in the training content, e.g. the digit after '20' is likely 0, 1 or 2, the start of a 21st-century year so far. Or Benford's law https://en.wikipedia.org/wiki/Benford%27s_law
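Benford's law says the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d), so 1 leads about 30% of the time. A quick check of the distribution:

```python
import math

# P(first digit = d) = log10(1 + 1/d), per Benford's law.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, round(p, 3))
# The probabilities telescope to a sum of exactly 1, and
# digit 1 leads about 30.1% of the time.
```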
-
@Life_is @futurebird that's still the contents of RAM, whatever an NDO is.
-
@futurebird As you so often do, you sent me off on a tangent. My favorite PRNG is in Knuth, and it's called Algorithm A there. It is entirely additive, so very fast, and has a period of 2^54.
I spent *years* tryna figure out why nobody ever used it or even mentioned it.
Finally discovered that it has another name, and that it is quite frequently used today.

I have, of course, completely forgotten its other name, which somebody here on fedi actually told me.
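If this is the purely additive generator from Knuth, its better-known form today is the lagged-Fibonacci generator. A sketch using the (24, 55) lag pair Knuth discusses; note the lags and the modulus here are assumptions, since the post above doesn't give them:

```python
import random

class LaggedFibonacci:
    # Additive generator: X[n] = (X[n-24] + X[n-55]) mod 2**31.
    # The lag pair (24, 55) and the modulus are assumptions; they
    # are values Knuth discusses, but the post doesn't name them.
    def __init__(self, seed=1):
        rng = random.Random(seed)  # seed the 55-word state
        self.state = [rng.randrange(1, 2**31) for _ in range(55)]
        self.i = 0

    def next(self):
        j = self.i % 55
        x = (self.state[(self.i - 24) % 55] + self.state[j]) % 2**31
        self.state[j] = x          # overwrite the oldest word
        self.i += 1
        return x

g = LaggedFibonacci(seed=42)
print([g.next() for _ in range(5)])  # fast: additions only, no multiply
```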
-
@burnitdown @futurebird These days if you really want random numbers you can have them. E.g. RDRAND on Intel chips is seeded by analogue circuitry, not by some state updated in RAM. And even if you don't use RDRAND directly, its output is still used as a source of entropy for other generators.
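From user space you normally reach that hardware entropy through the OS rather than calling RDRAND yourself. In Python, for instance, the `secrets` module draws from the operating system's CSPRNG instead of a seedable in-memory generator (a sketch, not specific to any one chip):

```python
import secrets

# secrets pulls from the OS CSPRNG, which mixes in hardware entropy
# sources (such as RDRAND where available) rather than exposing a
# seedable user-space state like random.Random.
token = secrets.token_hex(16)     # 32 hex characters, unpredictable
roll = secrets.randbelow(6) + 1   # a die roll you can't reproduce
print(token, roll)
```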
-
@futurebird i mean the LLM itself is just a statistical distribution… the path through the distribution is i assume randomized, but the distribution itself is gonna be the same every time.
-
The LLM is like a little box of computer horrors that we peer into from time to time.
I'm sorry but the whole interface is just so silly.
You ask for random numbers with sentences and it pretends to give them to you? What are we doooooing?
-
@futurebird it really puts into perspective what my interaction with real people is like
-
@futurebird It's very weird.
In principle, if you take an LLM, you should be able to get it to generate random numbers in a way that reflects the numbers that appear in the corpus it was trained on. If you have the raw model you can probably do that.
But if you ask ChatGPT (or at least if I do) it starts talking about how numbers taken from around us typically follow Benford's law so their first digits have a logarithmic distribution. When it then spits out some random numbers it's no longer sampling random numbers from the entire corpus but a sample that's probably heavily biased towards numbers that appear in articles about Benford's law. I.e. what people have previously said about these numbers, rather than the actual numbers.
-
@futurebird as others here have said or implied, I think LLMs are trained not to be random. Like as a structural part of the statistical models they're based on, so the input corpus will inform the "random" output.
Speaking as a long-time, not-mathematically-rigorous-enough amateur cryptographer: most humans don't understand (not talking about you or your students, to be clear) that truly random output can contain sequences and patterns, or parts of them. So when an uninformed human evaluates "randomness," they don't recognize patterned sequences as random, even when those patterns are accidental coincidences.
Related, there's the old cryptography parable: if a low-ranking person in the security organization draws "random" numbers for, say, a one-time pad, the results won't really be random if that volunteer looks into the hat or drum they're drawing from, because they will subconsciously bias toward familiar patterns, like the letter and number frequencies they've come to expect, which might help an attacker decrypt the pad. Maybe.
Since the LLM is supposed to emulate human output it makes sense it might mess with "randomness".