RIP burner accounts

nemo@mas.to

@dangoodin yeah pair that with this https://adbleed.eu/ 🥶

huitema@social.secret-wg.org

@dangoodin The article mentions 68% recall and 90% precision. Another way to state these numbers is 32% false negatives and 10% false positives. This second number means 10% of the general population would be classified as "pseudonymous". Apply that to, for example, 1 billion Facebook accounts, and you get 100 million users wrongly flagged. That could be a problem!
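For reference, here is a minimal Python sketch of how the two metrics come out of raw counts. The counts are made up so that the ratios match the quoted 68% recall and 90% precision; they are not data from the paper. One point of definition: the complement of precision is the share of *flagged* accounts that are wrong. The absolute number of misflagged accounts therefore depends on how many accounts get flagged in the first place.

```python
# Hypothetical confusion-matrix counts, chosen only so the ratios equal the
# 68% recall / 90% precision quoted in the article; not real data.
TP, FP, FN = 612, 68, 288


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # share of flagged matches that are correct
    recall = tp / (tp + fn)     # share of real matches that get flagged
    return precision, recall


p, r = precision_recall(TP, FP, FN)
print(f"precision={p:.2f} recall={r:.2f}")
print(f"missed real matches (1 - recall): {1 - r:.2f}")
print(f"wrong among flagged (1 - precision): {1 - p:.2f}")
```

With these counts the script reports precision 0.90, recall 0.68, a miss rate of 0.32, and 0.10 wrong among flagged.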

yogthos@social.marxist.network

@dangoodin lol so you basically have to run your text through an LLM to anonymize your style first

hyacin@social.linux.pizza

@dangoodin jus spel thigs rong an tipe difrent then nrml.

nemo@mas.to

@Ostrobothnia@toot.community @dangoodin

Good recommendation. I had read something similar in privacy guides. I use the following setup: I’ve compartmentalized my browsers. For daily browsing I use Brave, my second browser is Mullvad, and my third is a Tails + Tor combo, where there are no country-specific flags anyway.

Still, your recommendation is also good. Maybe there is a technical solution — a friend said that in theory it would be easy to find a solution.

doomstrike@metalhead.club

@dangoodin
I wonder if something like this helps any
https://gibberifier.com/

phillip@social.lol

@huitema @dangoodin I’d also like to point out that the paper has a member of Anthropic listed as one of the authors. Anthropic has previously played up the effectiveness of their products in papers, before backtracking and providing more realistic details after the news has made its rounds. I’m skeptical of this paper at best

mo@mastodon.ml

@rpsu technically it just has to be written in a completely different style from yours, but yeah, an LLM is the fastest way to do that

Like, old-school human criminalists, given enough text samples, could accurately estimate whether they were written by the same person

And LLMs are literally designed to encode all the nuances of text as comparable mathematical vectors, so they can do that even more accurately, and at scale

@dangoodin
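The "comparable vectors" idea can be illustrated without an LLM. Below is a minimal Python sketch of a much older stylometric technique, swapped in here for illustration. Each text is turned into a vector of character-trigram counts, and the vectors are compared with cosine similarity. LLM-based attribution would use learned embeddings instead of trigram counts. The sample sentences and the function names `char_ngrams` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, a classic stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up samples: one "known" text and two candidates to compare it against.
known = "I wonder whether this helps any, honestly."
candidate_same = "I wonder whether that helps any, honestly, lol."
candidate_other = "Argh matey, hoist the sails and fetch the rum!"

sim_same = cosine(char_ngrams(known), char_ngrams(candidate_same))
sim_other = cosine(char_ngrams(known), char_ngrams(candidate_other))
print(f"same-style candidate:  {sim_same:.2f}")
print(f"other-style candidate: {sim_other:.2f}")
```

With only one sentence per sample the scores are noisy. Stylometry generally needs many samples per author, which is the "given enough text samples" caveat above.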

astropug@hachyderm.io

@dangoodin

I wonder whether the fact that different forums have different unspoken rules about language might make cross-identification more difficult. There are forums where one has to be a bit…mean, almost, to survive trolls, and others where the moderators take care of the trolls. It changes the language a lot.

I imagine even someone who writes LinkedIn poetry wouldn’t carry that style over to another forum.

But there are still other identifiers, such as preferences for outliers, etc. (“I hate chocolate, Cara oranges, and The Godfather”).

Either way, while I’ve always figured it would one day be possible to advance in that direction with more automation (since humans can already kind of do it, too), it is very creepy and deeply unwanted.

astropug@hachyderm.io

@radio_alelopatia @dangoodin

Based on that one experiment they did where they identified 7% of users, I bet it would be used more as an initial attempt to identify someone, then see if those guesses include someone you’re looking for, etc.
It would lower the barrier for humans trying to unmask other humans, or to go for low-hanging fruit among the pseudonyms.

Maybe we should have a regular Talk Like a Pirate Day to spike the data with some “argh matey”.

Edit: to make it clear, 7% of users is very, very little lol.

astropug@hachyderm.io

@radio_alelopatia @dangoodin

But yeah, I agree that they do like to oversell. It feels like these models are a bit like hammers in search of nails.

LLMs can unmask pseudonymous users at scale with surprising accuracy
