An interesting problem to reason through.
-
An interesting problem to reason through. I run a very old online forum (since 2002 or so). We have about 20K users registered. I'm doing a bit of housekeeping and I'm trying to figure out whether I should delete some users. I have records like last activity (last time someone was using the site logged in). I know how many posts every account has ever made. So I can come up with stats like this:
- Number of users whose last activity was 2016 (10 years ago) and who have posted 0 times: 380.
- Number of users whose last activity was 2017 and who have posted no more than twice: 7961.
If I delete these users, it will orphan their posts. The posts will still be present, but the author will just say "system." Deleting 7961 inactive users who have posted 2 or fewer will orphan 9395 posts out of 1.7M posts (about 0.5%). Obviously all the orphaned posts are also 10+ years old, but sometimes old posts are a big deal.
I'm trying to come up with some method of reasoning about it that helps me decide where to draw a cut line.
I did a pivot table and a pivot graph. I did 4 thresholds: 0, 1, 2, and 3. That is, users who have posted no more than 0, 1, 2, or 3 posts. And I looked at last activity from 2002 to 2026. I think this graph is neat.
The years are along the X axis, grouped by threshold. The Y axis is number of users and number of posts that would be affected. Purple bars are number of users deleted, orange bars are number of posts that would be author-less.
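For anyone who wants to reproduce this kind of table outside a spreadsheet, here is a minimal sketch in Python. The `User` record, the field names, and the sample rows are all made up for illustration; the real forum schema is not shown here. It also assumes a cutoff year means "last activity in that year or earlier", which may not be how the pivot table was built.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class User(NamedTuple):
    # Hypothetical record shape, not the forum's actual schema.
    user_id: int
    last_active_year: int
    post_count: int


def deletion_impact(
    users: Iterable[User],
    cutoff_years: Iterable[int],
    thresholds: Iterable[int],
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map (cutoff year, max posts) -> (users deleted, posts orphaned).

    Assumption: a user is selected when their last activity is in the
    cutoff year or earlier AND they have at most `max_posts` posts.
    """
    users = list(users)
    thresholds = list(thresholds)
    impact = {}
    for year in cutoff_years:
        for max_posts in thresholds:
            hit = [
                u for u in users
                if u.last_active_year <= year and u.post_count <= max_posts
            ]
            impact[(year, max_posts)] = (len(hit), sum(u.post_count for u in hit))
    return impact


def orphan_share(orphaned_posts: int, total_posts: int) -> float:
    """Percentage of all posts that would lose their author."""
    return 100.0 * orphaned_posts / total_posts


# Toy data, invented for the example only.
sample_users: List[User] = [
    User(1, 2005, 0),
    User(2, 2010, 2),
    User(3, 2014, 1200),  # long-gone but prolific: never selected below
    User(4, 2016, 1),
    User(5, 2016, 3),
    User(6, 2024, 0),     # recently active: never selected below
]

if __name__ == "__main__":
    for key, (n_users, n_posts) in deletion_impact(
        sample_users, [2016], [0, 1, 2, 3]
    ).items():
        print(key, n_users, n_posts)
    # The figures quoted above: 9395 orphaned posts out of roughly 1.7M.
    print(round(orphan_share(9395, 1_700_000), 2))
```

The post-count threshold is what keeps the prolific but long-gone account out of every bucket, which is the property that matters for this community.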
Obviously if I were to pick users whose last activity was in the last few years, I'd delete tons of users. That makes no sense.
Not pictured is what happens if I ignore the threshold of max posts. If I, for example, deleted every user whose last activity was 10 years ago, I'd delete 13800 of our 20K users. It would orphan a huge chunk of the site's posts. Some users were really active a long time ago. One person made over 1000 posts over the course of 6 years, but their last login was more than 10 years ago.
This is a community of people with spinal cord injury and/or traumatic brain injury and their loved ones. The life expectancy in this user base is much lower than average. People stop posting for a lot of very sad reasons. Deleting the account of a deceased, but well-known and well-loved user is bad, because it would anonymise all their posts; it would hurt the community for no real gain. But there's no user like that who only posted 3 times. Thus the threshold and the care about orphaning posts.
I'm interested in other people's thoughts. I don't have a ton of experience running communities like this.
-
@paco
Why delete? If you're worried about account takeovers just reset the password to random.
-
@FritzAdalis The main reasons to delete are:
- Liability. There's personal info and that personal info is my personal responsibility. So if I keep these instead of deleting, I need to purge their profiles of real data.
- Performance. I run on so-so hardware. I sorta view pruning some of this detritus as a way to free up capacity for real users. I'm about to upgrade to Discourse, which will be a much nicer experience, and then we will want to encourage active memberships. When I say "capacity" I don't mean storage. I just mean performance. Like a database of 20K users with 13K defunct has a bunch of bloated indexes and spends longer searching because it wades through unnecessary stuff.
I also plan to prune some of the 1.7M posts. For example, there's a 'for sale' forum where 5000 posts are 10+ years old and have like 1 reply. There's not really any community value to those.
It's mostly just trying to clean up flotsam and jetsam.
-
@paco why delete the users?
You might be able to lock their passwords and require account recovery if they come back, if it's security you worry about.
You might even update the email address to be some nonce if you're willing to just let them re-register if they come back, and don't want a list of old email addresses in the system.
Run a cron that scrambles/locks passwords after one year, email/handles after five?
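A rough sketch of what that tiered rule could look like, in Python for illustration only. The `Action` names, `retention_action`, `placeholder_email`, and the `deleted.invalid` domain are all hypothetical; nothing here is an existing forum or Discourse API, and the one-year and five-year defaults are just the numbers floated above.

```python
import secrets
from datetime import date
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    LOCK_PASSWORD = "lock_password"    # force account recovery on return
    SCRUB_IDENTITY = "scrub_identity"  # also replace the email with a nonce


def retention_action(
    last_active: date,
    today: date,
    lock_after_years: int = 1,
    scrub_after_years: int = 5,
) -> Action:
    """Pick the strongest action an idle account qualifies for.

    Years are approximated as 365 days, which is close enough for a
    housekeeping job that runs periodically.
    """
    idle_days = (today - last_active).days
    if idle_days >= scrub_after_years * 365:
        return Action.SCRUB_IDENTITY
    if idle_days >= lock_after_years * 365:
        return Action.LOCK_PASSWORD
    return Action.KEEP


def placeholder_email() -> str:
    """Random, undeliverable address (.invalid is a reserved TLD)."""
    return f"{secrets.token_hex(8)}@deleted.invalid"


if __name__ == "__main__":
    today = date(2026, 1, 1)
    for last in (date(2016, 3, 1), date(2024, 6, 1), date(2025, 9, 1)):
        print(last, retention_action(last, today).value)
    print(placeholder_email())
```

Because the account row stays in place, posts keep their author; only the credentials and contact details change.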
-
@StompyRobot Write a rails cron job for all this? Nah. The funny thing is that this is all just an itch. I always think data I have that I don’t need is only liability. But everyone I have asked has asked “why delete?”
I did point out that people on this forum die fairly regularly. I think, compared to other forums, the likelihood that they haven’t come back because they are deceased is much higher. So the chances of them coming back are lower than average.
So saving their legacy is valuable. But for people that barely ever did anything and haven’t been back in over 10 years? Why even keep it?