
add "downvotes considered harmful"

main
Colin McMillen 3 months ago
parent
commit
8bc2f445fb
  1. 72
      content/blog/20210721-downvotes-considered-harmful.md

# Downvotes & Dislikes Considered Harmful
*Posted 2021-07-21.*
If you're letting users rank content, you probably **don't need and don't want downvotes**. Here's why.
(This post was inspired by news that Twitter is considering [adding "Dislikes" to Tweets](https://twitter.com/Sadcrib/status/1417913362999136257).)
## Background
In my past life at Google, I was responsible for co-creating [Memegen](https://books.google.com/books?id=fEJ0AwAAQBAJ&newbks=1&newbks_redir=0&lpg=PP83&dq=memegen%20eric%20schmidt&pg=PP83#v=onepage&q=memegen%20eric%20schmidt&f=false), a large & influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote & downvote the memes of others. Memegen's home page is the Popular page, which shows the most-upvoted memes of the past day.
Adding downvotes to Memegen was my single greatest mistake.
## The problems of downvotes
Any voting system that allows downvotes, but where *most* posts mostly receive upvotes, has a huge problem:
> No matter how you do the math, **downvotes count more** than upvotes do.
Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don't want surfaced on the top-N posts page. This is true even if you're using a sophisticated ranking algorithm like [Wilson score intervals](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval) to rank posts (as Reddit & many other sites do).
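As a rough illustration of that asymmetry, here is a minimal sketch of the lower bound of a Wilson score interval, applied to a post that already has 100 upvotes. The function name, the 95% confidence level (`z=1.96`), and the vote counts are assumptions chosen for the example; they are not taken from Memegen or any particular site.

```python
import math

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# A hypothetical post with 100 upvotes and no downvotes.
base = wilson_lower_bound(100, 0)

# How much do ten more upvotes help, versus how much do ten downvotes hurt?
gain_from_10_upvotes = wilson_lower_bound(110, 0) - base
loss_from_10_downvotes = base - wilson_lower_bound(100, 10)

print(f"gain from 10 upvotes:   {gain_from_10_upvotes:.4f}")
print(f"loss from 10 downvotes: {loss_from_10_downvotes:.4f}")
```

Under these assumptions, the ten downvotes cost the post more than ten times what ten extra upvotes would earn it, so a small group voting against a post outweighs a much larger group voting for it.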
Downvotes aim to solve the problem of filtering out low-quality **content**, but trolls can too easily co-opt them to filter out **people**, often for bad reasons that have more to do with who is posting than with what they posted.
From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.
## What does a downvote actually *mean?*
The other problem with downvotes is that it's unclear to everyone what they mean. Does a downvote mean that this particular post is:
1. offensive or illegal and needs to be removed ASAP?
2. a duplicate?
3. just something you personally don't like?
4. off-topic for the forum?
As the creator of a social product, you need to **give people different buttons** for these.
Offensive or illegal posts (#1) shouldn't be handled by an algorithmic rating system. You need actual human moderators for that --- and enough of them that they can review those reports in a timely manner. (I hope you're willing to train & pay them well!)
For duplicate posts (#2) it's nicer & more informative if your software simply says "hey, this submission is a duplicate of this other thing, why don't you all check out that post instead?"
\#3 is solved by default --- people can simply not vote for content they don't like.
\#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it's obviously spam).
## How to actually rank posts
Once you've dispensed with the idea of downvotes, the main things a user cares about are: "what are the best things that have been posted today?" (or in the last hour / week / etc) or "what are the best things since I last visited?"
On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!
It turns out that it's actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.
You need to store each vote along with when it was cast. Then, when it's time to compute the "most popular in the last day" page, select all the votes cast within the last day, count how many were for each post, and rank the posts by that count.
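The counting step can be sketched in a few lines. This is only an in-memory illustration: the `(post_id, cast_at)` tuple layout, the `top_posts` name, and the use of plain Unix timestamps are all assumptions, and a real site would run the equivalent query against its own vote storage.

```python
from collections import Counter

def top_posts(votes, now, window_seconds, n):
    """Return the n most-upvoted posts among votes cast in the time window.

    votes: iterable of (post_id, cast_at) pairs, cast_at in Unix seconds.
    """
    cutoff = now - window_seconds
    # Keep only votes cast inside the window, then count them per post.
    counts = Counter(post_id for post_id, cast_at in votes if cast_at >= cutoff)
    return counts.most_common(n)

# Hypothetical votes: "a" has one old vote and one recent one, "b" has two recent ones.
example_votes = [("a", 10), ("b", 95), ("b", 96), ("a", 97)]
top_of_window = top_posts(example_votes, now=100, window_seconds=10, n=1)
```

Here the old vote for `"a"` falls outside the window, so `"b"` ranks first with two votes.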
Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cronjob to do it every 5 or 15 minutes or something. It's okay if the info is slightly out of date! Most users won't care or notice if it takes a few minutes for things to move around.
How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like `score_hour`, `score_day`, `score_month`, and `score_alltime`. A mapreduce was responsible for updating these values every few minutes.
Obviously you don't need to touch or compute anything for any post that got no votes since the last time you ran the updater. In the steady state, *most* of the posts in your system won't need any update.
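A periodic updater along these lines might look like the sketch below. It is not Memegen's mapreduce: the in-memory data layout, the function names, and the window lengths (including treating a month as 30 days) are assumptions made for illustration. It recomputes scores only for posts that received a vote since the previous run.

```python
# Window lengths in seconds; the 30-day month is an assumed approximation.
WINDOWS = {
    "score_hour": 3600,
    "score_day": 86400,
    "score_month": 30 * 86400,
}

def posts_needing_update(votes, last_run):
    """Posts that received at least one vote since the previous run."""
    return {post_id for post_id, cast_at in votes if cast_at > last_run}

def update_scores(scores, votes, last_run, now):
    """Recompute the windowed scores for recently voted posts only.

    scores: dict mapping post_id -> dict of score properties (mutated in place).
    votes: list of (post_id, cast_at) pairs, cast_at in Unix seconds.
    """
    dirty = posts_needing_update(votes, last_run)
    for post_id in dirty:
        times = [cast_at for pid, cast_at in votes if pid == post_id]
        entry = scores.setdefault(post_id, {})
        for name, seconds in WINDOWS.items():
            entry[name] = sum(1 for t in times if now - t <= seconds)
        entry["score_alltime"] = len(times)
    return dirty

# Hypothetical run: only "a" was voted on after the previous run at t=4000.
scores = {}
votes = [("a", 100), ("a", 5000), ("b", 10)]
updated = update_scores(scores, votes, last_run=4000, now=5100)
```

One caveat of this simplified version: a windowed score such as `score_hour` also changes when old votes age out of the window, so a real updater would need some way to refresh those posts too.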
## Conclusion
Downvotes are a blunt instrument for users to say "I don't like this content".
It's easy for small groups of trolls to misuse downvotes as a vehicle for harassing & silencing groups of (often marginalized) people.
Downvotes reduce engagement by scaring off first-time posters.
Instead of adding downvotes to your site, build *specific* tools that handle specific kinds of unwanted posts.
(This post is a distillation & refinement of some thoughts originally posted in [a Twitter thread](https://twitter.com/mcmillen/status/1310998579184574465?s=20) in September 2020.)