diff --git a/blog/20210721-downvotes-considered-harmful.html b/blog/20210721-downvotes-considered-harmful.html
new file mode 100644
index 0000000..06903e7
--- /dev/null
+++ b/blog/20210721-downvotes-considered-harmful.html
@@ -0,0 +1,91 @@
+Posted 2021-07-21.
+If you’re letting users rank content, you probably don’t need and don’t want downvotes. Here’s why.
+(This post was inspired by news that Twitter is considering adding “Dislikes” to Tweets.)
+In my past life at Google, I was responsible for co-creating Memegen, a large & influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote & downvote the memes of others. Memegen’s home page is the Popular page, which shows the most-upvoted memes of the past day.
+Adding downvotes to Memegen was my single greatest mistake.
+Any voting system in which most posts mostly receive upvotes, but which also allows downvotes, has a huge problem:
+No matter how you do the math, downvotes count more than upvotes do.
+
Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don’t want surfaced on the top-N posts page. This is true even if you’re using a sophisticated ranking algorithm like Wilson score intervals to rank posts (as Reddit & many other sites do).
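One way to see the asymmetry is to plug some numbers into the lower bound of the Wilson score interval. The sketch below is only an illustration: the function names, the 95% confidence level (`z = 1.96`) and the vote counts are assumptions made up for this example, not Memegen's or Reddit's actual code.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denom


def upvotes_to_recover(upvotes: int, downvotes: int) -> int:
    """Extra upvotes needed to get back to the score the post had with no downvotes."""
    target = wilson_lower_bound(upvotes, 0)
    extra = 0
    while wilson_lower_bound(upvotes + extra, downvotes) < target:
        extra += 1
    return extra


if __name__ == "__main__":
    # Hypothetical post: 100 upvotes, then 10 people downvote it.
    print("score before downvotes:", wilson_lower_bound(100, 0))
    print("score after 10 downvotes:", wilson_lower_bound(100, 10))
    print("extra upvotes needed to recover:", upvotes_to_recover(100, 10))
```

Under these assumptions, ten downvotes on a post with a hundred upvotes take far more than ten additional upvotes to undo, which is the sense in which a small group can outweigh a much larger one.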
+Downvotes aim to solve the problem of filtering out low-quality content, but they are too easily co-opted by trolls who use them to filter out people, often for bad reasons that have more to do with who’s posting than with the content of their posts.
+From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.
+The other problem with downvotes is that it’s unclear to everyone what they mean. Does a downvote mean that this particular post is:
+As the creator of a social product, you need to give people different buttons for these.
+Offensive or illegal posts (#1) shouldn’t be handled by an algorithmic rating system. You need actual human moderators for that — and enough of them that they can review those reports in a timely manner. (I hope you’re willing to train & pay them well!)
+For duplicate posts (#2) it’s nicer & more informative if your software simply says “hey, this submission is a duplicate of this other thing, why don’t you all check out that post instead?”
+#3 is solved by default — people can simply not vote for content they don’t like.
+#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it’s obviously spam).
+Once you’ve dispensed with the idea of downvotes, the main things a user cares about are: “what are the best things that have been posted today?” (or in the last hour / week / etc) or “what are the best things since I last visited?”
+On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!
+It turns out that it’s actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.
+We need to store each vote along with the time it was cast. Then, when it’s time to compute the “most popular in the last day” page, we select all the votes cast within the last day, count how many were for each post, and rank the posts by that count.
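As a rough sketch of that select-count-rank step, here is a small in-memory version. The `(post_id, cast_at)` tuple layout and the `top_posts` name are assumptions for illustration; a real site would run the equivalent query against its own vote storage.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_posts(votes, now, window, n):
    """Rank posts by the number of upvotes cast within `window` before `now`.

    `votes` is an iterable of (post_id, cast_at) pairs (a hypothetical layout).
    Returns a list of (post_id, upvote_count), highest count first.
    """
    cutoff = now - window
    counts = Counter(post_id for post_id, cast_at in votes if cast_at >= cutoff)
    return counts.most_common(n)


if __name__ == "__main__":
    now = datetime(2021, 7, 21, 12, 0)
    votes = [
        ("cat-meme", now - timedelta(hours=1)),
        ("cat-meme", now - timedelta(hours=3)),
        ("dog-meme", now - timedelta(hours=2)),
        ("old-meme", now - timedelta(days=3)),  # outside the one-day window
    ]
    print(top_posts(votes, now, timedelta(days=1), n=10))
```

The same function covers the hour, week or "since I last visited" views by passing a different `window`.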
+Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cronjob to do it every 5 or 15 minutes or something. It’s okay if the info is slightly out of date! Most users won’t care or notice if it takes a few minutes for things to move around.
+How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like score_hour, score_day, score_month, score_alltime. A mapreduce was responsible for updating these values every few minutes.
Obviously you don’t need to touch or compute anything for any post that got no votes since the last time you ran the updater. In the steady state, most of the posts in your system won’t need any update.
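A minimal sketch of such an updater, assuming votes are kept as `(post_id, cast_at)` pairs and scores as plain dictionaries. The field names come from the post; everything else (`update_scores`, the `WINDOWS` table, treating a month as 30 days) is made up for illustration and is not how Memegen's mapreduce actually worked.

```python
from datetime import datetime, timedelta

# Window lengths are assumptions; score_alltime has no window.
WINDOWS = {
    "score_hour": timedelta(hours=1),
    "score_day": timedelta(days=1),
    "score_month": timedelta(days=30),
}


def update_scores(scores, votes, last_run, now):
    """Recompute score fields only for posts that got a vote since `last_run`.

    `scores` maps post_id -> dict of score fields and is updated in place.
    `votes` is a list of (post_id, cast_at) pairs. Returns the set of posts touched.
    """
    dirty = {post_id for post_id, cast_at in votes if last_run < cast_at <= now}
    for post_id in dirty:
        times = [t for pid, t in votes if pid == post_id and t <= now]
        fields = {
            name: sum(1 for t in times if t > now - window)
            for name, window in WINDOWS.items()
        }
        fields["score_alltime"] = len(times)
        scores[post_id] = fields
    return dirty


if __name__ == "__main__":
    now = datetime(2021, 7, 21, 12, 0)
    votes = [
        ("cat-meme", now - timedelta(minutes=2)),
        ("cat-meme", now - timedelta(days=2)),
        ("old-meme", now - timedelta(days=3)),
    ]
    scores = {}
    touched = update_scores(scores, votes, last_run=now - timedelta(minutes=5), now=now)
    print(touched, scores)
```

This sketch deliberately leaves out one detail: votes that age out of a window also change a post's windowed scores, so a real updater would need some way to expire them.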
+Downvotes are a blunt instrument for users to say “I don’t like this content”.
+It’s easy for small groups of trolls to misuse downvotes as a vehicle for harassing & silencing groups of (often marginalized) people.
+Downvotes reduce engagement by scaring off first-time posters.
+Instead of adding downvotes to your site, build specific tools that handle specific kinds of unwanted posts.
+(This post is a distillation & refinement of some thoughts originally posted in a Twitter thread in September 2020.)
+Posted 2021-07-21.
+If you’re letting users rank content, you probably don’t need and don’t want downvotes. Here’s why.
+(This post was inspired by news that Twitter is considering adding “Dislikes” to Tweets.)
+In my past life at Google, I was responsible for co-creating Memegen, a large & influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote & downvote the memes of others. Memegen’s home page is the Popular page, which shows the most-upvoted memes of the past day.
+Adding downvotes to Memegen was my single greatest mistake.
+Any voting system in which most posts mostly receive upvotes, but which also allows downvotes, has a huge problem:
+No matter how you do the math, downvotes count more than upvotes do.
+
Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don’t want surfaced on the top-N posts page. This is true even if you’re using a sophisticated ranking algorithm like Wilson score intervals to rank posts (as Reddit & many other sites do).
+Downvotes aim to solve the problem of filtering out low-quality content, but they are too easily co-opted by trolls who use them to filter out people, often for bad reasons that have more to do with who’s posting than with the content of their posts.
+From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.
+The other problem with downvotes is that it’s unclear to everyone what they mean. Does a downvote mean that this particular post is:
+As the creator of a social product, you need to give people different buttons for these.
+Offensive or illegal posts (#1) shouldn’t be handled by an algorithmic rating system. You need actual human moderators for that — and enough of them that they can review those reports in a timely manner. (I hope you’re willing to train & pay them well!)
+For duplicate posts (#2) it’s nicer & more informative if your software simply says “hey, this submission is a duplicate of this other thing, why don’t you all check out that post instead?”
+#3 is solved by default — people can simply not vote for content they don’t like.
+#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it’s obviously spam).
+Once you’ve dispensed with the idea of downvotes, the main things a user cares about are: “what are the best things that have been posted today?” (or in the last hour / week / etc) or “what are the best things since I last visited?”
+On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!
+It turns out that it’s actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.
+We need to store each vote along with the time it was cast. Then, when it’s time to compute the “most popular in the last day” page, we select all the votes cast within the last day, count how many were for each post, and rank the posts by that count.
+Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cronjob to do it every 5 or 15 minutes or something. It’s okay if the info is slightly out of date! Most users won’t care or notice if it takes a few minutes for things to move around.
+How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like score_hour, score_day, score_month, score_alltime. A mapreduce was responsible for updating these values every few minutes.
Obviously you don’t need to touch or compute anything for any post that got no votes since the last time you ran the updater. In the steady state, most of the posts in your system won’t need any update.
+Downvotes are a blunt instrument for users to say “I don’t like this content”.
+It’s easy for small groups of trolls to misuse downvotes as a vehicle for harassing & silencing groups of (often marginalized) people.
+Downvotes reduce engagement by scaring off first-time posters.
+Instead of adding downvotes to your site, build specific tools that handle specific kinds of unwanted posts.
+(This post is a distillation & refinement of some thoughts originally posted in a Twitter thread in September 2020.)
+ ]]> +Sign up for updates via my email newsletter or your favorite RSS reader.
Downvotes & Dislikes Considered Harmful, 2021-07-21.
+ If you’re letting users rank content, you probably don’t need and don’t want downvotes. Here’s why.
A new year & a sneaky new project, 2020-02-09.
A year after my last day at Google, some updates on making a videogame!