automated update from build.py

This commit is contained in:
Colin McMillen 2021-07-21 20:49:03 -04:00
parent d14e5f6d46
commit b8a07d03a4
4 changed files with 150 additions and 1 deletions


@ -0,0 +1,91 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="icon" type="image/png" href="/favicon.ico">
<link rel="canonical" href="https://www.mcmillen.dev/blog/20210721-downvotes-considered-harmful.html">
<link rel="alternate" type="application/atom+xml" href="https://www.mcmillen.dev/feed.atom" title="Colin McMillen's Blog - Atom">
<title>Downvotes & Dislikes Considered Harmful | Colin McMillen</title>
<link rel="preconnect" href="https://fonts.gstatic.com">
<link href="https://fonts.googleapis.com/css2?family=Quicksand:wght@500;700&display=block" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Fira+Mono:500&display=block" rel="stylesheet">
<link rel="stylesheet" href="/pygments.css">
<link rel="stylesheet" href="/style.css">
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@mcmillen">
<meta name="twitter:title" content="Downvotes & Dislikes Considered Harmful | Colin McMillen">
<meta name="twitter:description" content="If you’re letting users rank content, you probably don’t need and don’t want downvotes. Here’s why. (This post inspired by news that Twitter is considering adding “Dislikes” to Tweets.)">
</head>
<body>
<div id="page-container">
<div id="content-wrap">
<div id="header">
<div class="content">
<a href="/" class="undecorated">Colin McMillen</a>
<span style="float: right;"><a href="/feed.atom"><img src="/img/rss.svg" alt="Atom feed" style="width: 17px; height: 17px; margin-bottom: 1px;"></a></span>
<span style="float: right;"><a href="https://twitter.com/mcmillen"><img src="/img/twitter.svg" alt="@mcmillen"></a></span>
</div>
</div>
<div class="content">
<h1 id="downvotes-dislikes-considered-harmful">Downvotes &amp; Dislikes Considered Harmful</h1>
<p><em>Posted 2021-07-21.</em></p>
<p>If you&rsquo;re letting users rank content, you probably <strong>don&rsquo;t need and don&rsquo;t want downvotes</strong>. Here&rsquo;s why.</p>
<p>(This post inspired by news that Twitter is considering <a href="https://twitter.com/Sadcrib/status/1417913362999136257">adding &ldquo;Dislikes&rdquo; to Tweets</a>.)</p>
<h2 id="background">Background</h2>
<p>In my past life at Google, I was responsible for co-creating <a href="https://books.google.com/books?id=fEJ0AwAAQBAJ&amp;newbks=1&amp;newbks_redir=0&amp;lpg=PP83&amp;dq=memegen%20eric%20schmidt&amp;pg=PP83#v=onepage&amp;q=memegen%20eric%20schmidt&amp;f=false">Memegen</a>, a large &amp; influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote &amp; downvote the memes of others. Memegen&rsquo;s home page is the Popular page, which shows the most-upvoted memes of the past day.</p>
<p>Adding downvotes to Memegen was my single greatest mistake.</p>
<h2 id="the-problems-of-downvotes">The problems of downvotes</h2>
<p>Any voting system in which <em>most</em> posts mostly receive upvotes, but which also allows downvotes, has a huge problem:</p>
<blockquote>
<p>No matter how you do the math, <strong>downvotes count more</strong> than upvotes do.</p>
</blockquote>
<p>Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don&rsquo;t want surfaced on the top-N posts page. This is true even if you&rsquo;re using a sophisticated ranking algorithm like <a href="https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval">Wilson score intervals</a> to rank posts (as Reddit &amp; many other sites do).</p>
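<p>As a rough illustration of that asymmetry, here is a minimal sketch of the lower bound of a Wilson score interval at 95% confidence (z = 1.96). The function name and the vote counts are made up for the example; this is not Memegen&rsquo;s or Reddit&rsquo;s actual code.</p>

```python
import math


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# Hypothetical post that has only ever received upvotes.
baseline = wilson_lower_bound(100, 0)
# The same post after ten more upvotes...
with_more_upvotes = wilson_lower_bound(110, 0)
# ...versus after ten downvotes instead.
with_downvotes = wilson_lower_bound(100, 10)

gain = with_more_upvotes - baseline
loss = baseline - with_downvotes
print(f"gain from 10 upvotes: {gain:.4f}, loss from 10 downvotes: {loss:.4f}")
```

<p>With these assumed numbers, ten downvotes pull the score from roughly 0.96 down to roughly 0.84, while ten extra upvotes barely move it. On a page where most posts sit near the top of the scale, that gap is enough to push a post out of the top N.</p>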
<p>Downvotes aim to solve the problem of filtering out low-quality <strong>content</strong>, but trolls too easily co-opt them to filter out <strong>people</strong>, often for bad reasons that have more to do with who&rsquo;s posting than with the content of their posts.</p>
<p>From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.</p>
<h2 id="what-does-a-downvote-actually-mean">What does a downvote actually <em>mean?</em></h2>
<p>The other problem with downvotes is that it&rsquo;s unclear to everyone what they mean. Does a downvote mean that this particular post is:</p>
<ol>
<li>offensive or illegal and needs to be removed ASAP?</li>
<li>a duplicate?</li>
<li>just something you personally don&rsquo;t like?</li>
<li>off-topic for the forum?</li>
</ol>
<p>As the creator of a social product, you need to <strong>give people different buttons</strong> for these.</p>
<p>Offensive or illegal posts (#1) shouldn&rsquo;t be handled by an algorithmic rating system. You need actual human moderators for that &mdash; and enough of them that they can review those reports in a timely manner. (I hope you&rsquo;re willing to train &amp; pay them well!)</p>
<p>For duplicate posts (#2) it&rsquo;s nicer &amp; more informative if your software simply says &ldquo;hey, this submission is a duplicate of this other thing, why don&rsquo;t you all check out that post instead?&rdquo;</p>
<p>#3 is solved by default &mdash; people can simply not vote for content they don&rsquo;t like.</p>
<p>#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it&rsquo;s obviously spam).</p>
<h2 id="how-to-actually-rank-posts">How to actually rank posts</h2>
<p>Once you&rsquo;ve dispensed with the idea of downvotes, the main things a user cares about are: &ldquo;what are the best things that have been posted today?&rdquo; (or in the last hour / week / etc) or &ldquo;what are the best things since I last visited?&rdquo;</p>
<p>On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!</p>
<p>It turns out that it&rsquo;s actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.</p>
<p>You need to store each vote along with the time it was cast. Then, when it&rsquo;s time to compute the &ldquo;most popular in the last day&rdquo; page, first select all the votes cast within the last day, then count how many were for each post, and rank the posts by that count.</p>
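<p>The select, count, and rank steps can be sketched in a few lines. This assumes votes are available as <code>(post_id, cast_at)</code> pairs; that shape, the function name, and the sample data are hypothetical, and a real site would more likely run the equivalent query against its vote store.</p>

```python
from collections import Counter
from datetime import datetime, timedelta


def top_posts(votes, now, window=timedelta(days=1), n=10):
    """Rank posts by the number of upvotes cast within `window` before `now`.

    `votes` is an iterable of (post_id, cast_at) pairs (assumed shape).
    """
    cutoff = now - window
    # Keep only votes inside the window, then count them per post.
    counts = Counter(post_id for post_id, cast_at in votes if cast_at >= cutoff)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


now = datetime(2021, 7, 21, 20, 0)
votes = [
    ("cat-meme", now - timedelta(hours=1)),
    ("cat-meme", now - timedelta(hours=5)),
    ("dog-meme", now - timedelta(hours=2)),
    ("old-meme", now - timedelta(days=3)),  # outside the one-day window
]
ranking = top_posts(votes, now)
print(ranking)
```

<p>Here <code>old-meme</code> drops out because its only vote is older than the window, and <code>cat-meme</code> ranks first with two votes.</p>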
<p>Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cronjob to do it every 5 or 15 minutes or something. It&rsquo;s okay if the info is slightly out of date! Most users won&rsquo;t care or notice if it takes a few minutes for things to move around.</p>
<p>How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like <code>score_hour</code>, <code>score_day</code>, <code>score_month</code>, and <code>score_alltime</code>. A mapreduce was responsible for updating these values every few minutes.</p>
<p>Obviously you don&rsquo;t need to touch or compute anything for any post that got no votes since the last time you ran the updater. In the steady state, <em>most</em> of the posts in your system won&rsquo;t need any update.</p>
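<p>A minimal single-process sketch of such an updater follows. The field names come from the post, but the window lengths, the <code>dirty</code> set of posts that received votes since the last run, and the in-memory dictionaries are all assumptions for illustration; Memegen used a mapreduce, not this code. Note that this sketch only recomputes posts with new votes, so votes ageing out of a window for an otherwise idle post would need separate handling.</p>

```python
from datetime import datetime, timedelta

# Assumed window lengths; score_alltime has no cutoff.
WINDOWS = {
    "score_hour": timedelta(hours=1),
    "score_day": timedelta(days=1),
    "score_month": timedelta(days=30),
}


def refresh_scores(scores, vote_times, dirty, now):
    """Recompute the score_* fields, but only for posts listed in `dirty`."""
    for post_id in dirty:
        times = vote_times[post_id]
        row = {"score_alltime": len(times)}
        for field, window in WINDOWS.items():
            # Count the votes cast no earlier than the start of this window.
            row[field] = sum(1 for t in times if t >= now - window)
        scores[post_id] = row
    return scores


now = datetime(2021, 7, 21, 20, 0)
vote_times = {
    "cat-meme": [
        now - timedelta(minutes=10),
        now - timedelta(hours=3),
        now - timedelta(days=40),
    ],
    "old-meme": [now - timedelta(days=90)],
}
# Scores left over from an earlier run; old-meme got no new votes.
scores = {
    "old-meme": {"score_hour": 0, "score_day": 0, "score_month": 0, "score_alltime": 1},
}
refresh_scores(scores, vote_times, dirty={"cat-meme"}, now=now)
print(scores["cat-meme"])
```

<p>Only <code>cat-meme</code> is touched in this run; the stored row for <code>old-meme</code> is left exactly as it was.</p>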
<h2 id="conclusion">Conclusion</h2>
<p>Downvotes are a blunt instrument for users to say &ldquo;I don&rsquo;t like this content&rdquo;.</p>
<p>It&rsquo;s easy for small groups of trolls to misuse downvotes as a vehicle for harassing &amp; silencing groups of (often marginalized) people.</p>
<p>Downvotes reduce engagement by scaring off first-time posters.</p>
<p>Instead of adding downvotes to your site, build <em>specific</em> tools that handle specific kinds of unwanted posts.</p>
<p>(This post is a distillation &amp; refinement of some thoughts originally posted in <a href="https://twitter.com/mcmillen/status/1310998579184574465?s=20">a Twitter thread</a> in September 2020.)</p>
</div>
</div>
<div id="footer">
<div class="content">
&copy; 2021 <a href="/" class="undecorated">Colin McMillen</a>. No cookies, no tracking.
</div>
</div>
</div>
</body>
</html>


@ -4,7 +4,7 @@
<title>Colin McMillen's Blog</title>
<link href="https://www.mcmillen.dev"/>
<link rel="self" href="https://www.mcmillen.dev/feed.atom"/>
<updated>2021-07-09T12:14:04-04:00</updated>
<updated>2021-07-21T20:49:01-04:00</updated>
<author>
<name>Colin McMillen</name>
</author>
@ -432,4 +432,57 @@ PJQNOS+CMSY10 Type 1 yes yes no 12 0
<updated>2020-02-09T12:00:00-04:00</updated>
</entry>
<entry>
<title>Downvotes &amp; Dislikes Considered Harmful</title>
<id>https://www.mcmillen.dev/blog/20210721-downvotes-considered-harmful.html</id>
<link rel="alternate" href="https://www.mcmillen.dev/blog/20210721-downvotes-considered-harmful.html"/>
<content type="html">
<![CDATA[
<h1 id="downvotes-dislikes-considered-harmful">Downvotes &amp; Dislikes Considered Harmful</h1>
<p><em>Posted 2021-07-21.</em></p>
<p>If you&rsquo;re letting users rank content, you probably <strong>don&rsquo;t need and don&rsquo;t want downvotes</strong>. Here&rsquo;s why.</p>
<p>(This post inspired by news that Twitter is considering <a href="https://twitter.com/Sadcrib/status/1417913362999136257">adding &ldquo;Dislikes&rdquo; to Tweets</a>.)</p>
<h2 id="background">Background</h2>
<p>In my past life at Google, I was responsible for co-creating <a href="https://books.google.com/books?id=fEJ0AwAAQBAJ&amp;newbks=1&amp;newbks_redir=0&amp;lpg=PP83&amp;dq=memegen%20eric%20schmidt&amp;pg=PP83#v=onepage&amp;q=memegen%20eric%20schmidt&amp;f=false">Memegen</a>, a large &amp; influential Google-internal social network. Memegen lets Google employees create internal-only memes and allows users to upvote &amp; downvote the memes of others. Memegen&rsquo;s home page is the Popular page, which shows the most-upvoted memes of the past day.</p>
<p>Adding downvotes to Memegen was my single greatest mistake.</p>
<h2 id="the-problems-of-downvotes">The problems of downvotes</h2>
<p>Any voting system in which <em>most</em> posts mostly receive upvotes, but which also allows downvotes, has a huge problem:</p>
<blockquote>
<p>No matter how you do the math, <strong>downvotes count more</strong> than upvotes do.</p>
</blockquote>
<p>Mathematically, it will always be comparatively easy for a vocal minority to bury any specific items that they don&rsquo;t want surfaced on the top-N posts page. This is true even if you&rsquo;re using a sophisticated ranking algorithm like <a href="https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval">Wilson score intervals</a> to rank posts (as Reddit &amp; many other sites do).</p>
<p>Downvotes aim to solve the problem of filtering out low-quality <strong>content</strong>, but trolls too easily co-opt them to filter out <strong>people</strong>, often for bad reasons that have more to do with who&rsquo;s posting than with the content of their posts.</p>
<p>From the standpoint of attracting users, downvotes create another huge problem: someone whose first submission to a site gets downvoted to oblivion will feel bad about it and probably not come back to submit better stuff in the future.</p>
<h2 id="what-does-a-downvote-actually-mean">What does a downvote actually <em>mean?</em></h2>
<p>The other problem with downvotes is that it&rsquo;s unclear to everyone what they mean. Does a downvote mean that this particular post is:</p>
<ol>
<li>offensive or illegal and needs to be removed ASAP?</li>
<li>a duplicate?</li>
<li>just something you personally don&rsquo;t like?</li>
<li>off-topic for the forum?</li>
</ol>
<p>As the creator of a social product, you need to <strong>give people different buttons</strong> for these.</p>
<p>Offensive or illegal posts (#1) shouldn&rsquo;t be handled by an algorithmic rating system. You need actual human moderators for that &mdash; and enough of them that they can review those reports in a timely manner. (I hope you&rsquo;re willing to train &amp; pay them well!)</p>
<p>For duplicate posts (#2) it&rsquo;s nicer &amp; more informative if your software simply says &ldquo;hey, this submission is a duplicate of this other thing, why don&rsquo;t you all check out that post instead?&rdquo;</p>
<p>#3 is solved by default &mdash; people can simply not vote for content they don&rsquo;t like.</p>
<p>#4 is pretty much the same as #3 (but maybe a moderator should intervene if a user has a history of posting too many off-topic things, or if it&rsquo;s obviously spam).</p>
<h2 id="how-to-actually-rank-posts">How to actually rank posts</h2>
<p>Once you&rsquo;ve dispensed with the idea of downvotes, the main things a user cares about are: &ldquo;what are the best things that have been posted today?&rdquo; (or in the last hour / week / etc) or &ldquo;what are the best things since I last visited?&rdquo;</p>
<p>On paper, the math is super simple: just count the number of upvotes for each item that was submitted in the relevant time period, and show the top N!</p>
<p>It turns out that it&rsquo;s actually a bit trickier to implement than something like a Wilson score interval, so here are some tips on how to do that.</p>
<p>You need to store each vote along with the time it was cast. Then, when it&rsquo;s time to compute the &ldquo;most popular in the last day&rdquo; page, first select all the votes cast within the last day, then count how many were for each post, and rank the posts by that count.</p>
<p>Doing this every time the user hits the homepage is clearly a terrible idea, so set up a cronjob to do it every 5 or 15 minutes or something. It&rsquo;s okay if the info is slightly out of date! Most users won&rsquo;t care or notice if it takes a few minutes for things to move around.</p>
<p>How exactly to optimize this depends on the scale of your site, your storage architecture, and a ton of other stuff, but for Memegen, every post had properties like <code>score_hour</code>, <code>score_day</code>, <code>score_month</code>, and <code>score_alltime</code>. A mapreduce was responsible for updating these values every few minutes.</p>
<p>Obviously you don&rsquo;t need to touch or compute anything for any post that got no votes since the last time you ran the updater. In the steady state, <em>most</em> of the posts in your system won&rsquo;t need any update.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Downvotes are a blunt instrument for users to say &ldquo;I don&rsquo;t like this content&rdquo;.</p>
<p>It&rsquo;s easy for small groups of trolls to misuse downvotes as a vehicle for harassing &amp; silencing groups of (often marginalized) people.</p>
<p>Downvotes reduce engagement by scaring off first-time posters.</p>
<p>Instead of adding downvotes to your site, build <em>specific</em> tools that handle specific kinds of unwanted posts.</p>
<p>(This post is a distillation &amp; refinement of some thoughts originally posted in <a href="https://twitter.com/mcmillen/status/1310998579184574465?s=20">a Twitter thread</a> in September 2020.)</p>
]]>
</content>
<updated>2021-07-21T12:00:00-04:00</updated>
</entry>
</feed>


@ -67,6 +67,10 @@ Previously at Google, reCAPTCHA, &amp; Carnegie Mellon.</p>
<p>Sign up for updates via my <a href="https://tinyletter.com/mcmillen">email newsletter</a> or your favorite <a href="/feed.atom">RSS reader</a>.</p>
<ul>
<li>
<p><a href="blog/20210721-downvotes-considered-harmful.html">Downvotes &amp; Dislikes Considered Harmful</a>, 2021-07-21.<br>
If you&rsquo;re letting users rank content, you probably don&rsquo;t need and don&rsquo;t want downvotes. Here&rsquo;s why.</p>
</li>
<li>
<p><a href="blog/20200209-sneak.html">A new year &amp; a sneaky new project</a>, 2020-02-09.<br>
A year after my last day at Google, some updates on making a videogame!</p>
</li>


@ -5,6 +5,7 @@ https://www.mcmillen.dev/blog/20070522-latex-tips.html
https://www.mcmillen.dev/blog/20070807-vim-tips.html
https://www.mcmillen.dev/blog/20190403-update.html
https://www.mcmillen.dev/blog/20200209-sneak.html
https://www.mcmillen.dev/blog/20210721-downvotes-considered-harmful.html
https://www.mcmillen.dev/index.html
https://www.mcmillen.dev/language_checklist.html
https://www.mcmillen.dev/papers/Drenner-2002-ICRA-final.pdf