You only need 3 votes to play, and other facts about the Hacker News frontpage
Like many people, I check Hacker News religiously every day for interesting tidbits of information. About two years ago, HN finally opened up an official API for different aspects of the site. Many people have since grabbed the full dump of HN and put it up on various sites, and there is even a public dataset on BigQuery that contains the information.
For me, out of all the data available through the HN API, the most interesting endpoint is the live-updated list of the top 500 posts at any given time. Over the past year, I have collected about 46 million items in a PostgreSQL database.
The endpoint sends the data tick by tick (as far as I can tell). That means that whenever a submission changes, say its position changes from 2 to 1, or its point total changes from 56 to 57, the API notifies all subscribers of the change. Sometimes the API is down or has intermittent problems, but most of the time the recorded data tracks the site fairly closely.
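To make the tick-by-tick idea concrete, here is a minimal sketch of how two consecutive snapshots of the top list could be turned into change events. The snapshot shape (a list of `(item_id, points)` pairs ordered by rank) and the names `Tick` and `diff_snapshots` are my own assumptions for illustration, not the format the API actually sends.

```python
from typing import NamedTuple, Optional


class Tick(NamedTuple):
    item_id: int
    old_rank: Optional[int]    # None if the item just entered the list
    new_rank: Optional[int]    # None if the item dropped out of the list
    old_points: Optional[int]
    new_points: Optional[int]


def diff_snapshots(before, after):
    """Return one Tick per item whose rank or points changed.

    Each snapshot is a list of (item_id, points) pairs ordered by rank,
    with rank 1 at the top of the list (hypothetical format).
    """
    old = {item: (rank, pts) for rank, (item, pts) in enumerate(before, start=1)}
    new = {item: (rank, pts) for rank, (item, pts) in enumerate(after, start=1)}
    ticks = []
    for item in sorted(old.keys() | new.keys()):
        o = old.get(item, (None, None))
        n = new.get(item, (None, None))
        if o != n:
            ticks.append(Tick(item, o[0], n[0], o[1], n[1]))
    return ticks


# Item 101 gains a point and slips from rank 1 to 2; item 102 moves from 2 to 1.
before = [(101, 56), (102, 80), (103, 12)]
after = [(102, 80), (101, 57), (103, 12)]
ticks = diff_snapshots(before, after)
```

Unchanged items (103 here) produce no tick, which is what keeps a feed like this compact compared with storing a full snapshot every time.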
In this post, we will be looking at the data from 2015-03-01 to 2016-03-01, in the EST time zone. By the end, I hope we will have some intuition about when to post to Hacker News for maximum visibility.
Note, the dataset I used for this analysis is available at the end of the post for download.
In order to figure out when to post to HN, we will be looking at the following attributes:
- How many items are posted per hour (affects how long each item stays in the “New” submission front page)
- How many people visit the site (affects how many votes we need and how fast the frontpage cycles)
- How many submissions per hour ultimately end up on the frontpage (how competitive is the race overall)
- How long each post stays on frontpage (the longer you stay on frontpage, the more attention/hits you get)
- How many votes each post needs before reaching the frontpage (the lower the number of required votes, the easier it is to get them)
The following graph shows attributes 1, 2, and 3:
The x-axis represents the pair (day of the week, hour of day), where (0,0) means (Monday, 12 midnight) and (4,4) means (Friday, 4am Eastern). The y-axis represents volume: the green line is the volume of votes per hour, and the red line is the volume of new submissions per hour.
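As a rough sketch of how a timestamp can be mapped onto that (day of the week, hour of day) pair, the snippet below converts a UTC time into an EST bucket. The function name `bucket` and the use of a fixed UTC-5 offset (no daylight-saving adjustment) are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset (EST) with no daylight-saving adjustment.
EST = timezone(timedelta(hours=-5), name="EST")


def bucket(ts_utc):
    """Map an aware UTC datetime to (day_of_week, hour) in EST, Monday = 0."""
    local = ts_utc.astimezone(EST)
    return (local.weekday(), local.hour)


# 2015-03-02 05:00 UTC is Monday 00:00 EST.
b1 = bucket(datetime(2015, 3, 2, 5, 0, tzinfo=timezone.utc))
# 2015-03-06 09:30 UTC is Friday 04:30 EST.
b2 = bucket(datetime(2015, 3, 6, 9, 30, tzinfo=timezone.utc))
```

With 7 days and 24 hours, this gives 168 buckets along the x-axis.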
As we can see, there are normally many more votes at any given time than new submissions. That makes sense, since clicking the upvote button on a submission you like is much less involved than finding an interesting article and submitting it.
The blue line represents how many articles submitted per hour ultimately make it to the top 30. Compared to the number of articles submitted, that is a negligible amount. Let’s put it on its own scale and see what the graph looks like.
On average, a maximum of 9 submissions per hour ultimately make it to a top 30 spot (the frontpage).
Raw numbers are interesting, but only in a “nice to know” way. In order to gain insight into when it is a good time to submit, we need to normalize the numbers by a baseline. In this case, the most interpretable baseline is probably new submissions per hour. The following graph shows votes/submission and top30/submission:
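The normalization itself is just a per-bucket division. Here is a minimal sketch, assuming the counts are already aggregated into dictionaries keyed by (day of the week, hour of day); the function name `normalize` and the sample counts are made up for illustration and are not taken from the dataset.

```python
def normalize(votes, top30, submissions):
    """Divide per-bucket vote and frontpage counts by submission counts.

    Each argument maps a (day_of_week, hour) bucket to a count.
    Buckets with no submissions are skipped to avoid dividing by zero.
    """
    rates = {}
    for key, n_sub in submissions.items():
        if n_sub == 0:
            continue
        rates[key] = {
            "votes_per_submission": votes.get(key, 0) / n_sub,
            "top30_per_submission": top30.get(key, 0) / n_sub,
        }
    return rates


# Made-up counts purely for illustration.
submissions = {(5, 8): 200, (0, 12): 400}
votes = {(5, 8): 1000, (0, 12): 3000}
top30 = {(5, 8): 7, (0, 12): 8}
rates = normalize(votes, top30, submissions)
```

In this toy data the busier bucket sends more posts to the frontpage in absolute terms (8 vs 7), but the quieter one has the better odds per submission (3.5% vs 2%), which is exactly the effect the normalized graph is meant to expose.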
Based on how likely a post submitted in a given hour is to make it to the frontpage, the best day-of-week/time-of-day combination seems to be Saturday, 7-10am, with a just over 3.5% chance of making it. If you are submitting on a weekday, midnight to 3am is probably your best bet. For a saner hour, the second-best time is 7-8am on a weekday.
Do low-period submissions last until the high period?
One of the questions I ask myself is how the median time that a submission stays on the frontpage compares if you submit during low-volume times (when the probability of reaching the frontpage is higher). Ideally, such a submission has significantly more staying power due to the low recycle rate and can last into at least part of the high-volume period.
From the graph above, it seems that half of the posts get at least 350 minutes (~6 hours!) of exposure on the frontpage if submitted during the lowest-volume time, roughly just past midnight. If we are optimists and have one of the best articles, it is conceivable that a post submitted at, say, 2am might stay until 8 or 9, which is just about when volume reaches its highest point.
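For readers who want to reproduce a number like this, here is one way the time on the frontpage could be derived from rank ticks and then summarized per submission hour. This is a sketch under my own assumptions: the tick format `(minute, rank)`, the rule that a rank holds until the next tick, and the names `minutes_on_frontpage` and `median_by_hour` are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def minutes_on_frontpage(observations, frontpage_size=30):
    """Sum the minutes an item spent at rank <= frontpage_size.

    `observations` is a time-ordered list of (minute, rank) ticks; rank is
    None once the item has left the tracked list. Each rank is assumed to
    hold until the next tick arrives.
    """
    total = 0
    for (t0, rank), (t1, _) in zip(observations, observations[1:]):
        if rank is not None and rank <= frontpage_size:
            total += t1 - t0
    return total


def median_by_hour(items):
    """Median frontpage minutes per submission hour.

    `items` is an iterable of (submit_hour, minutes_on_frontpage) pairs.
    """
    by_hour = defaultdict(list)
    for hour, minutes in items:
        by_hour[hour].append(minutes)
    return {hour: median(vals) for hour, vals in sorted(by_hour.items())}


# One item: enters the top 30 at minute 10 and falls out at minute 200.
obs = [(0, 45), (10, 28), (70, 12), (200, 31), (260, None)]
exposure = minutes_on_frontpage(obs)

# Made-up durations purely for illustration.
sample = [(0, 300), (0, 350), (0, 420), (9, 90), (9, 150)]
medians = median_by_hour(sample)
```

The median is a reasonable summary here because a handful of very long-lived posts would otherwise pull a mean upward.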
If we believe that the number of votes a post gets is a proxy for how many people have clicked through and read the article, then the following graph shows the median number of votes a submission gets while on the frontpage for various submission times.
Based on this graph, midnight is a terrible time to submit if you are trying to get more votes/views. However, the good news is that the number of votes received while on the frontpage peaks a couple of hours earlier than the peak volume of submissions/votes. As such, we can probably submit at 5-6 instead of 8-9 to get the most votes/views.
How many points/minutes do we need before hitting the frontpage?
The last question I would like to explore today is how many points we need, in the median, before hitting the frontpage. A related question is how many minutes we should generally expect to wait before hitting the frontpage.
The graph above shows the median minutes before a post gets to the frontpage.
The median number of points needed to reach the frontpage oscillates between 3 and 4, with no discernible pattern based on day of the week or time of day.
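Both medians can be read off the first tick at which a submission appears in the top 30. The sketch below assumes each item's history is a time-ordered list of `(minutes_since_submission, rank, points)` tuples; that format, the function names, and the sample histories are invented for illustration.

```python
from statistics import median


def first_frontpage_tick(ticks, frontpage_size=30):
    """Return (minutes, points) at the first tick inside the top 30.

    `ticks` is a time-ordered list of (minutes_since_submission, rank, points).
    Returns None if the item never reached the frontpage.
    """
    for minutes, rank, points in ticks:
        if rank <= frontpage_size:
            return (minutes, points)
    return None


def median_breakthrough(items):
    """Median (minutes, points) at breakthrough over items that made it."""
    firsts = [first_frontpage_tick(t) for t in items]
    firsts = [f for f in firsts if f is not None]
    minutes = [m for m, _ in firsts]
    points = [p for _, p in firsts]
    return (median(minutes), median(points))


# Made-up histories purely for illustration.
items = [
    [(2, 180, 1), (15, 60, 3), (22, 27, 4), (40, 10, 9)],  # breaks through at 4 points
    [(1, 200, 1), (30, 29, 3)],                            # breaks through at 3 points
    [(5, 150, 1), (60, 140, 2)],                           # never reaches the top 30
    [(3, 90, 2), (12, 30, 3)],                             # breaks through at 3 points
]
result = median_breakthrough(items)
```

Items that never reach the frontpage are excluded, so these medians describe only the submissions that made it.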
From the frontpage probability graph, it looks like the probability that a submission will reach the frontpage peaks at 5 points, at just over 35%. If you get more than 5 points and have not reached the frontpage yet, something is probably wrong, and the probability decreases as a result.
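A probability curve of this kind can be estimated by counting, for each point value, what share of submissions reached the frontpage. The sketch below assumes one `(points, reached_frontpage)` record per submission; the function name and the sample records are hypothetical.

```python
from collections import Counter


def frontpage_probability(records):
    """Estimate P(reached frontpage | points) for each observed point value.

    `records` is an iterable of (points, reached_frontpage) pairs, one per
    submission (hypothetical format).
    """
    totals = Counter()
    hits = Counter()
    for points, reached in records:
        totals[points] += 1
        if reached:
            hits[points] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Made-up records purely for illustration.
records = (
    [(1, False)] * 9 + [(1, True)]
    + [(5, True)] * 2 + [(5, False)] * 3
)
probs = frontpage_probability(records)
```

Point values with only a few submissions give noisy estimates, so it is worth looking at the counts alongside the ratios.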
Note: for posts that reached the frontpage, the point count is the count at the moment I recorded them breaking through to the frontpage. For posts that did not reach the frontpage, I did not record the point progression, so I used the submission's point total at the time of writing. A submission can keep gaining points over time, so that total does not reflect the point count in the hours right after submission.
Based on the analysis above, it seems that the best time to submit to Hacker News is a weekend morning, Eastern time. You have the best chance of making it to the frontpage, with a good balance between how long the post sticks around and how many views it gets.
If you missed the weekend window and don’t want to wait for the next one, the best time on a weekday is before everyone gets to work: make it onto the frontpage, then stay there for the morning rush of visitors.
This post is inspired by Todd Schneider’s excellent analysis of reddit’s frontpage data.
This post is generated from the position status dataset per item. Tweet me a link if you use this dataset to generate interesting results!