The real top Stack Overflow questions

By Felipe Hoffa

Interactive dashboard. Play with it to find the top questions for any tag.

Let’s see first what questions have received the largest number of pageviews through time:

The top Stack Overflow questions of all time. What’s missing?

Some observations:

  • The top Stack Overflow question of all time — with more than 7 million views since its creation 9 years ago — is not even a programming question: “How do I undo the most recent commits in Git”.
  • From the top 10 questions, 4 are related to git, 3 to JavaScript, 1 to Java, 1 to Linux, and 1 to HTML. What’s missing?
  • There are no Python questions in the top 10 list. Surprised?

So let’s take a look at the top 10 question — not from all time, but from this last quarter. This is how the Stack Overflow world looks now:

The top Stack Overflow questions of 2018Q4. What tag is missing now?

Some observations:

  • The top question shows that people are still trying to figure out “How do I undo the most recent commits in Git” — with more than 400k pageviews this last quarter.
  • The programming world has changed: Now 4 of the top 10 questions are related to Python.
  • Java is gone — it doesn’t show up within the top 10 questions any more.

The trend is even more evident if you look at the tags for the top 30 questions:

Top 10 and top 30 Stack Overflow questions. Current vs all time

Java and SQL are gone from the top 30. Meanwhile Python now dominates the top 10 and top 30 Stack Overflow questions — by current pageviews.

Now you can look at what the top questions for any tag are. Before we take a look at them — can you guess the top questions for JavaScript, Python, Go?

Top 10 questions Q42018 for Python
Top 10 questions Q42018 for JavaScript
Top 10 questions Q42018 for Go

The first thing you can notice with this dashboard are the top current 10 questions for any tag, and how they’ve trended through time. Python devs keep asking about “Iterating over dictionaries using ‘for’ loops”, JavaScript devs ask “How do I remove a particular element from an array in JavaScript?”, and Go devs need to know “What is the best way to convert byte array to string?”.

Top 10 questions Q42018 for TensorFlow

People are really excited about machine learning, and TensorFlow is one of the top projects in the space. But most of the top questions show people struggling with issues like “TensorFlow not found using pipor the ability to work with their current CPU/GPU combos.

A worthy goal for the TensorFlow team: With this dashboard they’ll be able to track how these questions trend down through time — if they work out a foolproof way for devs to install TensorFlow with pip.

With the dashboard you can jump to see the top 10 questions at any given quarter. For example, let’s compare the top 10 for Kotlin in 2018Q4 vs 2017Q2:

Top 10 Kotling questions in 2017Q2 vs 2018 Q4

This looks good: Most top questions from 2017 disappeared into the void — and as a guess, let’s say that the team was great at solving common issues people had back then. And on the 2018Q4 side you can see how most current top questions are new, and not the same as many months ago.

This are the top 10 questions for Redis:

Top 10 questions for Redis

You can see that the current top question for Redis ask how to install it on Windows. If you are a Redis developer working to improve the Java libraries, this doesn’t tell you much — luckily our dashboard also allows us to filter for subtags.

Let’s look at the top 10 Redis+Java questions instead:

Top 10 questions for Redis + Java

If we filter for “redis” and “Java” then we see different questions: Most of these are related to Redis and Spring, for example. Which are different to the top question for Redis and C#:

Top 10 questions for Redis + Java

Oh yeah — the top question for Redis + Java asks for simple C# examples. Maybe developers can’t find these in the official docs?

With this dashboard you can also see how many answers each question has. For example, the top question “Convert a String to int in Rust?“ has 5 different answers. Go has 12 different replies to “What is the best way to convert byte array to string?” and 20 to “How to efficiently concatenate strings in Go?”:

Top questions for Rust and Go

Why stop at the top 10? You can also go deeper into the top 30:

Top 21 to 30 questions for jQuery
You can find all this data in BigQuery. Every 3 months Stack Overflow publishes a snapshot of their latest data, and we make sure to have a fresh copy ready to be queried.
SELECT (
SELECT tag
FROM UNNEST(tags)
ORDER BY view_count DESC LIMIT 1
) tag
, * EXCEPT(tags)
FROM (
SELECT quarter_views , view_count
, ROW_NUMBER() OVER(ORDER BY quarter_views DESC) q_ranking
, ROW_NUMBER() OVER(ORDER BY view_count DESC) ranking
, ARRAY(
SELECT AS STRUCT tag, b.view_count
FROM UNNEST(tags) tag
JOIN `fh-bigquery.stackoverflow_archive_questions.merged_aux_tags` b
ON tag=b.tag
) tags, title
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE quarter='2018-12-01'
AND view_count > 50000
)
WHERE q_ranking<30 OR ranking <30
ORDER BY 1 DESC

For this I started by storing a copy of each snapshot through time — and then proceeded to calculate the difference of pageviews between the snapshots.

Luckily the query looks pretty simple:

CREATE OR REPLACE TABLE `stackoverflow_archive_questions.merged` 
AS
SELECT 
IFNULL(
view_count -
LAG(view_count) OVER(PARTITION BY id ORDER BY view_count)
, view_count) quarter_views, *
FROM (
SELECT PARSE_DATE('%Y%m',_table_suffix) quarter
, id, view_count
, SPLIT(tags, '|') tags
, score, creation_date, answer_count
, accepted_answer_id, title
FROM `fh-bigquery.stackoverflow_archive_questions.q*`
)
#standardSQL
SELECT title, quarter_views, view_count
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE 'google-cloud-dataflow' IN UNNEST(tags)
AND quarter='2018-12-01'
ORDER BY quarter_views DESC
LIMIT 10
Top current questions for Dataflow
#standardSQL    
WITH top_questions AS (
SELECT id, title, quarter_views, view_count
FROM `fh-bigquery.stackoverflow_archive_questions.merged`
WHERE 'google-cloud-dataflow' IN UNNEST(tags)
AND quarter='2018-12-01'
), latest_answer AS (
SELECT parent_id, DATE(MAX(COALESCE(last_edit_date, last_activity_date, creation_date))) answer_last_edit_date
FROM `bigquery-public-data.stackoverflow.posts_answers` b
GROUP BY parent_id
)
SELECT SUBSTR(title, 0,80) title, quarter_views, view_count, answer_last_edit_date
FROM top_questions a
JOIN latest_answer b
ON a.id=b.parent_id
WHERE DATE_DIFF(CURRENT_DATE(), answer_last_edit_date, DAY)>360
ORDER BY quarter_views DESC
LIMIT 10
Top current questions for Dataflow, that haven’t been updated in a year

Now you can go back to the top of this post to play with the dashboard. Please share what you find!

Then also check out what tags are trending and which ones are going down, with The real Stack Overflow trends:

Predict how long it takes for Stack Overflow to answer a question:

Want more stories? Check my Medium, follow me on twitter, and subscribe to reddit.com/r/bigquery. And try BigQuery — every month you get a full terabyte of analysis for free.