Colons in computer-science paper titles


McGill University's txtLAB blog posted an article earlier this week plotting the proportion of academic papers in literary studies that have a colon in the title, from 1950 to 2010. Colon-titles are a familiar academic habit, following patterns such as "Main Idea: Additional Explanatory Text" and "Joke Title: Real Topic Explained After the Colon".

In the txtLAB analysis, colons appear in about 15% of literary-studies article titles in the early 1950s, and the proportion then grows nearly linearly: 30% by the late 1960s, 40% by 1975, and 50% by 1985. It levels out from the mid-1990s and settles at the current proportion of around 60%. Check out their post for the plot and discussion.

The computer-science case

Do similar patterns hold in computer science? I plotted the same thing, but with the papers indexed in DBLP. As you can see below, things look a bit different. The trend in computer science has gone through more ups and downs, and the absolute proportion of articles using colons in their titles has been far lower throughout. It's currently (2018) at the highest it's ever been, but that's still only a hair over 20%.

The plot starts in 1953, which is the first year in which DBLP indexes more than 100 articles, and goes up through the partial-year 2018 data, as of the September 15 data dump. I've included only papers of types "article" or "inproceedings" (excluding theses, preprints, books, etc.). The dotted curve and shaded error band are from ggplot2's version of loess smoothing, weighted by the number of articles indexed each year (tweaked just to give a visual summary; don't take this too seriously as a regression curve).
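For anyone who wants to try a similar count, here is a minimal Python sketch of the per-year tally described above. It is not the script behind the plot. It assumes the DBLP records have already been parsed into (year, type, title) tuples, and the names colon_share_by_year, KEPT_TYPES and min_papers are hypothetical, chosen only for illustration. It also assumes the 100-article threshold is applied after filtering by record type, which the description above doesn't spell out.

```python
from collections import Counter

# Record types kept in the analysis; theses, preprints, books, etc. are dropped.
KEPT_TYPES = {"article", "inproceedings"}


def colon_share_by_year(records, min_papers=100):
    """Return {year: fraction of kept titles containing a colon}.

    `records` is an iterable of (year, record_type, title) tuples.
    The series starts at the first year with more than `min_papers`
    kept records (an assumption about how the cutoff is applied).
    """
    totals = Counter()
    with_colon = Counter()
    for year, record_type, title in records:
        if record_type not in KEPT_TYPES:
            continue
        totals[year] += 1
        if ":" in title:
            with_colon[year] += 1

    # Years that clear the size threshold, in chronological order.
    busy_years = [y for y in sorted(totals) if totals[y] > min_papers]
    if not busy_years:
        return {}
    first_year = busy_years[0]

    return {
        y: with_colon[y] / totals[y]
        for y in sorted(totals)
        if y >= first_year
    }
```

The resulting dictionary holds one proportion per year. A smoother weighted by yearly article counts would need the per-year totals as well, which this sketch does not return.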

Observations

There's an early hump in the 1960s, particularly extreme in 1961-63. Browsing through colon-titles from those years, the reason is almost entirely one specific type of paper: Communications of the ACM began publishing a large number of algorithms, which all have titles of the form "Algorithm 64: QuickSort". Since the computer-science literature wasn't that large at the time (626 total papers indexed in 1961), those algorithm articles make a real dent.
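One way to check how much of the hump comes from those algorithm papers is to separate them from the other colon-titles with a pattern match. The sketch below is an illustration rather than the method used for this post. The pattern ALGORITHM_TITLE and the function split_colon_titles are hypothetical names, and the regular expression assumes the titles follow the "Algorithm <number>: ..." form quoted above.

```python
import re

# Matches titles that begin like "Algorithm 64: ..." (assumed title form).
ALGORITHM_TITLE = re.compile(r"^Algorithm \d+\s*:")


def split_colon_titles(titles):
    """Split colon-titles into (algorithm_style, other) lists.

    Titles without a colon are ignored.
    """
    algorithm_style, other = [], []
    for title in titles:
        if ":" not in title:
            continue
        if ALGORITHM_TITLE.match(title):
            algorithm_style.append(title)
        else:
            other.append(title)
    return algorithm_style, other
```

Comparing the lengths of the two lists for a given year gives a rough sense of how much of that year's colon-title share comes from the algorithm series.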

Things settle down from the early 1970s as the literature grows much larger. From the 1970s through the 1990s the proportion of colon-title articles increases steadily but modestly, hitting 10% in the late 1970s and 15% in the mid-1990s. It then plateaus for a while, hovering around 15-16% with small fluctuations for a little over a decade. There even seems to be a slight downward trend during much of the 2000s. Colon-titles have been on the upswing again since the late 2000s, though, this time at a seemingly accelerating pace. The current level is just above 20%, an all-time high.

Of course, take all this with the usual caveats about corpus construction. As far as I know, the composition of what exactly DBLP indexes over the years hasn't been analyzed in much depth. The trend over the past 10 years does look tantalizingly smooth, though.