Google may rank sites for queries that don't appear on the page at all

By Walid Halabi

When Google first appeared, putting a keyword into Google Search returned pages that you could be certain contained the word exactly as typed. It was a straightforward, predictable way to search the web, but it meant that people who were less experienced with Google might struggle to get their queries just right.

Next, Google put in some basic smarts which made searching easier. First, it incorporated stemming (e.g. searching for “walking” might return results that include “walk”). Then it began to include words that were more loosely related to your query; for example, a search for “hat” can return results that include the word “cap”.

This was still in useful territory, arguably, because cap and hat are synonyms to some extent. The semantics may differ slightly but they’re closely related enough that returning results for both is probably more useful than not doing so. And putting “hat” in quote marks will exclude results for cap, so power users can tell Google to be more precise.
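To make the stemming, synonym and quote-mark behaviour concrete, here is a toy sketch in Python. It is purely illustrative: the suffix list, the synonym table and the function names are all made up for this example, and we have no idea how Google actually implements any of this.

```python
# Toy illustration only: SUFFIXES, SYNONYMS and these function names are
# invented for this example and are not how Google does it.
SYNONYMS = {"hat": {"cap"}, "cap": {"hat"}}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(query):
    """Return the set of words a page may contain to count as a match."""
    query = query.strip()
    # A query wrapped in straight double quotes is matched exactly.
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return {query[1:-1].lower()}
    terms = set()
    for word in query.lower().split():
        root = stem(word)
        terms.update({word, root})
        terms.update(SYNONYMS.get(word, set()))
        terms.update(SYNONYMS.get(root, set()))
    return terms


def matches(query, page_text):
    """True if any expanded query term appears as a word on the page."""
    terms = expand(query)
    return any(word in terms for word in page_text.lower().split())
```

With this sketch, an unquoted search for hat matches a page that only says "a wool cap", while the quoted version does not. Neither rule could explain a page matching a query for a number that has no stem or synonym on the page.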

Google's unexpected search results

That’s… not…. right……

We thought we understood how Google’s reading between the lines worked, until we recently noticed that an article published on this site about how Uber’s code texts don’t keep your account safe was being returned for the queries 2109085405 and 4843218317. These numbers never appeared in the article. Not even close.

We were a bit flummoxed. At first we thought they might be IDs of some sort that appeared in the page’s code. Nope, not in the code at all. Wait - the article showed screenshots of numbers in text messages and phone numbers. Maybe Google was being really clever and extracting the text from the images. Nope, they’re not in the images at all.

So we did what everyone does when confused. We Googled. When searching for the numbers as is, the first result is a “who called me” site where people post about phone numbers that call or text them, usually to report that an unknown number was a spammer or a scam. The numbers we were ranking for were being reported as sending people Uber code texts, the same kind reported in the article.

What’s happening here?

We think it’s one of two things. Maybe the phone numbers in the search queries are being related to the concepts which appear in the article, namely the concept of Uber code spam texts, similar to how synonyms are related and each word returns results for the other. And yet, if you search for Uber code text - a query the article ranks for - you don’t get those “who called me” site results. So it’s like the phone number is a synonym for the text query, but not the other way around.

Or perhaps Google is linking pages based on how searchers move around the web. For example, a searcher might first search for the phone numbers and not find what they’re looking for, so they search for something like the article title, and end up on the article. Google might see a high enough proportion of users behaving this way, and decide to save them the trouble of performing the second search.
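Here is a rough sketch of what that second guess could look like. Everything in it is an assumption on our part: the session format, the example.com URLs, the thresholds and the function name are invented for illustration, and nothing here is confirmed Google behaviour.

```python
from collections import Counter, defaultdict

# Hypothetical session log: (queries typed in order, page finally clicked).
# The URLs are placeholders, not real pages.
SAMPLE_SESSIONS = [
    (["2109085405", "uber code text"], "example.com/uber-article"),
    (["2109085405", "uber code texts spam"], "example.com/uber-article"),
    (["2109085405"], "whocalled.example.com/2109085405"),
    (["uber code text"], "example.com/uber-article"),
]


def learn_shortcuts(sessions, min_share=0.5, min_sessions=2):
    """Map a first query to the page searchers reach after re-searching.

    A shortcut is recorded only when the first query was seen in at least
    `min_sessions` sessions and at least `min_share` of those sessions
    ended on the same page after one or more follow-up searches.
    """
    totals = Counter()
    destinations = defaultdict(Counter)
    for queries, clicked_url in sessions:
        if not queries:
            continue
        first = queries[0]
        totals[first] += 1
        # Only count sessions where the searcher had to search again.
        if len(queries) > 1 and clicked_url is not None:
            destinations[first][clicked_url] += 1

    shortcuts = {}
    for first, counts in destinations.items():
        url, hits = counts.most_common(1)[0]
        if totals[first] >= min_sessions and hits / totals[first] >= min_share:
            shortcuts[first] = url
    return shortcuts


shortcuts = learn_shortcuts(SAMPLE_SESSIONS)
```

In this made-up log the phone number ends up pointing at the article, but the text query never points back at the “who called me” page, because nobody searched for the text first and the number second. That would fit the one-way relationship we observed.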

One more curious thing about all this is that when searching for some of the phone numbers, our article is the only one to appear amongst dozens of results which are specifically about the phone numbers. There are other pages about the same topic, including on big properties like Reddit and Quora, so we’re pretty confused about this, although searches for some phone numbers do yield other articles.

Machine learning?

For now we’ll wave our hands in the general direction of RankBrain, Google’s machine learning-based search technology, and the unexpected quirks that commonly result from ML. But if the cause is something sitting in a neural network somewhere, it might mean that Google is incorporating data into its models about the paths users take across the web, across multiple searches and web sites, and where those users end up satisfying their need for information.

Either way, that’s another ranking factor to take into account.