You (probably) don’t need ReCAPTCHA


Google’s ReCAPTCHA is often the first tool that webmasters reach for when they need to stop spam and automated malicious traffic from harming their services. In this post I explain why ReCAPTCHA is usually not the best solution for this purpose: it is often unnecessary, inconveniences users, and subjects them to intensive tracking and fingerprinting that they cannot opt out of. I also present several alternatives to ReCAPTCHA for various threat models, as well as best practices for implementing captchas in general.

The face of evil

ReCAPTCHA is harmful

ReCAPTCHA is yet another free-of-charge product offered benevolently by Google for any webmaster to implement within their own services. How does ReCAPTCHA differentiate legitimate human users from bots? ReCAPTCHA relies extensively on user fingerprinting, putting emphasis on the question of “Which human is this user?” rather than the ordinary “Is this user human?”. It’s worth noting how much easier it is to solve ReCAPTCHAs when the user is logged into their Google account, which allows Google to associate their actions with their real identity. A related effect is often reported by users of non-Google browsers, who notice that ReCAPTCHAs take more time to complete in Firefox than in Chrome. This is in line with many other anti-competitive techniques that Google has used over the years to help grow its market share.

Determining exactly how ReCAPTCHA works is very difficult: Google not only heavily obfuscates its JavaScript, but also implements an entire VM in JavaScript with its own bytecode language. Even so, there have been many attempts to reverse-engineer some of the client-side code and to theorize about how the server-side logic operates. Initial attempts at reverse-engineering ReCAPTCHA show copious amounts of information being collected, including but not limited to: plugins, user agent, IP address, screen resolution, execution times, timezone, language, click/keyboard/touch information within the frame of the captcha, test results of many browser-specific functions and CSS evaluation, information about canvas element rendering, and cookies, including those affiliated with your Google account that were placed within the last 6 months.

There is a good reason why ReCAPTCHA uses the google.com domain instead of one specific to ReCAPTCHA. This allows Google to receive any cookies that it has already set for you, effectively bypassing restrictions on setting third-party cookies and allowing traffic correlation with all of Google’s other services, which most users use. ReCAPTCHA collects enough information that it could reliably de-anonymize many users who simply wish to prove that they are Not A Robot. As JavaScript is now required to even view a ReCAPTCHA, even a user running software such as TBB (Tor Browser Bundle) may find themselves giving away more information than they intend to, for example if they have resized their browser window (which is discouraged for exactly this reason).

Correspondingly, webmasters who use Google’s ReCAPTCHA on their websites must link to both Google’s Privacy and Terms pages (included in the form by default in a small 8px style that makes them appear unclickable). Although Google used to have privacy and terms pages specific to ReCAPTCHA, these links now point to the privacy and terms pages for all users of Google services in general, regardless of which service is being used, or whether the user has (or even wants) a Google account to begin with. Therefore, accepting these terms (implicitly, by attempting to prove you are Not A Robot) grants Google permission to do to you everything that it does to regular users of its services, and little information is available as to what specifically is done (GDPR is likely to be unhelpful here, given ReCAPTCHA’s spam-stopping purpose). Not only are the unhelpful links in the ReCAPTCHA box never opened by users, but there is also no Google logo or visual reference to indicate that ReCAPTCHA is a Google service, so many users have zero indication that they have just consented to all of Google’s tracking just because they tried to leave feedback or create a ticket on your website. If you thought you could use the Internet without using Google’s services, try using the Internet without filling out a single ReCAPTCHA, which some users must do to pay their bills, file their taxes, and sometimes even use government websites (if you somehow manage this, next try never sending email to Gmail/G Suite addresses or using Google APIs for a more exciting challenge). Good luck.

It is worth mentioning that caring about user privacy to this extent is likely to be outside the scope of concern for most websites. Many websites are already so tightly coupled to Google’s services (commonly including Google Analytics, Google Ads, Google APIs, Google Tag Manager, Google static resources, Google OAuth, Google Compute Engine, and many others) that the addition of a Google captcha appears negligible. With that said, different websites have different values and different users, and many do not want to require users to agree to Google’s tracking and labor for basic usage. The level of centralization that ReCAPTCHA forces is not good for anyone except Google.

Apart from the privacy implications of ReCAPTCHA usage, the actual captcha is very tedious for many classes of users, sometimes becoming so difficult that users find themselves unable to complete the captcha at all. Users hate ReCAPTCHA. They really hate ReCAPTCHA. ReCAPTCHA is so hated that some websites have a profit model of charging users $20 annually to disable ReCAPTCHA, which thousands of users pay for. If this sounds like a great new business model to you and now you want to add ReCAPTCHAs to every page of your website to attempt to maximize profit, I will find you. And I will force you to complete a ReCAPTCHA every time you want food or water until you die from malnutrition after the first week. I have read countless posts from users who became so frustrated with a service that used excessive ReCAPTCHAs that they swore to never use the offending website again. These are often intelligent users with no disabilities who are simply tired of being treated like dirt and wasting their time. Be kind to your users and help minimize the number of ReCAPTCHAs that they have to solve just to be allowed to use the Internet.

ReCAPTCHAs become significantly more difficult if the user attempts to ‘opt-out’ of Google’s services and tracking by using software that hinders it, such as VPNs, TBB, and many anti-tracking browser addons and modifications. To demonstrate what is meant by ‘very tedious’, below is a real-time recording of myself solving a single ReCAPTCHA using TBB:

Spambots are known to give up when forced to be patient

I got lucky and only needed to complete two challenges. Sometimes there are ten or more. Watching the above video, you might think to yourself “I knew the Tor network was slow, but I didn’t know it was that slow!”. You would be correct to take note of this discrepancy. If we open up the web developer console, we can see that the HTTP requests for new captcha images only take a few hundred milliseconds. Despite this, Google’s heavily-obfuscated JavaScript intentionally delays the appearance of the new images by several seconds every time, which I’m sure has something to do with the fact that bots give up when forced to wait, probably. This is not a nice way to treat users who don’t want to perform unpaid labor and be fingerprinted by Google. Keep in mind that the above video demonstrates one of the worst possible cases of ReCAPTCHA UX (which some userscripts may improve), and that the average user has a significantly quicker experience, provided that they are not attempting to thwart any of Google’s tracking and don’t make many mistakes.

In addition to this tediousness, the actual labor that the user is performing is directly used to benefit Google. Worry not however, as Google is eager to brag about the selfless humanitarianism that you’re engaging in by choosing ReCAPTCHA, stating the following on their main ReCAPTCHA page:

“Hundreds of millions of captchas are solved by people every day. ReCAPTCHA makes positive use of this human effort by channeling the time spent solving captchas into digitizing text, annotating images, building machine learning datasets.”

This is certainly a very rosy way of convincing you to feel good about forcing your users to engage in unpaid labor that directly benefits the world’s most powerful surveillance corporation. ReCAPTCHA is free for a reason.

Lastly, ReCAPTCHA is popular. Very popular. While this brings some advantages, it also means that there are significant efforts to break ReCAPTCHA, and those efforts all potentially affect your website, since your captcha implementation is identical to a million others. As a result, many papers that break ReCAPTCHA have been published over the years, generally followed by Google modifying its captcha in response. There have also been paid-for services that use human labor to solve captchas on behalf of a paying client for less than a cent each. For a modern and user-friendly example of bypassing ReCAPTCHA, see Buster, a browser extension (Firefox+Chrome+Opera) that helps you get past difficult captchas by using speech recognition to complete reCAPTCHA audio challenges for you.


Captchas are not always necessary

Before implementing a captcha, it’s worth considering whether one is necessary to begin with. To help evaluate this, consider whether your threat model is concerned with customized or uncustomized spam. Uncustomized spam is pervasive across many Internet protocols, and you will encounter it quickly after enabling HTTP, SSH, or many other protocols on a server. It is generally unintelligent, cheap to execute, and easy to block, even without captchas. Customized spam, however, is spam that has been written to specifically affect a given company, service, website, or user. As customized spam is created by an actor who is able to tailor it to your service, it is more dangerous than uncustomized spam, and more effort is required to effectively limit it.

Many developers vastly overestimate the likelihood of customized spam. As a competent programmer, it is easy to imagine how effortlessly someone could decimate your service with spam if they were sufficiently dedicated. One could imagine a malicious actor writing a simple script that could spam or DoS your website using nothing but curl and bash. Even if you have a captcha, you can imagine them using OCR or machine learning to automatically bypass it, then using proxies and VPNs to automatically bypass your IP rate-limiting. While in this imaginative trance, you’ve forgotten that 99% of users have no clue how to do any of this, and do not even know what curl or HTTP are. Furthermore, your service likely offers very little prospective reward to would-be competent attackers.

Just because someone could spend hours (or minutes) writing a program to spam your website does not mean that someone will. Your personal blog about the latest vegan bacon is not a high-priority target for anyone. Adding a ReCAPTCHA to your Contact Me page is just a great way to get no one to talk to you. I’ve run several websites with millions of pageviews that have received zero customized abuse, and I have spoken to other webmasters with similar experiences. Jeff Atwood of codinghorror.com once wrote similarly:

The comment form of my blog is protected by what I refer to as “naive captcha”, where the captcha term is the same every single time. This has to be the most ineffective captcha of all time, and yet it stops 99.9% of comment spam.

This is not a suggestion to do nothing, ignore basic security, and be unprepared for attacks, but rather to realistically consider your threat model and apply only what is necessary.


ReCAPTCHA alternatives for uncustomized spam

For uncustomized spam, a full captcha implementation is rarely necessary. This section lists some simple and effective tricks that stop most uncustomized spam from impacting your website.

Hidden form elements

Uncustomized spam is not intelligent enough to know when it should or should not fill out a form element. For example, adding a form element with a name of ‘url’ and hiding it with CSS allows you to reject any request that arrives with that element filled in, which spambots are eager to do. To maintain accessibility, be sure to add a label to this element so that users who rely on screen readers know not to fill it out. Other good hidden form element names include ‘website’, ‘firstname’, ‘lastname’, ‘email’, and ‘name’, unless they are already being used legitimately.
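As a rough illustration of the hidden-element trick, here is a minimal server-side sketch in Python. Everything in it is an assumption made for the example rather than something prescribed by this post: the field names in HONEYPOT_FIELDS, the markup in HONEYPOT_HTML (including the off-screen CSS and the label wording), and the helper name is_probable_spam are all hypothetical, and the form is assumed to arrive as a simple mapping of field names to submitted strings.

```python
from typing import Mapping

# Hypothetical honeypot field names. Pick names that your real form
# does not already use for legitimate input.
HONEYPOT_FIELDS = ("url", "website")

# Illustrative markup only: the field is moved off-screen with CSS and
# carries a label telling screen reader users to leave it empty.
HONEYPOT_HTML = """
<div style="position:absolute; left:-9999px;">
  <label for="url">Leave this field empty</label>
  <input type="text" id="url" name="url" tabindex="-1" autocomplete="off">
</div>
"""


def is_probable_spam(form: Mapping[str, str]) -> bool:
    """Return True if any hidden honeypot field was filled in."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)
```

A request handler would call is_probable_spam on the submitted fields and quietly drop the submission when it returns True. A missing or empty honeypot field is treated as legitimate, since real browsers submit the hidden input with an empty value.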

Static questions

Uncustomized spambots are also so unintelligent that they cannot correctly answer simple questions such as “What is 2+3?” or “What color is this website?”. These questions effectively stop almost all uncustomized spam. Common software stacks such as WordPress and Drupal have free plugins that will allow you to quickly create questions like these.
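If you are not using a plugin, a static question check can be very small. The sketch below assumes the “What is 2+3?” question from above; the names QUESTION, ACCEPTED_ANSWERS, and answer_is_correct are hypothetical, and accepting the spelled-out “five” is a design choice made for this example, not a requirement.

```python
# The static question shown next to the form, and the answers accepted
# for it. Both are illustrative; use whatever suits your site.
QUESTION = "What is 2+3?"
ACCEPTED_ANSWERS = {"5", "five"}


def answer_is_correct(submitted: str) -> bool:
    """Compare the submitted answer, ignoring case and surrounding spaces."""
    return submitted.strip().lower() in ACCEPTED_ANSWERS
```

Being lenient about whitespace and capitalization keeps the check from rejecting humans over trivia, while a bot that leaves the field empty or fills it with a URL still fails.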

Community-specific questions

If your website is community-centric, such as a forum or blog, you can ask a community-specific question that prospective members of your community should know the answer to. This is a simple and effective way to keep out users who you believe shouldn’t be participating, either because they lack basic relevant knowledge, or because they are unable or unwilling to learn it. As an example, a community of mathematicians might show the user an image of an equation and ask them to name the formula or solve it.

Effective at keeping out the arithmophobic

For another example, a community of niche media connoisseurs might ask the user to identify a certain character that they deem to be important to their shared culture.