Introduction
Computers suffer from errors that manifest as memory corruption of one or more bits. The causes of these errors range from manufacturing defects to environmental factors such as cosmic rays and overheating. While the probability of a single error is minuscule, the amount of Internet-connected hardware is extremely large: there were approximately 5 billion devices connected to the Internet in 2010. The best way to conceptualize a small probability distributed over many rounds is to think of a lottery. The odds of winning the jackpot are infinitesimally small, but given enough players someone will win.
Researchers have exploited bit-errors before in amazing ways. But bit-errors can also be detected and exploited in new ways on an Internet-wide scale. One such way is bitsquatting: registering domain names that differ by one bit from frequently resolved domains.
Theory of Operation
When bit-errors occur they can change memory content. Computer memory content has semantic meaning. Sometimes, that meaning will be a domain name. And applications utilizing that memory will use the wrong domain name.
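An example can illustrate this more clearly. The following is a binary (ASCII) representation of cnn.com:
|01100011|01101110|01101110|00101110|01100011|01101111|01101101|
|c|n|n|.|c|o|m|
Now suppose you load a web page that contains a link to cnn.com, on a machine with a faulty memory module. Before you ever click the link, the domain name is copied in memory several times: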
- by the TCP/IP stack from kernel to user mode [varies by implementation]
- by your browser when it parses HTML
- ... and when it creates an internal representation of the DOM tree
- ... and when it creates a new HTTP request
- by your OS APIs during domain resolution
Let's further suppose one of these copy operations writes to the faulty memory module. The binary representation changes by one bit: the first n (01101110) becomes o (01101111). It now represents con.com.
Upon clicking the link your browser will navigate to con.com instead of cnn.com.
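The cnn.com to con.com change described above can be checked directly. The following is a minimal sketch; the helper name `flip_bit` and its parameters are illustrative assumptions, not part of any tool used in the experiment.

```python
def flip_bit(name: str, byte_index: int, bit_index: int) -> str:
    """Return `name` with a single bit inverted (bit 0 is the least significant)."""
    raw = bytearray(name.encode("ascii"))
    raw[byte_index] ^= 1 << bit_index
    # latin-1 maps every byte value to a character, so flipping bit 7 cannot raise.
    return raw.decode("latin-1")


original = "cnn.com"
# Flip the lowest bit of the second character: 'n' (01101110) -> 'o' (01101111).
corrupted = flip_bit(original, byte_index=1, bit_index=0)

for label, value in (("original", original), ("corrupted", corrupted)):
    bits = " ".join(f"{byte:08b}" for byte in value.encode("latin-1"))
    print(f"{label:>9}: {value}  {bits}")
```

Running it prints both names with their bit patterns, which differ in exactly one position.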
Experiment
The concept behind the experiment is simple: if bit-errors are indeed mutating domain names in device memory, then these devices must resolve and connect to these bitsquat domains. Therefore bitsquats of frequently resolved domains should be visited by devices around the world.
Execution of the experiment is not so simple. First is the problem of choosing domains to bitsquat. There is a difference between popular websites and frequently resolved domains. There are many frequently resolved domains that few people type or know. These domains belong to the content delivery and advertising networks of the Internet; domains such as fbcdn.net, 2mdn.net, and akamai.com. Content delivery and ad domains also make the best experimental targets as these domains are extremely unlikely to be typed by people. Second, every DNS query must be answered with two answers: one for the original domain and one for the bitsquat domain. This is because the original requestor may be requesting a response for the original name, and will discard responses for invalid domains. For more on this, please read the whitepaper or look at the slides.
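When choosing domains to bitsquat, it helps to enumerate every name that is one bit away from a target. The sketch below is one possible way to do that, not the script used in the experiment. The function name `bitsquat_candidates` and the validity rule (lowercase letters, digits, and hyphen only) are assumptions; the sketch does not check whether a hyphen lands at the edge of a label, whether the resulting TLD exists, or whether the name can be registered.

```python
import string
from typing import List

# Characters assumed valid inside a hostname label (an assumption of this sketch).
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_candidates(domain: str) -> List[str]:
    """List names that differ from `domain` by exactly one flipped bit."""
    domain = domain.lower()
    candidates = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # leave label separators alone in this sketch
        for bit in range(8):
            # DNS names are case-insensitive, so a flip that only changes
            # case yields the same domain and is skipped below.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in VALID_CHARS and flipped != ch:
                candidates.add(domain[:i] + flipped + domain[i + 1:])
    return sorted(candidates)


for name in bitsquat_candidates("cnn.com"):
    print(name)
```

The output includes con.com, and running it against fbcdn.net or 2mdn.net yields candidates of the same kind.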
For my experiment I registered the following domains. Note: the registration for these has since expired and they are no longer owned by me.
|Bitsquat Domain||Original Domain|
I used a Python script to answer DNS queries and Apache to log incoming HTTP requests and waited for connections. And to my surprise, devices connected.
Experimental Findings
The following findings are based on my Apache logs from September 26, 2010 to May 5, 2011. Log entries due to search engine crawlers and web-app vulnerability scans were manually filtered. As the process was manual, some crawler/scanner requests may still be counted in these statistics.
Finding 1: Bit-errors can be exploited via DNS
During the logging period there were a total of 52,317 bitsquat requests from 12,949 unique IP addresses. When not counting 3 events that caused extraordinary amounts of traffic, an average of 59 unique IPs per day made HTTP requests to my 32 bitsquat domains. These requests were not typos or other manually entered URLs, and some show signs of several bit errors. Here are some actual examples (with personal data removed):
static.ak.fjcdn.net 109.242.50.xxx "GET /rsrc.php/z67NS/hash/4ys0envq.js HTTP/1.1" "http://www.facebook.com/profile.php?id=xxxxxxxxxx" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; GTB6.5; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2; Hotbar 18.104.22.168; OfficeLiveConnector.1.5; OfficeLivePatch.1.3; AskTbZTV/22.214.171.12404)"
msgr.dlservice.mic2osoft.com 213.178.224.xxx "GET /download/A/6/1/A616CCD4-B0CA-4A3D-B975-3EDB38081B38/ar/wlsetup-cvr.exe HTTP/1.1" 404 268 "Microsoft BITS/6.6"
s0.2ldn.net 66.82.9.xxx "GET /879366/flashwrite_1_2.js HTTP/1.1" "http://webmail.satx.rr.com/_uac/adpage.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; HPNTDF; AskTB5.2)"
mmv.admob.com 109.175.185.xxx "GET /firstname.lastname@example.org HTTP/1.1" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; HW iPhone2,1; en_gb) AppleWebKit/525.18.1 (KHTML, like Gecko) (AdMob-iSDK-20101108; iphoneos4.2)"
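One way to check how many bit errors separate a logged hostname from the name it was presumably meant to be is to count differing bits. This is a minimal sketch; the function name `bit_distance` is an assumption. The pairing of fjcdn.net with fbcdn.net and of 2ldn.net with 2mdn.net follows the domains named earlier, while pairing mic2osoft.com with microsoft.com is an inference from the log line.

```python
from typing import Optional


def bit_distance(a: str, b: str) -> Optional[int]:
    """Count differing bits between two equal-length ASCII names.

    Returns None when the lengths differ, since the names then cannot be
    compared position by position.
    """
    if len(a) != len(b):
        return None
    return sum(
        bin(x ^ y).count("1")
        for x, y in zip(a.encode("ascii"), b.encode("ascii"))
    )


# (observed, presumed original) pairs taken from the examples above.
pairs = [
    ("fjcdn.net", "fbcdn.net"),
    ("mic2osoft.com", "microsoft.com"),
    ("2ldn.net", "2mdn.net"),
]
for observed, presumed in pairs:
    print(observed, presumed, bit_distance(observed, presumed))
```

Each of these pairs differs by exactly one bit. A distance greater than one would be consistent with a request that shows several bit errors.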
Finding 2: Not all bit-errors are created equal
Some machines control considerably more traffic than others. While a bit-error in the memory of a PC or phone will only affect one user, a bit-error in a proxy, recursive DNS server, or a database cache may affect thousands of users. Bit-errors in web application caches, DNS resolvers, and a proxy server were all observed in my experiment. For instance, a bit error changing fbcdn.net to fbbdn.net led more than a thousand Farmville players to make requests to my server.
Finding 3: Mobile and embedded devices may be more affected than traditional hardware
The following graphic shows a comparison of HTTP User-Agents from visitors to Wikipedia during March of 2011 to User-Agents visiting my bitsquat domains. The "other" column, which includes various phones, game consoles, and other embedded devices, was considerably more prevalent among the bitsquat visitors. Curiously, there were considerably fewer MacOS User-Agents visiting bitsquat domains than visiting Wikipedia. I do not have an explanation as to why this is so.
Finding 4: Bitsquat traffic represents a slice of normal traffic
The visitors to my bitsquat domains came from all over the world and included every major operating system and embedded platform. While there were considerable differences in the percentage of visitors using MacOS and mobile platforms, the percentage of visitors using Windows, Linux, Android and iPhones was approximately the same as that of Wikipedia visitors. Additionally, for the visitors determined to be in the United States via a geoip database, a diurnal pattern corresponding to computer use can be observed.
Finding 5: HTTPS/TLS will not help. DNSSEC will help a tiny bit.
HTTP 1.1 includes a header field called Host. This field is populated with the domain the client thinks it connected to. If the Host header contains the bitsquat domain, then a bit error occurred before domain resolution. If the Host header contains the original domain, the error occurred during domain resolution. In 96% of the cases, the bit-error had occurred prior to DNS resolution.
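The Host header rule above can be sketched as a small classifier. This is an illustration under assumptions, not the analysis code behind the 96% figure: the function name `classify_request` and its return labels are hypothetical, and IPv6 literal hosts are not handled.

```python
def classify_request(host_header: str, bitsquat_domain: str, original_domain: str) -> str:
    """Guess where the bit error happened from the Host header of one request."""
    # Normalize: drop surrounding whitespace, case, an optional port, and a trailing dot.
    host = host_header.strip().lower().split(":", 1)[0].rstrip(".")

    def within(domain: str) -> bool:
        return host == domain or host.endswith("." + domain)

    if within(bitsquat_domain):
        # The client already believed it was talking to the bitsquat name.
        return "before_resolution"
    if within(original_domain):
        # The client asked for the right name but was directed to the bitsquat server.
        return "during_resolution"
    return "unknown"


print(classify_request("static.ak.fjcdn.net", "fjcdn.net", "fbcdn.net"))
print(classify_request("static.ak.fbcdn.net", "fjcdn.net", "fbcdn.net"))
```

The first call falls in the "before resolution" bucket and the second in the "during resolution" bucket.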
Transport security technologies such as SSL and TLS are designed to protect the confidentiality, authenticity and integrity of data moving between two nodes. Bit-errors most frequently happen to data when it is "at rest" on one of the nodes. DNSSEC will only resolve the 4% of the cases where a bit error occurred in the resolution process.
The Data
The PCAPs of all DNS traffic are available here: (dnslogs.tar.7z, 56Mb, 7zip compressed)
The HTTP log entries may contain personal information of random people and hence will not be publicly released. If you have legitimate research interest in the HTTP logs, please contact me.
Further research
Duane Wessels of Verisign looked for evidence of network level bit-errors in DNS queries as seen at domain roots. His findings indicate "that bit-level errors in the network are relatively rare and occur at an expected rate." (emphasis mine). The goal of his research was to determine if the 4% of requests with a non-bitsquat Host header were due to corruption of UDP packets after transmission. The final determination was that the packets were very unlikely to be corrupted during transmission on the network. In his own words: "We believe that UDP checksums are effective at preventing 'bitsquat' attacks and other types of errors that occur after a DNS query leaves a DNS resolver and enters the network. Bitsquat errors that occur prior to entering the network, however, will not benefit from UDP checksums since the sender calculates its checksum over the erroneous data."
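The distinction in that quote can be illustrated with a simplified checksum. The sketch below uses a 16-bit ones' complement sum in the style of RFC 1071, computed over the payload only; a real UDP checksum also covers a pseudo-header and the UDP header, and the function names here are assumptions.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over `data` (payload only in this sketch)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute the checksum and compare."""
    return internet_checksum(data) == checksum


# Case 1: the name is corrupted in the sender's memory BEFORE the checksum is computed.
sent = b"con.com"
print(verify(sent, internet_checksum(sent)))  # checksum matches the wrong name

# Case 2: the name is corrupted in transit AFTER the checksum was computed.
checksum = internet_checksum(b"cnn.com")
print(verify(b"con.com", checksum))  # mismatch, so the packet would be rejected
```

In the first case the receiver has no way to notice the error, because the checksum faithfully describes the already-corrupted data. In the second case the mismatch exposes the corruption.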
I fully encourage anyone reading this to replicate my experiments and share their findings. If you would like more information, please feel free to contact me.