A catalogue of software constructs, languages, or APIs which are unexpectedly Turing-complete; implications for security and reliability (computer science)
created: 9 Dec 2012; modified: 19 Oct 2018; status: finished; confidence: highly likely; importance: 6
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.–Greenspun’s Tenth Law
Turing-completeness (TC) is the property of a system being able to, under some simple representation of input & output, compute any program.
TC, besides being foundational to computer science and understanding many key issues like why a perfect antivirus program is impossible
, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC. It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This can be amusing, useful (although usually not), harmful, or extremely insecure & a cracker’s delight (see language-theoretic security
, based on exploiting weird machines
1). Surprising
examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.
Too powerful
languages can also manifest as nasty DoS attacks; the fuzz tester afl found in OpenBSD’s roff that it could create an infinite loop by abusing some of the string substitution rules.
They are probably best considered as a subset of discovered
or found
esoteric programming languages (esolangs). So FRACTRAN, as extraordinarily minimalist as it is, does not count2; nor would a deliberately obfuscated language like Malbolge (where it took years to write a trivial program) count because it was designed to be an esolang; but neither would Conway’s Game of Life count because questions about whether it was TC appeared almost immediately upon publication and so it turning out to be TC is not surprising, and given the complexity of packet-switching networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits, or if airplane ticket planning/validation is not just NP-hard or EXPSPACE-hard but undecidable (because of the complex rules airlines require). Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be accidentally Turing-complete
, like MediaWiki templates, sed
or repeated regexp/find-replace commands in an editor (any form of string substitution or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a lambda calculus or a term-rewriting language or tag system eg esolangs ///
or Thue ), XSLT, Infinite Minesweeper, Dwarf Fortress3, Starcraft, Minecraft, Ant, Transport Tycoon, C++ templates & Java generics, DNA computing etc are TC but these are not surprising either: many games support scripting (ie TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl, or it may just be an obscure part of a standard format (most people these days are probably unaware that TrueType & many fonts are PostScript programs based on stack machines, similar to DWARF debugging and ELF metadata, or that some music formats go beyond MIDI in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems Other formats, like PDF, are simply appalling.4). Similarly, such feats as creating a small Turing machine using Legos or dominos5 would not count, since we already know that mechanical computers work.
On the other hand, the vein of computer security research called weird machines
is a fertile ground of that’s TC?
reactions. What is surprising
may differ from person to person.
X86 shenanigans:
return-into-libc attacks: software libraries provide pre-packaged functions, each of which is intended to do one useful thing; a fully TC
languagecan be cobbled out of just calls to these functions and nothing else, which enables evasion of security mechanisms since the attacker is not running any recognizable code of his own. See, among many others,
The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)&
On the Expressiveness of Return-into-libc Attacks.
Pokemon Yellow: Pokemon Yellow Total Control Hack
outlines an exploit of a memory corruption attack which allows one to write arbitrary Game Boy assembler programs by repeated in-game walking and item purchasing. (There are similar feats which have been developed by speedrun aficionados, but I tend to ignore most of them as they are impure
: for example, one can turn the SNES Super Mario World into an arbitrary game like Snake or Pong but you need the new programs loaded up into extra hardware, so in my opinion, it’s not really showing SMW to be unexpectedly TC and is different from the other examples. Similarly, one can go from Super Game Boy to SNES to arbitrary code like IRC. This distinction is debatable.)
one category of weird machines doesn’t quite count since they require an assumption along the lines of the user mechanically clicking or making the only possible choice in order to drive the system into its next step; while the user provides no logical or computational power in the process, they aren’t as satisfying examples for this reason:
Possibly accidentally Turing-complete systems:
Some people seem to get caught up in discussions about weird machines or how big
an AI agent must be and whether there will be one, two, 10, or millions; this is not an important issue as it is merely an internal organizational one. What is important are the inputs and outputs: how capable is the system as a whole and what resources does it require? No one cares if Google is implemented using 50 supercomputers, 50,000 mainframes, 5 million servers, or 50 million embedded/mobile processors, or a mix of any of the above exploiting a wide variety of chips from custom tensor processing units
to custom on-die silicon (implemented by Intel on Xeon chips for a number of its biggest customers) to FPGAs to GPUs to CPUs to still more exotic hardware like prototype D-Wave quantum computers - as long as it is competitive with other tech corporations and can deliver its services at a reasonable cost. (Indeed, a supercomputer
these days mostly looks like a large number of rack-mounted servers with unusual numbers of GPUs & connected by unusually high-speed InfiniBand connections and is not as different from a datacenter as one might think.) Any of these pieces of hardware could support multiple weird machines depending on their internal dynamics & connectivity. Similarly, any AI system might be implemented as a single giant neural network, or as a sharded NN running asynchronously, or as a heterogeneous set of micro-services, or as a society of mind
etc - but it doesn’t especially matter, from a complexity or risk perspective, how exactly it’s organized internally as long as it works. The system can be seen on many levels, each equally invalid but useful for different purposes.
Here is an example of the ill-defined nature of the question: on your desk or in your pocket, how many computers do you currently have? How many computers are in your computer
? Did you think just one? Let’s take a closer look.
It goes far beyond just the CPU, for a variety of reasons: transistors and processor cores are so cheap now that it often makes sense to use a separate core for realtime or higher performance, for security guarantees, to avoid having to burden the main OS with a task, for compatibility with an older architecture or existing software package, because a DSP or core can be programmed faster than a more specialized ASIC can be created, or because it was just the simplest possible solution. Further, many of these components can be used as computational elements even if they were not intended to be or generally hide that functionality.
Thus:
A common Intel CPU has billions of transistors, devoted to a large number of tasks:
motherboard BIOS and/or management chips with network access
Mark Ermolov notes that
It’s amazing how many heterogeneous CPU cores were integrated in Intel Silvermont’s Moorefield SoC (ANN): x86, ARC, LMT, 8051, Audio DSP, each running own firmware and supporting JTAG interface
accidentallyleft enabled on shipping devices, like the Via C3 CPUs’s embedded ARM CPUs
a modern smartphone will contain somewhere between 8 and 14 ARM processors, one of which will be the application processor (running Android or iOS or whatever), while another will be the processor for the baseband stack..)
…
So a desktop or smartphone can reasonably be expected to have anywhere from 15 to several thousand computers
in the sense of a Turing-complete device which can be programmed and which is computationally powerful enough to run many programs from throughout computing history and which can be exploited by an adversary for surveillance, exfiltration, or attacks against the rest of the system.
None of this is unusual historically, as even the earliest mainframes tended to be multiple computers, with the main computer doing batch processing while additional smaller computers took care of high-speed I/O operations that would otherwise choke the main computers with interrupts.
In practice, aside from the computer security community (as all these computers are insecure and thus useful hidey-holes for the NSA & VXers), users don’t care that our computers, under the hood, are insanely complex and more accurately seen as a motley menagerie of hundreds of computers awkwardly yoked together (was it the network is the computer
or the computer is the network
…?); he perceives and uses it as a single computer.