From the unattributable “winners write the histories” to Napoleon Bonaparte’s “History is a set of lies agreed upon,” folk wisdom is full of references to humanity’s tenuous relationship with objective reality. Some of the latest developments in the propaganda arms race, colloquially called “Deep Fakes,” are a frightening harbinger of the Disinformation Age that may be coming. Journalists, governments, and media organizations need to start developing their strategies for handling these fake videos today.
The existence of counterfeits is clearly not news; we’ve been paying Hollywood for realistic-looking fiction for many years. The invention of Photoshop brought with it fearful rhetoric similar to what you’ll find in the rest of this article, and for good reason. The arrival of high-quality digital photo manipulation forced journalists, citizens, and intelligence agents to be more skeptical than ever before of photographic evidence. We need to keep ramping up our skepticism in the face of these new fraud-making technologies.
This article originally appeared on HFSResearch.com.
The latest developments in video manipulation, especially in the realm of falsifying “video portraits” (video of someone facing the camera from the shoulders up) are impressive, and troubling. For example, watch this presentation from the SIGGRAPH 2018 conference:
You may have seen a video like this already. Comedian and writer Jordan Peele created a similar video, depicting Barack Obama saying things that of course the former president would never say. Before that, Radiolab did an episode on these new advances and made their own fake as part of their reporting.
The choice of subject in these particular fakes raises hard questions. What would it mean for a democratic nation if any reasonably powerful entity could make realistic videos of the President of the United States saying whatever they choose? Alternatively, imagine a realistic video of Kim Jong Un declaring war on the United States. Imagine Donald Trump’s reaction to that video: would he remain appropriately skeptical and calm when presented with such footage?
Rewriting history on demand, and erasing all evidence of the previous version, was the central task of the Ministry of Truth in 1984. Control of information is of critical importance in regimes like North Korea, but information is power everywhere. Russian botnets amplify disinformation stories written by the “Internet Research Agency.” Donald Trump declares that anything that doesn’t suit his personal worldview is “fake news.” Pundits and politicians declare that “truth isn’t truth,” and present “alternative facts” in place of any truths that inconvenience them.
Furthermore, making these videos is not terribly hard. Jordan Peele and Radiolab are not “powerful” entities the same way that the NSA, KGB, CIA, or MI6 are powerful. Journalists and actors can already make realistic forgeries just to show that it can be done. In the future, the average white-collar worker could easily afford to make similar fakes.
Between the primacy effect, viral proliferation of video, and a lack of widespread skepticism, these technologies have enormous potential to disrupt democratic systems. What’s more, the existence of truly photorealistic fakes gives our already disingenuous politicians carte blanche to deny the validity of real video footage. The Shaggy defense of “it wasn’t me” could become the political mantra of the 2020s.
Mercifully, these technologies have limitations. For now the kind of video that can be realistically faked is fairly limited. For example, while there is ongoing research into falsifying other styles of video, the most realistic fakes are currently limited to video portraits. Another limitation stems from the computational requirements of these machine learning models.
In addition to the political motivations of choosing Obama or Trump as the subject of these forgeries, there is a practical one. Just like a human counterfeiter, the video forging models need to practice before they can produce convincing fakes. The current state of the art relies on a class of algorithm called a Generative Adversarial Network — a subset of the neural networks that are currently taking the AI world by storm.
These machine learning algorithms produce models that then produce the fake videos. Before a model can produce realistic videos it must be trained. Training these models involves several hours of intense processing, and requires high quality data upon which to train. The Deep Video Portraits paper — which complements the SIGGRAPH video above — indicates that, “Training our network takes 10 hours for a target video resolution of 256 × 256 pixels, and 42 hours for 512 × 512 pixels.”
42 hours of training on a high performance computer is a significant investment of time (and money, if you’re renting the hardware). To produce high definition videos these models will have to train much longer. This weakness is common among modern machine learning tactics.
So common, in fact, that OpenAI published a blog post suggesting that the amount of computational resources dedicated to training machine learning models has been growing exponentially. Meanwhile, people continue to eulogize Moore’s Law as Central Processing Unit (CPU) clock speeds stagnate. While this is a barrier today, advances in alternative hardware such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) may very well meet the demand for increased computational power.
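As a rough back-of-the-envelope check on the training times quoted above, the sketch below compares the paper’s two data points and extrapolates to a 1920 × 1080 frame. The linear-in-pixel-count scaling rule, the 1080p target, and the function name are assumptions made for illustration; they are not figures from the paper.

```python
# Training times reported in the Deep Video Portraits paper (hours).
REPORTED_HOURS = {(256, 256): 10.0, (512, 512): 42.0}


def estimate_training_hours(width, height, base=(512, 512)):
    """Extrapolate training time, ASSUMING cost grows linearly with pixel count.

    The scaling rule is an illustrative assumption, not a result from the paper.
    """
    base_pixels = base[0] * base[1]
    return REPORTED_HOURS[base] * (width * height) / base_pixels


# Quadrupling the pixel count (256x256 -> 512x512) cost 4.2x the time.
observed_ratio = REPORTED_HOURS[(512, 512)] / REPORTED_HOURS[(256, 256)]

# Hypothetical 1080p target: roughly 332 hours, or about two weeks.
hd_hours = estimate_training_hours(1920, 1080)
print(f"observed ratio: {observed_ratio:.1f}x, 1080p estimate: {hd_hours:.0f} h")
```

Because the reported jump was slightly worse than linear, the real figure for high-definition output could well be higher than this estimate.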
While training time remains a source of weakness in these types of systems, one of the major contributions from the Deep Video Portraits paper is in reducing the amount of training data required:
We construct the training corpus […] based on the tracked video frames of the target video sequence. Typically, two thousand video frames, i.e., about one minute of video footage, are sufficient to train our network.
At a length requirement of only one minute worth of training data, a lot of publicly available video footage — newscasts, vlogs, SnapChats, and more — becomes weaponizable.
As we discussed in our recent POV, advances in the use of synthetic data have significantly reduced the amount of real training data that needs to be collected. We expect advances in synthetic and adversarial data to continue to reduce the prohibitively large data requirements associated with modern machine learning tactics.
With an understanding that these technologies will advance, many researchers are interested in how to detect the counterfeits produced by the latest tactics. Unfortunately, fraud detection is a war of attrition.
For example, in June 2018 a paper titled “In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking” was published revealing that the (then) latest in counterfeit video could be detected by carefully watching the eyes. At the time the state of the art used still photos as training data, so they rarely generated videos where the subject blinked realistically. In August 2018, the Deep Video Portraits paper was published — that system has a module just for capturing and falsifying realistic blinking.
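To make the blinking cue concrete, here is a minimal sketch of a blink-rate check. It is a simplified heuristic, not the detection method from the “In Ictu Oculi” paper. It assumes some upstream tool has already produced a per-frame eye-openness score between 0 (closed) and 1 (open); the function names, the 0.2 closed-eye threshold, and the 5-blinks-per-minute floor are all illustrative assumptions.

```python
def count_blinks(openness, closed_below=0.2):
    """Count open-to-closed transitions in a sequence of eye-openness scores."""
    blinks = 0
    eye_closed = False
    for score in openness:
        if score < closed_below and not eye_closed:
            blinks += 1          # eye just closed: start of a new blink
            eye_closed = True
        elif score >= closed_below:
            eye_closed = False   # eye is open (again)
    return blinks


def blinks_per_minute(openness, fps=30.0):
    """Convert a blink count into a per-minute rate for the clip length."""
    seconds = len(openness) / fps
    return 60.0 * count_blinks(openness) / seconds if seconds else 0.0


def looks_suspicious(openness, fps=30.0, min_rate=5.0):
    """Flag footage whose blink rate falls below an ASSUMED plausibility floor."""
    return blinks_per_minute(openness, fps) < min_rate


# Synthetic 10-second clips at 30 fps (made-up data, for illustration only).
never_blinks = [1.0] * 300
blinks_3_times = ([1.0] * 95 + [0.05] * 5) * 3
print(looks_suspicious(never_blinks))    # True: 0 blinks per minute
print(looks_suspicious(blinks_3_times))  # False: 18 blinks per minute
```

A check like this stops working as soon as a forgery synthesizes realistic blinks, which is exactly the escalation described above.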
A research group within DARPA called Media Forensics has been devoting a lot of attention to the detection of fraudulent media. Hany Farid, a member of Media Forensics and professor of computer science at Dartmouth College, published an article titled “Digital Forensics in a Post-truth Age” in May 2018. In it he also noted the unique challenges posed by the latest tactics, especially those that use Generative Adversarial Networks (e.g. the Deep Video Portraits paper):
All indications are that fake news is a serious threat to our society and democracy. We in the digital forensic community must continue to develop and refine techniques that will allow individuals, media outlets, and governments to quickly and accurately authenticate digital videos, images, and audios. This task has recently been made even more difficult by rapid advances in machine learning that have made it easier than ever to create sophisticated and compelling fakes. These technologies have removed many of the time and skill barriers previously required to create high-quality fakes. Not only can these automatic tools be used to create compelling fakes, they can be turned against our forensic techniques in the form of generative adversarial networks (GANs) that modify fake content to bypass forensic detection.
It’s good to know that deeply skilled people such as Farid are working on this problem. It’s also a little disheartening to hear experts lament the difficulty of detecting the latest frauds. Farid lays out several barriers that the academic community faces in the battle against digital frauds. Funding is, of course, a pressing concern for all types of research, but one of the most interesting struggles Farid mentions is the tenuous balance between academic openness and the escalatory nature of fraud creation and detection:
In the field of forensics, there has always been some tension between the goal of scientific openness and ensuring that our techniques are not easily circumvented. […]. Without necessarily advocating this as a solution for everyone, […], I have held back publication of new techniques for a year or so. This approach allows me to always have a few analyses that our adversaries are not aware of.
Clearly, it’s difficult to know what the true state of the art is on either side of this battle. The fraudsters don’t want to show their latest wares until the proverbial, “moment of (fake) truth,” and the counter-fraudsters don’t want to advertise their detection capabilities to the fraudsters for similar reasons.
The field of digital forensics will likely see increased investment over the next few years. But because of the culture of secrecy, and the arms-race ethos of the field, it would be unwise to rely entirely on digital forensics in the fight against frauds.
Information forgery is not a uniquely modern problem. In medieval Europe, sealing wax and precious house seals were used to verify the authenticity of letters and other missives. The Heirloom Seal of the Realm served a similar purpose in ancient China. Stealing such seals, or creating counterfeit seals, was a path to sending fake messages that appeared to come from the king, emperor, or lord to whom the true seal belonged. Such seals were incredibly valuable, and kept under lock and key.
The sealing wax concept moves the goalposts: instead of trying to show that a letter is fraudulent by examining the handwriting, it establishes a chain of custody. Digital tactics, similar in concept to sealing wax, have been used in computer networking for a long time; digital signatures, public key encryption, SSL certificates, and protocols like DNSSEC all come at the problem of authenticity from this angle. Instead of verifying that the data in question is “real” video, we can verify that the data originated from a reliable source.
Mountains of work have already been done by software security experts to create systems of trust. Transport Layer Security (TLS), the protocol that powers secure HTTPS connections, is a prominent example. In TLS, trusted parties called Certificate Authorities evaluate and issue “certificates,” which are used to verify that the website you’re viewing was served to you by the owner of the domain you typed in. These certificates rely heavily on digital fingerprinting and public key cryptography to verify the authenticity of the data. Another protocol, DNSSEC, uses public key cryptography to establish the authenticity of DNS records.
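The fingerprint-then-sign idea behind these protocols can be sketched in a few lines. The example below hashes some bytes with SHA-256 and signs the digest with textbook RSA built from two small Mersenne primes. This is a toy for illustration only: the key size, the lack of padding, the placeholder content, and the function names are assumptions chosen for brevity, and a real system should rely on a vetted cryptography library instead.

```python
import hashlib

# Toy key pair from two known Mersenne primes. Far too small for real use.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (needs Python 3.8+)


def fingerprint(data: bytes) -> int:
    """SHA-256 digest of the data, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Publisher side: sign the fingerprint with the private exponent."""
    return pow(fingerprint(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Reader side: check the signature using only the public values (N, E)."""
    return pow(signature, E, N) == fingerprint(data)


video = b"raw bytes of a published video"   # placeholder content
sig = sign(video)
print(verify(video, sig))                    # True
print(verify(video + b" (edited)", sig))     # False
```

The point of the design is that the reader never has to judge whether the footage looks real; the check only answers whether these exact bytes were signed by the holder of the private key.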
Companies like Keybase are trying to increase adoption of public key encryption by making it easier. Keybase helps users integrate encryption across multiple devices. The service also uses email addresses and social media accounts to help verify the identity associated with a public encryption key.
There is a growing interest in decentralized computing. Identity and source authentication have been a huge aspect of this growth. From InterPlanetary File System (IPFS) to cryptocurrency, the nature of decentralized systems requires them to use digital fingerprinting extensively. As we wrote in January, blockchain technology has huge potential for creating consensus driven data integrity through the use of hashing.
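The hashing idea behind that kind of data integrity can be illustrated with a minimal hash chain: each record stores the hash of the record before it, so altering an old entry invalidates every later link. This sketch leaves out consensus, networking, and signatures entirely, and the record layout and function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(payload: str, prev_hash: str) -> str:
    """Hash a record's payload together with the hash of its predecessor."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: str) -> None:
    """Add a record that commits to everything that came before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": record_hash(payload, prev_hash)})


def is_valid(chain: list) -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != record_hash(record["payload"], record["prev"]):
            return False
        prev_hash = record["hash"]
    return True


chain = []
for headline in ["story A published", "story B published", "story A corrected"]:
    append(chain, headline)
print(is_valid(chain))                      # True

chain[0]["payload"] = "story A rewritten"   # tamper with history
print(is_valid(chain))                      # False
```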
There are already examples of blockchain technology being used to tackle the same issues that DNSSEC attempts to solve. Handshake, Blockchain DNS, and Namecoin are all examples of this tactic. Perhaps media companies like The New York Times will announce their own blockchains, or start signing all of their articles using public key encryption.
The bottom line: governments, media organizations, and any entity in the public sphere need to start signing and watermarking the digital information they create. The creation of counterfeit and fraudulent information is powerful, so we can be sure that powerful organizations will continue to explore this technology. Don’t get caught off guard; start building a chain-of-custody strategy today.