Edward Chang keeps a cybernetic implant at his desk, which seems almost calculatedly cool. Chang is a lean, low-voiced neurosurgeon at UC San Francisco. The cybernetic implant—more properly a Brain-Computer Interface—is a floppy, translucent plastic square about the size of my hand, embedded with a 16-by-16 array of titanium dots, each about the size of a cupcake sprinkle. This part sits on top of a brain. Half a dozen wires, white as iPhone cables, run from the square and terminate in copper leads. This is the interface part, the part that plugs into a computer.
Through some clever processing, Chang has used the output from these BCIs to do something remarkable. While a person with one in their head talks, Chang’s team can take readings from their motor cortex—recording the activity corresponding to speech or, more specifically, to the movement of the mouth, the tongue, and the jaw. Then some software can turn that brain activity into digitally synthesized, accurate, comprehensible speech—no human talking required.
Chang clicks over to a picture on his computer. It’s a person in a hospital bed, head encased in bandages—with a cable snaking out from underneath. “This person is speaking into a microphone, and we’re recording that brain activity in real time,” he says. “Our job has been to understand how that electrical activity, that code of information that’s transmitted by electrical signals in the brain, actually gives rise to speech.” For going on a decade, researchers around the world have been working on this problem—trying to understand the brain’s native tongue, so to speak, and restore a voice to people with paralysis or illness, people who can imagine themselves speaking but can’t actually do it. And as a paper by Chang’s group in the journal Nature this week shows, they’re getting close.
In a way, virtual mind-reading is just a happy side effect. Chang’s specialty is treating seizures; the BCI is a kind of targeting system. If someone has intractable, frequent ones, Chang’s team opens up their skull and puts the array onto their brain to find the source of the seizures and, ideally, make a surgical fix. But that means waiting around, sometimes for days, for a seizure to strike. “A lot of our patients are really bored. Sometimes when it goes past a couple of days, when you’re just stuck in a bed, they kind of welcome the research team to come in and break it up,” Chang says. That means they might play along with experiments. Chang got five people to say yes.
Brains don’t talk much, as a rule. But they’re not quiet, either—fizzing with message-carrying molecules among an uncountably complicated thicket of neurons. Still, despite the seeming ubiquity of functional magnetic resonance imaging in stories about “the part of the brain that controls X,” scientists don’t really know what’s going on in there. Functional MRI images actually blur spatially over relatively huge chunks of think-meat, and over several seconds of time. Very low resolution. Electroencephalograms take a faster snapshot, but of the entire brain at once. So neural interfaces like the ones Chang uses—deployed in the past to allow physically paralyzed people to control computers—offer an opportunity for more detailed “electrocorticography,” reading the activity of the brain more directly.
But how do you translate an inner monologue to out-loud speech? Chang’s group does it in two steps. First, a machine learning algorithm syncs up recordings of the motor cortex, taken as a person’s mouth moves, with the acoustics of the words that movement produces. The team uses this to train a virtual mouth, essentially a simulation of mouth parts that they can then control with output from the BCI. Chang’s team recorded the five participants talking while simultaneously capturing their brain activity electrocorticographically, then used those brain recordings to teach a computer to make sounds with the simulated mouth. The mouth produced speech that listeners recruited on Amazon’s Mechanical Turk were largely able to transcribe, if roughly.
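The two-step idea can be sketched in miniature. Everything below is a toy stand-in—synthetic data and simple linear regressions, not the team’s actual models—but the structure mirrors the pipeline: one decoder maps electrode readings to articulator movements (the “virtual mouth”), and a second maps those movements to acoustic features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 256 channels (a 16-by-16 ECoG grid), a handful of
# articulator dimensions (jaw, tongue, lips...), and audio spectral
# features. All names and numbers here are illustrative assumptions.
N_SAMPLES, N_CHANNELS, N_ARTIC, N_ACOUSTIC = 500, 256, 12, 32

# Hidden ground-truth mappings, used only to generate the toy data.
W_artic = rng.normal(size=(N_CHANNELS, N_ARTIC))
W_acoustic = rng.normal(size=(N_ARTIC, N_ACOUSTIC))

neural = rng.normal(size=(N_SAMPLES, N_CHANNELS))
artic = neural @ W_artic + 0.1 * rng.normal(size=(N_SAMPLES, N_ARTIC))
acoustic = artic @ W_acoustic + 0.1 * rng.normal(size=(N_SAMPLES, N_ACOUSTIC))

# Stage 1: learn to decode articulator movements from brain activity.
stage1, *_ = np.linalg.lstsq(neural, artic, rcond=None)
decoded_artic = neural @ stage1

# Stage 2: learn to synthesize acoustics from the decoded movements.
stage2, *_ = np.linalg.lstsq(decoded_artic, acoustic, rcond=None)

# At "inference" time: brain recording in, speech features out.
predicted = (neural @ stage1) @ stage2
err = np.mean((predicted - acoustic) ** 2) / np.mean(acoustic ** 2)
print(f"relative reconstruction error: {err:.3f}")
```

The intermediate articulatory stage is the interesting design choice: rather than learning a single opaque map from electrodes to sound, the decoder passes through a physically meaningful representation of mouth movement, which is what the motor cortex actually encodes.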
“This is currently a superhot topic, and a lot of very good groups are working on it,” says Christian Herff, a computer scientist at Maastricht University. His team similarly recorded motor cortex activity, but in people with their brains opened up on an operating table, awake and talking while waiting for surgery to remove tumors. Herff’s team went directly from the recordings to a machine-learning trained audio output, bypassing the virtual mouth. But it worked pretty well, too. Machine learning has gotten better, electrocorticography has improved, and computer scientists, linguists, and neurosurgeons are all collaborating on the science—leading to a minor boom in the field, Herff says.
Other approaches are chasing the same goal of turning brain activity directly into speech. In a paper earlier this year, a team at Columbia University showed it could generate speech using recordings from the auditory cortex—the part that processes sound—instead of the motor cortex. Right now, people who can’t physically make speech often have to use letter-by-letter technologies to spell out words, a much slower process than actual talking. These researchers would like to give those people a better option. “What approach will ultimately prove better for decoding imagined speech remains to be seen, but it is likely that a hybrid of the two may be best,” says Nima Mesgarani, the Columbia engineer who led that team.
The work is still preliminary, years away from widespread clinical or commercial use. The data set isn’t big enough to train a reliable model, for one thing. But the challenges run even deeper. “Right now this technique is limited to cases where we have direct access to the cortex. If we wanted to do this for the mass market, of course, opening the skull is not an option,” says Tanja Schultz, a computer scientist at the University of Bremen and an early innovator in the field (and Herff’s PhD advisor). Also, Schultz says, “the electrode montage on different patients is usually based on their medical requirements, so the positioning of the electrodes is never the same across patients … The second problem is that brains are not the same. In general, the motor cortex layout is similar across subjects, but it’s not identical.” That makes it hard to generalize the models that turn those signals into speech.
So for now, these good results are confined to people who have the ability to speak clearly but also happen to have their skulls cracked open. That’s not the planned use case, which might include people who’ve lost the ability to speak, or never had it. For those people, no one knows whether the motor cortex will still send the signals that control, say, the jaw. “Whether or not the same algorithms will work in a population that cannot speak, that may only be able to be figured out in further steps in a clinical trial,” Chang said in a press briefing earlier this week. “There are some really interesting questions about how this will actually work in someone who’s actually paralyzed.”
Chang’s team compared brain output from people actually saying words and then just miming the words, and got similar results. But they don’t know whether their method would work in languages other than English (there’s no reason it wouldn’t; the available set of physical articulations should be the same for all humans)—or whether it works when people merely imagine speaking, thinking of words without moving their mouths at all. That’s all down the road. But for now, at least, it seems like someday you might be able to speak your mind without saying a single word.