It’s been a couple of months, and I still can’t talk to Alexa. My Amazon Echo Dot, the hockey puck-shaped smart speaker that emits a pale blue glow when asked a question or given an order, sits dormant in my office. It’s not because I dislike talking to machines. I’m a dictation expert on my phone—incorporating the words “period” and “comma” into my sentences, and saying “ha ha” in a staccato way that is the exact opposite of laughing—the product of years of idea-saving and communicating while walking or driving. But when I opened my Echo earlier this summer, I found a small list of sample questions, like a tourist’s starter guide to navigating a foreign country. While I understand that part of using a smart speaker is the simple novelty of giving orders to a sentient piece of plastic, I don’t know if I’m ready to use voice commands for their own sake; Alexa is programmed to answer questions I don’t yet know I need to ask.
For instance, as someone who actually enjoys sorting streaming tracks into gargantuan playlists and maintaining a physical record collection, I don’t need assistance listening to music, which is pitched as one of the Echo’s primary tasks. At the same time, a lot of my music-loving friends have fallen for the Echo. One of them who grew up listening to AM signals through a transistor radio uses one of his Echoes (he has four) for background radio listening in the kitchen, as his wife listens to NPR through another one. Another friend taught her Echo to play ambient music while she sleeps. The people who use the Echo tend to really like the Echo. I can’t help but wonder if, eventually, I’ll be an Echo person too.
Almost four years after the Echo and Alexa’s rather inauspicious debut—“The whole thing is a tad baffling, but also intriguing,” wrote TechCrunch at the time—smart speakers are now teetering on omnipresence. Sales tripled between 2016 and 2017, and analysts expect nearly 60 million units will be bought globally this year. According to a study by NPR and Edison Research, 39 million Americans—16 percent of the country—owned a smart speaker in January 2018. Though Amazon still has the market cornered, Google launched its Home speaker in November 2016, and Apple’s Siri-run HomePod was released this February with a higher price and the promise of superior audio quality. The smart speaker marketplace grows more crowded by the day: Microsoft’s virtual assistant Cortana has found a home in a Harman Kardon speaker, while Samsung’s Bixby will reportedly debut in speaker form later this year. Sonos, Panasonic, and Sony are joining the fray as well. Would you buy a will.i.am-branded smart speaker? He hopes so.
Unsurprisingly, streaming music is proving to be these appliances’ killer app. NPR and Edison report that 60 percent of users surveyed asked their smart speakers to “play music” while spending time with others, making it easily the most popular function, ahead of answering general questions (mentioned by 30 percent of respondents) and getting the weather (28 percent). The listening isn’t purely random, either: A recent report notes that nearly half of smart speaker owners pay for a monthly streaming subscription, a number that is predicted to rise. At a recent British music industry meeting, smart speakers were compared to Apple’s iPod and App Store launches in terms of their possible effects on multiple industries. Three of the most valuable technology companies in the world are deploying interactive speakers to draw listeners to their branded music platforms: The next battle in the corporate streaming music war will be fought with voice.
The smart speaker is the product of decades of experimentation with voice recognition and domestic networking that has been made possible, as have so many recent innovations, by massive companies wielding incredible amounts of computing power. Alexa, Siri, and the other artificially intelligent, voice-recognizing (and always female) domestic robo-agents have roots in Bell Labs’ fledgling 1950s experiments with “Audrey,” but their capacity to recognize conversational speech patterns and interact with their owners in a naturalistic way situates them within the ongoing evolution of interactive AI, which once terrified us but now turns us on. These devices’ role in organizing the mundane duties of domestic life is part of a much broader campaign to network the entire home into a smoothly operating, data-rich whole: Echo can adjust your home’s thermostat and lock your doors, just like Google Home fits into its Nest system, and Apple’s HomePod dialogues with its HomeKit. Freely accessible digital music has been compared to a household utility—like water out of the tap, always available—for years, and with smart speakers, it’s now controllable by the same device that dims your lights.
Digital music files themselves have been remade as “smart” objects for the past several years—“smart” being the latest unavoidable tech buzzword describing technologies that promise to improve experience through mild surveillance. By corralling files into platforms, Spotify, Apple Music, Tidal and their ilk have transformed the simple act of clicking play into a value-generating activity. Streaming songs aren’t exchangeable commodities like they are on CD, vinyl, or even MP3; instead, they’re pleasurable spyware, reporting back copious amounts of proprietary data on listeners (which, the companies promise, is then routed back into an ever-more-personalized and enjoyable user experience). When Spotify CEO Daniel Ek told The New Yorker that his company isn’t in the music space, but the moment space, he was implying that the experience is the commodity—not music, but everyday activities tuned to Spotify’s algorithms and curated playlists. Smart speakers nestle perfectly into a digital music landscape colonized by streaming platforms, the better to curate each activity as a meaningfully soundtracked moment.
Tech designers and engineers look at the world as a set of problems to efficiently, if not artfully, solve. Within certain corners of the digital music space, those problems manifest as barriers to a seamless listening experience—to experiencing streaming music as an atmospheric hum capable of instantaneously accommodating any mood, activity, or nostalgic pang. This is what Amazon Music director Ryan Redington is getting at when he tells me that “voice almost completely removes friction for getting the music quickly.” As an example, Redington describes how he uses music to shift into domestic mode after work. “I used to get home, take out my phone, unlock it, find Amazon Music, find a playlist that I want to listen to, connect to Bluetooth or a receiver in my house, then start playing music,” he explains. With a smart speaker, he claims, all that technological friction disappears. “Now I can just walk in my house, say, ‘Alexa, play’ whatever I want to listen to, and it just works.”
The Echo was not designed explicitly for music, but it was no coincidence that Amazon launched Prime Music, its free service for Amazon Prime members, a few months before the Echo was introduced to the world. (Amazon Music Unlimited, which features millions more tracks and was launched as a direct competitor to Spotify and Apple Music, debuted in 2016.) “I wouldn’t go as far as to say that [the Echo and Amazon Prime Music] were developed together,” Redington tells me, “but certainly, we knew that this device was being worked on, [and built] our music service to make sure it was very voice-forward.” While Spotify distinguishes itself with personally curated playlists, and Tidal and Apple Music offer artist exclusives on their platforms, Amazon Music hopes to separate itself with voice.
Though its competitors will no doubt catch up quickly, to date Amazon has done far more to integrate streaming music with voice commands. This is a realm that, to put it lightly, can differ starkly from the more familiar process of typing a question into a visual interface. “We are very much down in the weeds on understanding exactly what words customers are using when they ask for something,” explains Alex Luke, Amazon’s global head of programming and content strategy. “What does Alexa say back in response to that utterance, and then what music do we deliver after Alexa says her response?”
Indeed, one of the most significant issues for smart speaker engineers to address is what might be called the single-response problem. “In voice,” Redington explains, “you don’t have the luxury to give customers a lot of results—you have to start playing something.” Unlike a visual interface that can provide a screen full of sorted responses to a question for the user to select from, Alexa can only provide one answer at a time—otherwise there’s friction. In the smart speaker world, getting the right answer first is key. As Redington puts it, “When you ask for something and it works, that’s truly where the magic happens.”
As with all streaming music, the “magic” emerges from the metadata. In a platformed music environment, each individual track is appended with copious digital information that determines where and how it should circulate, from codes that track sales and streams to musical and activity information. Though any streaming platform user is deeply familiar with mood- and activity-geared playlists, the frictionless domestic landscape of voice-commanded speakers has led to a surge in such requests. “When people say, ‘Alexa, play me happy music,’ that’s something we never saw typed into our app, but we start to see happening a lot through the voice environment,” Redington explains.
While all platforms have teams creating reams of metadata through machine learning techniques and human curation that can determine if a song is “happy,” record labels understandably want to have a say as well. Will Slattery is the global digital sales manager for Ninja Tune, an electronic label that, translated into streaming language, features a lot of lyric-less music that lends itself to specific moods and activities. “When people start interacting with smart speakers, they’re going to want to say, ‘Alexa, play some chill music,’ or ‘play music for dinner,’” Slattery predicts. “And that’s where a label could jump in and provide the [streaming] companies with that metadata, like, ‘This would be a good song for these specific moods.’” Ninja Tune artist Bonobo, Slattery notes, is very popular on study and concentration playlists—something the producer doesn’t take into account when composing his music, but which he can’t deny once it’s in circulation. “It is strange to imagine an artist hoping they someday get their music on fitness playlists,” as opposed to getting a rave review or a plum Coachella slot, one indie label owner tells me. “But this will change fast. What seems like a slightly absurd way to approach music today will be commonplace tomorrow.”
Smart speakers are already making inroads into what remains the most commonplace listening mode today: broadcast radio. From Pandora to Beats 1, the short history of streaming platforms has been marked by mimicry of radio’s free, passive mode of music circulation. Amazon’s Luke started his career as a radio program director, for massive rock stations like Chicago’s Q101 and Dallas’ The Edge, and his early programming initiatives for the Echo were clearly drawn from this experience. Last November, Echo owners who said, “Alexa, play the U2 Experience,” were dropped into a live broadcast that mixed tracks from 2017’s Songs of Experience with band interviews. U2 described it on their website as a “new type of radio.” Asking Alexa to “play The Soundboard” earlier this year cued up a live, career-spanning Elton John program. Daily programs like “Today in Music” and “Song of the Day,” which launched complete with their own specific voice commands, also suggest the strong influence of broadcast radio programming’s liveness.
All this makes the radio industry nervous, and with good reason. The NPR/Edison study reports that 39 percent of smart speaker users are now spending time listening to these devices rather than broadcast radio. When Amazon isn’t replicating radio programming and simulating its experience, the company is relying on its role in the promotional ecosystem to accommodate voice-specific requests. When Redington noticed that Echo users were asking for “the latest song” by an artist, he realized that simple release date metadata wasn’t enough to serve up the proper result. “We actually had to understand which song is being played at radio, so radio impact date became really important to us.”
When Australian indie rocker Courtney Barnett was featured on Amazon’s “Today in Music” program, Jessica Page, the director of digital at Barnett’s label Mom + Pop, says that Barnett’s sales and streams increased immediately. Amazon’s built-in listening audience is significant: One research firm put Amazon Music’s subscription numbers at 16 million last October, good enough for third place behind Spotify and Apple, and a source told Variety in March that those numbers are steadily climbing, and increasingly dependent on Echo integration. Page notes that it’s too early to tell if this kind of placement can drive the same sort of visibility or engagement as, say, a spot near the top of a prestige Spotify playlist. But until Apple, Google, and others launch their own specific programming initiatives for voice, Amazon is, in Page’s words, “another, for lack of a better term, box you can check for more visibility.”
For Page, as with other digital strategists at record labels, the smart speaker world has been built independent of their wishes, and it’s up to them to make it work for their roster. This strategy increasingly involves the incorporation of lyrics-as-metadata. During the digital era, song lyrics have re-emerged as a commodity in their own right. The crowdsourced platform Genius has integrated with Spotify and Google Home, while Toronto-based LyricFind, which dubs itself “the world’s leader in legal lyric solutions,” operates on the backend, licensing lyrics from music publishers to work with Pandora, Deezer, and Microsoft platforms, among others. It seems that lots of smart speaker users are requesting songs via snatches of overheard lyrics, which requires a new level of metadata specificity. Lyric copyrights are typically owned by music publishers, and labels don’t usually make any money from lyric licensing. But the promise of smart speaker integration, LyricFind founder Darryl Ballantyne says, has triggered a shift: Any metadata element that could help their music rise above the digital din helps. “Even though the labels aren’t getting paid by us, having the lyrics available gets them paid more from other people,” Ballantyne says. Page agrees: “The ability to find lyrics and match them with a song will lead to more streams and more sales.”
While labels are playing catch-up with lyrics-as-streaming-metadata, the technology companies are pitching their products toward the type of music fan who would ask for “the hipster song with the whistling.” Amazon, Apple, and Google aren’t going to sell millions of smart speakers by aiming their products toward music obsessives, especially when casual fans are much more amenable to algorithmic programming. This raises old issues for smaller players in the music industry, though. “For an indie label, the question always is: How are you going to convert listeners who are just going to say ‘play some music’ and get them to listen to music they haven’t heard before?” says Ninja Tune’s Slattery. The answer he ventures—even more algorithmic “discovery” engineered by the platforms themselves—isn’t the most anxiety-soothing. “As a label, you’re at the mercy of infrastructure created by tech companies,” he confesses.
Indeed, many of the most pressing issues of the streaming music economy—artist compensation, statistical transparency, sexism—remain untouched, if not deepened, by the rise of the smart speaker. Moreover, as Amazon, Apple, and Google continue to carve out their spaces in the voice marketplace, music consumers and musicians alike will continue to fight against the companies’ preferred walled-garden approach to exclusivity. And though there’s no real reason to sympathize with Tidal or Spotify, the idea that the smart speaker industry might become the exclusive province of massive firms with enough capital to experiment (and huge captive audiences to use as guinea pigs) is significant reason for pause, no matter how little one is interested in owning the devices. A world in which three of tech’s “frightful five” become the equivalent of the major labels, with exclusive holdings in hardware and software, and plenty of incentive to lock competitors’ products and content out of their systems, is a chilling idea, and not as far-fetched as it might seem.
Most music fans don’t automatically want to use a smart speaker to listen to music. They have to be trained to interact with virtual assistants, in the same way that they had to learn to swipe instead of type. The list of sample questions and commands that comes with every smart speaker does not simply tell you how to use the thing, but how to interact with it, employing something close to natural language. For years, speech recognition researchers have understood that talking to a voice interface requires the same psychological and social resources as other forms of speech. The assistants’ voices are also uniformly female, which activates the trope of subservient women that far predates recorded music.
Beyond the clunky voice interfaces that could only understand robotic utterances a few years ago, smart speakers facilitate a more informal, even humorous level of human-computer interaction. Like the “What’s happening?” prompt on Twitter, or Facebook’s “What’s on your mind?”, this kind of performed intimacy smooths over the bigger project of continuous, ambient data collection. Smart speakers may or may not revolutionize the recorded music industry, but these personable gadgets seem designed for a much bigger project, for which music might merely provide an enjoyable entry point: generating goodwill not toward faceless corporations, but toward the dulcet voice in the living room promising a world of constant, frictionless, surveilled consumption.