Video chat has every appearance of being a solid win for human communication. It’s useful for companies with far-flung employees and for distant friends and family. With social media becoming ever more fractious, seeing and talking to a real person on video chat seems to offer hope for preserving some humanness in our online conversations.
But here’s the thing: The newest innovations in video chat are making conversations between groups worse by combining the bleakest of online and real-life worlds. The two biggest providers of free group video chat, Google and Apple, have built software reinforcing some of the obnoxious dynamics of real-life group meetings — including sexism and racism.
One of the main features coded into Google Hangouts and Apple’s new Group FaceTime is that when someone is talking, their image becomes really big. Those who aren’t talking become small. The idea, in theory, is that it makes it possible to focus on the person speaking.
But it also rewards the loudest person in the room — echoing one of the dreariest problems that already infects everyday face-to-face groupings. You know from any meeting, conference panel, or late-night drinks that the loudest person in the room is rarely dropping the biggest pearls of wisdom or bringing the group together. Group FaceTime and Hangouts exacerbate this problem by making the loudmouth’s face loom large on-screen for as long as he’s yammering.
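The loudest-wins dynamic is easy to see once you write it down. Here is a minimal sketch of volume-based spotlighting — illustrative only, with a made-up `audio_level` field, not Google’s or Apple’s actual implementation:

```python
def pick_spotlight(participants):
    """Enlarge whoever has the highest audio level right now.

    `participants` is a hypothetical list of dicts with an
    'audio_level' field (0.0 to 1.0). This is a design sketch,
    not how Hangouts or FaceTime actually works internally.
    """
    return max(participants, key=lambda p: p["audio_level"])


# The only signal consulted is volume, so the loudmouth fills
# the screen no matter where the group wants to look.
group = [
    {"name": "quiet insightful colleague", "audio_level": 0.2},
    {"name": "loudmouth", "audio_level": 0.9},
]
print(pick_spotlight(group)["name"])  # prints "loudmouth"
```

The design choice worth noticing: volume is the cheapest signal to measure, which is likely why it became the default — not because it reflects where attention should go.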
The other problem with software rewarding the loudest talkers is that women and marginalized folks tend to be penalized for speaking up. What is the price of being loud if you’re a woman of color? What does it mean to be perceived as loud? What’s the penalty if you’re too loud?
This reward-the-loudmouth design ignores everything we know about how truly great conversation actually works. Rich conversation thrives on what’s known as “turn taking,” with the group fluidly moving its attention from speaker to speaker. In FaceTime and Hangouts, the people in the group don’t have any control over where they put their focus. That’s determined automatically by software.
In theory, video chat could increase our understanding of each other online because it allows so much more facial and verbal affect. If someone’s being sarcastic, for example, it should be much clearer on video chat than it is on Twitter.
Yet again, big tech firms are rolling out features that screw up this natural dynamic by obscuring our actual faces. Apple has enthusiastically dedicated engineering resources to create “animoji,” allowing speakers to mask their faces with a cartoon koala or a talking fox. Did your project manager give you side-eye while you nervously attempted your first talk? You’ll never know, because instead of his real face, all you saw was a giant bunny head.
Emojis were created to substitute for what our faces and voices generally do in the first place: communicate how we feel and what we want to say. Now, just as technology reaches the point where video chats enable us to communicate with our real faces in real time, Apple decides to encourage us to communicate more like a machine.
Animojis won’t help you understand or relate to other people. They’ll let you hide your real face and expressions from others. What animojis can do is help you feel less vulnerable and seem less emotional. Is this what’s ultimately driving the makers’ decisions? And what does that say about the people who designed them?
If group video chat is heading down the wrong path, what would be a better one? How could you design group chat so that it makes conversation better and encourages everyone to speak up?
Here’s one big idea: We could design software that encourages not just talking but also listening.
Encouraging listening is incredibly powerful. I’ve spent decades in Silicon Valley (full disclosure: I worked at Apple as part of its early web and webcasting team) and applied ideas from the web, comedy, and organizing to create many in-person conversations as a performer and teacher of public speaking. I believe the single most important element in helping someone speak up in a group is the feeling of being listened to with interest.
When you see a decent discussion panel at a conference (these miracles have happened), an attentive and engaged audience is more important to the quality of the debate than the loudest person talking. Feeling listened to and heard and welcome makes it more likely that someone will speak up. Quality conversation is built on how a speaker listens to and builds on what previous speakers have said and on whether the speakers understand the issue and the perspectives of the other participants.
Let’s think about what this would look like as a piece of software design. What if the face you saw the most was that of the person who is most interested in what you have to say, or the person you felt listened the most attentively? What if we had UX choices focused on feeling listened to that used machine learning to give positive feedback? What if tools rewarded good connections between people, or rewarded people who made succinct, excellent points that resonated with the group? And what if software teams began with deeper awareness and understanding of racism and misogyny and cultural difference in conversation and set goals of inclusion and respect?
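What would listener-centered selection look like in code? A hedged sketch, assuming a hypothetical per-viewer `attention_to` signal (say, derived from gaze or engagement estimation — no such API exists in today’s group-chat products):

```python
def pick_focus(participants, viewer_name):
    """Show each viewer the participant paying the most attention
    to *them*, rather than the loudest talker.

    'attention_to' maps viewer names to an attention score (0.0 to
    1.0). Purely a design sketch: the signal and its scale are
    assumptions, not a real video-chat API.
    """
    others = [p for p in participants if p["name"] != viewer_name]
    return max(
        others,
        key=lambda p: p["attention_to"].get(viewer_name, 0.0),
    )


# Carol sees Alice, her most attentive listener, not whoever
# happens to be talking loudest.
group = [
    {"name": "alice", "attention_to": {"carol": 0.8}},
    {"name": "bob", "attention_to": {"carol": 0.3}},
    {"name": "carol", "attention_to": {}},
]
print(pick_focus(group, "carol")["name"])  # prints "alice"
```

Note that this ranking is per-viewer: each participant could see a different face, which is closer to how attention actually moves around a real room.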
When you’re onstage speaking to a large group, you can choose one person to focus on — you can see when they have a thoughtful expression, when they are paying attention or being quiet, and you can work to draw them out. These moments — where the main speaker notices and calls on someone — can give the large group a stronger conversational sense and help the way you speak be more natural and intimate. The quietest person in the room often has the greatest impact when they speak up, if space is made for them. Their input can be transformative for a conversation.
What if you could do the same in a group video chat? Group chat features could be trained to recognize interest and attention and reward choices made based on paying attention to others or increasing group involvement. What if a software company designed group video chat with the presumption that a person, not software, would lead the interpersonal decisions and conversation?
One of Silicon Valley’s weaknesses is its tendency to place machines at the center of every challenge, and something as nuanced and impactful as human communication is the perfect example. The ability to make good judgments about interpersonal interactions depends on a huge number of biological abilities and learned skills — and machines simply don’t do people stuff well. The people who work in Silicon Valley operate in a culture that reveres computers and programming rather than focusing on our humanness. So what we have now are tools that attempt to erase human expression and impede natural interaction.
Instead, software design should be informed by expertise in sociology and psychology, in our understanding of emotion, bias, and interaction, so that we build the communication tools we deserve.
This sort of design is possible—we’ve seen glimpses of it before. Seesmic was an early Web 2.0 platform for asynchronous threaded messages, but it used small bits of video conversation rather than text. Its approach was centered on users and conversations rather than the tech itself, and because it hired hosts to oversee and shape conversations, the space encouraged a high level of quality interaction. Conversations were organized around a central listener — not a central talker.
How do we talk with each other, rather than at each other? Each of us needs to see and be seen. We need to feel understood, because that is what allows for intimacy and meaningful connection. That’s hard enough to do in person, and it shouldn’t be harder online.