Who’s a Bot? Who’s Not?

By Siobhan Roberts

It sometimes seems that automated bots are taking over social media and driving human discourse. But some (real) researchers aren’t so sure.

Credit: Chris Gash

Over the long Memorial Day weekend, a Twitter storm blew in about bots, those little automatic programs that talk to us in the digital dimension as if they were human.

What first caught the attention of Darius Kazemi was the headline on an article from NPR, “Researchers: Nearly Half of Accounts Tweeting About Coronavirus Are Likely Bots” — which Hillary Clinton retweeted to her 27.9 million followers — and a similar headline from CNN.

Mr. Kazemi thought, “That seems like a lot.” An independent researcher and internet artist in Portland, Ore., and a 2018 Mozilla Fellow, Mr. Kazemi has spent considerable time studying the nature and behavior of bots. Stereotypically, bots run amok on social media, at Russia’s behest. Some would argue that there is a vast and often troublesome population of bots out there: In one recent paper — “What Types of Covid-19 Conspiracies Are Populated by Twitter Bots?” — the author noted that some bots were hijacking Covid-19 hashtags with disinformation and conspiracy hashtags, such as #greatawakening and #qanon.

But Mr. Kazemi thinks the bot plot against America is exaggerated.

There are major unknowns: How pervasive are nefarious bots, really? What is their real effect? Don’t they mostly tweet at each other? And, fundamentally, what is a bot? (For instance, sometimes it is difficult to tell a bot from a troll, which is an antagonistic human just spoiling for a fight, or a cyborg, which is a human-run account that intermittently deploys a bot.)

Mr. Kazemi also makes bots; he has been called “a deeply subversive, bot-making John Cage.” (His bot “Two Headlines” crawled Google News, picked two headlines at random and mashed up keywords on Twitter, for example: “ABBA crosses Korean border for summit.”) He defines a bot as “a computer that attempts to talk to humans through technology that was designed for humans to talk to humans.”

Skeptical of the “nearly half” claim, Mr. Kazemi found the source of the article, a news release from Carnegie Mellon University about the research of Kathleen Carley, director of the C.M.U. Center for Computational Analysis of Social and Organizational Systems; since January, Dr. Carley had collected more than 200 million tweets discussing the coronavirus or Covid-19. “We’re seeing up to two times as much bot activity as we’d predicted based on previous natural disasters, crises and elections,” she said in the release.

Mr. Kazemi had hoped to find a research paper, with data and code; no luck. “That was disheartening,” he said. Yoel Roth, Twitter’s head of site integrity, tweeted that the company had “seen no evidence to support the claim that ‘nearly half of the accounts Tweeting about #COVID19 are likely bots.’” He included a thread from the Twitter Communications team labeled “Bot or not?” that walked through the taxonomic nuances.

Dr. Carley said in an interview that she was reluctant to provide data before publication because she didn’t want to be scooped; she also didn’t want to violate Twitter’s terms of service. (The terms allow distribution of tweet and user I.D.s for peer-review or research validation, but the details can get complicated.)

Darius Kazemi, researcher and internet artist, has spent considerable time studying the nature and behavior of bots. Credit: Tojo Andrianarivo for The New York Times

“The last time we sent out a bot paper with the data at the same time, someone else stole our data and published our paper before we did,” Dr. Carley said. “Stuff will come out when it gets accepted for publication.” She added that she decided to share preliminary findings in response to queries from journalists and colleagues: “It seemed important that people knew about Covid-19. We thought we were doing a service.”

Scientific preprints are proliferating during the coronavirus pandemic, with researchers rushing to release timely results. And news outlets can be overzealous in jumping on results without a critical lens, much less analyzing the data. But the dearth of data was a red flag for Mr. Kazemi. He dug in with Twitter threads: Unless we posit that there are more bots than people out there on social media, he wrote, “there needs to be extremely good data to make a claim that half of all conversation about Covid-19 is from bots. The burden of proof is huge and not met.”

Others weighed in on Twitter as well. Kate Starbird, director of the Emergent Capacities of Mass Participation Laboratory at the University of Washington, asked: “Are automation & manipulation still a problem here? Yes. Should Twitter do better? Absolutely. But we researchers need to be precise in how we talk about different behaviors, including how we label ‘bots.’”

Brendan Nyhan, a professor of government at Dartmouth College, said: “Argh. What matters is the number of tweets people *see*. Bots can post infinity tweets into the ether. *Measure exposure not tweets.*”
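Dr. Nyhan's distinction between counting tweets and counting exposure can be made concrete with a small sketch. Every figure below is invented for illustration (none of it comes from the research discussed here): two hypothetical bot accounts post heavily but are barely seen, while two hypothetical human accounts post little and are widely seen.

```python
# Hypothetical accounts: (label, tweets posted, impressions those tweets received).
# All numbers are made up for illustration only.
accounts = [
    ("bot", 500, 1_000),
    ("bot", 400, 500),
    ("human", 60, 90_000),
    ("human", 40, 8_500),
]

def bot_share(rows, column):
    """Fraction of the given column's total that comes from bot-labeled rows."""
    total = sum(row[column] for row in rows)
    bots = sum(row[column] for row in rows if row[0] == "bot")
    return bots / total

tweet_share = bot_share(accounts, 1)     # share of tweets posted by bots
exposure_share = bot_share(accounts, 2)  # share of impressions earned by bots

print(f"bot share of tweets: {tweet_share:.1%}")
print(f"bot share of exposure: {exposure_share:.1%}")
```

In this invented data set, bots produce 90 percent of the tweets but only 1.5 percent of what people actually see, which is the gap a tweet count alone would hide.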

Alex Stamos, director of the Stanford Internet Observatory, called it “L’Affair COVID Bots,” and noted, “Disinformation about disinformation is still disinformation, and is harmful to the overall fight.”

In early June, a similar story emerged about bot prevalence in the Twitter discourse around the protests over the killing of George Floyd. An article in Digital Trends reported that bots were spreading conspiracy theories and disinformation around the protests and the Black Lives Matter hashtag. The story cited Carnegie Mellon research indicating that 30 to 49 percent of accounts tweeting about the protests were bots.

These claims again raised skepticism and concern, from Mr. Kazemi and others.

Joan Donovan, research director of Harvard’s Shorenstein Center on Media, Politics and Public Policy, said that academics, when they release novel and shocking findings — whether publishing in a journal or by news release — have a responsibility to provide the evidence. “Dropping a statistic into the world without any explanation of what kind of content is attached is particularly troubling, especially related to the Black Lives Matter hashtag,” she said.

Dr. Carley, elaborating in a phone interview, said that she had a few ongoing social media projects, including studies on Covid-19 and the election. She uses a bot-detection tool developed at C.M.U. called Bot-hunter.

“I have said to everyone who has asked me, bots in and of themselves are not nefarious,” Dr. Carley said. “Bots are just software. They are used for good things, and they are used for bad things.”

She noted that of all the Black Lives Matter tweets collected so far in her research (bot and not), 90.6 percent were in support of the movement, 5.6 percent were not supportive, and the balance were neutral. The subset of bot tweets, she said, “did not appreciably affect those ratios” — bots were expressing overwhelming support for the protests, and often they were simply retweeting news, or rebroadcasting messages from the World Health Organization or Centers for Disease Control and Prevention.

Motivated by the headlines, Mr. Kazemi, in the intervening days, began a bot audit, manually inspecting data sets of suspected bots and verifying their existence in the wild. He focused on data used to train the machine learning algorithm that drives Botometer, a bot-detection tool by the Network Science Institute and the Center for Complex Networks and Systems Research at Indiana University, which “checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot.” A score of 0 is most humanlike, a score of 5 is most bot-like.

Other researchers do similar work. Manlio De Domenico, a physicist at the Bruno Kessler Institute in Trento, Italy, created the “Covid19 Infodemics Observatory,” which surveys about 4.5 million tweets daily. During the peer-review process for a paper, “Assessing the risks of ‘infodemics’ in response to Covid-19 epidemics,” his lab validated 1,000 user accounts. (The analysis took 12 people two weeks to conduct.)

Jonas Kaiser, of Harvard’s Berkman Klein Center for Internet & Society, and Adrian Rauchfleisch, of National Taiwan University, audited Botometer for their preprint paper, “The False Positive Problem of Automatic Bot Detection in Social Science Research.” Dr. Kaiser noted that algorithms are only as good as their training sets and generally perform worse when applied on unknown data.

“We found that the tool that is generally understood to be the ‘gold standard’ of the field is unreliable with its detection of bots, and it gets worse when tracking the bot classifications over time as well as for other languages,” Dr. Kaiser said.
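A back-of-the-envelope sketch shows why false positives weigh so heavily on headline percentages. The rates below are purely hypothetical assumptions, not figures from Dr. Kaiser's paper or from Botometer: a detector that catches 90 percent of real bots, wrongly flags 10 percent of humans, and is applied to a population in which 5 percent of accounts are bots.

```python
def flagged_share_and_precision(bot_rate, true_positive_rate, false_positive_rate):
    """Return (share of all accounts flagged as bots,
    share of flagged accounts that really are bots)."""
    true_flags = bot_rate * true_positive_rate          # real bots that get flagged
    false_flags = (1 - bot_rate) * false_positive_rate  # humans that get flagged
    flagged = true_flags + false_flags
    precision = true_flags / flagged if flagged else 0.0
    return flagged, precision

# Hypothetical rates, chosen only to illustrate the base-rate effect.
flagged, precision = flagged_share_and_precision(
    bot_rate=0.05, true_positive_rate=0.90, false_positive_rate=0.10
)

print(f"accounts flagged as bots: {flagged:.1%}")
print(f"flagged accounts that are actually bots: {precision:.1%}")
```

Under these assumed rates the detector would flag about 14 percent of accounts, nearly three times the true bot share, and roughly two-thirds of the flagged accounts would be human.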

Michael Kreil, a data journalist in Berlin, has been auditing bots since shortly after the 2016 U.S. election. Late last year he gave a talk titled, “The Army That Never Existed.” The précis: “‘Social bots have influenced elections.’ Does it sound plausible? Yes. Is it scientifically founded? Not at all.”

Defining the bot is a tricky problem; technically, it could be any automated account, like a news aggregator, or amplification software, like Hootsuite. Mr. Kazemi found many bots tweeting about Covid-19, including neighborhood health clinics using marketing software to post daily pandemic P.S.A.s about washing your hands.

He also found that humans were often mistaken for bots. Consider the “grandpa effect,” as he called it: people who were mistaken for bots because they used social media in “uncool or gauche” ways, he said. Users fond of hitting the share button on news articles also resulted in false positives. This led Mr. Kazemi to wonder whether Botometer should be renamed “Normiemeter.” He tweeted: “Can you imagine the headlines? ‘50% of accounts tweeting about Covid are normies.’”

There was also normal fandom behavior, such as the progressive K-pop fans who overwhelm social media algorithms to get topics trending — they rallied around the Black Lives Matter movement. There were burner accounts of people engaging with porn and following lots of accounts, with few or zero followers. And there was a black South-African woman who liked to respond with walls of congratulatory emojis whenever she saw other black women succeeding in their careers.

One morning on Twitter, Mr. Kazemi put out a call for bot sightings, and he asked people what made them think they had spotted a bot. About half the respondents cited the Twitter handles with multi-digit suffixes, like @Darius98302127. But as Mr. Kazemi himself recently learned, new users (since at least late 2017) are not initially given the option of choosing a username; they are automatically assigned a numerically original handle, which many don’t bother to change. For the other respondents, the term “bot” was a slur — shorthand for, “I don’t agree, and I think this position that the other person holds is so outrageous that it couldn’t possibly be held in good faith by a human.”

The problem of what is or what is not a bot may be too slippery to solve — in part because bots are continually evolving. As Mr. Kazemi noted, “It’s a bit like when Supreme Court Justice Potter Stewart famously said of pornography, ‘I know it when I see it’” — which, Mr. Kazemi added, is not an ideal strategy.

The more important and perhaps even more difficult issue is how to measure the impact of bots on the collective discourse. Do bots change our beliefs and behaviors?

“We want to understand what type of susceptible populations engage with them and what types of narratives resonate,” said Emilio Ferrara, a computer scientist at the University of Southern California and the author of the “Covid-19 Conspiracies” bots paper. The holy grail of bot research, he said, is to understand whether bots matter.

“Many people would agree that, yeah, maybe there are tons of bots,” he said. “But if nobody cares about them — maybe they get suspended right away and not a large share of the audience sees their content — it’s less problematic.”

Sarah Jackson, an associate professor at the Annenberg School for Communication at the University of Pennsylvania, said that it was more important to focus on where the bots are in networks and with whom they interact. Dr. Jackson is a co-author, with Moya Bailey and Brooke Foucault Welles of Northeastern University, of the book “#HashtagActivism: Networks of Race and Gender Justice.” Studying dozens of #BlackLivesMatter networks, the authors found that spam and delegitimizing bots were almost always on the periphery, interacting with very few real people.

“So, even if there are a lot of bots in a network, it is misleading to suggest they are leading the conversation or influencing real people who are tweeting in those same networks,” Dr. Jackson said.

But bots have also been adopted by organizations and activists in social movements as effective vehicles for catalyzing change. Dr. Jackson pointed out that bot-detection algorithms flag what might be considered atypical human behavior: People don’t typically tweet 24 hours a day, or 1,000 times an hour, or create new accounts only to delete them once they amass a following. “But these are all normal and expected behaviors for people documenting protest activities,” she said.

And as Mr. Kazemi observed in one of his threads describing another class of false positives: “You know who uses Twitter in a way that the vast majority of people who hold Ph.D.s do not? Disenfranchised populations.”

Meanwhile, the self-identifying “Galaxy Brain Bot” — his favorite bot of 2020 — scores a mere 1.8 on Botometer.