Machine learning sucks at covid (permalink)
The worst part of machine learning snake-oil isn't that it's useless or harmful – it's that ML-based statistical conclusions have the veneer of mathematics, the empirical facewash that makes otherwise suspect conclusions seem neutral, factual and scientific.
Think of "predictive policing," in which police arrest data is fed to a statistical model that tells the police where crime is to be found. Put in those terms, it's obvious that predictive policing doesn't predict what criminals will do; it predicts what police will do.
Cops only find crime where they look for it. If the local law only performs stop-and-frisks and pretextual traffic stops on Black drivers, they will only find drugs, weapons and outstanding warrants among Black people, in Black neighborhoods.
That's not because Black people have more contraband or outstanding warrants, but because the cops are only checking for their presence among Black people. Again, put that way, it's obvious that policing has a systemic racial bias.
But when that policing data is fed to an algorithm, the algorithm dutifully treats it as the ground truth, and predicts accordingly. And then a mix of naive people and bad-faith "experts" declare the predictions to be mathematical and hence empirical and hence neutral.
Which is why AOC got her face gnawed off by rabid dingbats when she stated, correctly, that algorithms can be racist. The dingbat rebuttal goes, "Racism is an opinion. Math can't have opinions. Therefore math can't be racist."
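To make the feedback loop concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the two areas, the search counts, the 5% rate and the function names are assumptions for illustration, not a description of any real predictive-policing product. Both areas are given the same underlying rate of contraband; the only thing that differs is where the police search.

```python
def recorded_finds(searches_by_area, true_rate):
    """Expected contraband finds per area, assuming every area has the SAME
    underlying rate and finds scale with the number of searches performed."""
    return {area: n * true_rate for area, n in searches_by_area.items()}


def naive_hotspot_scores(finds):
    """Share of all recorded finds per area: what a model trained only on
    police records would treat as the 'crime rate'."""
    total = sum(finds.values())
    return {area: (f / total if total else 0.0) for area, f in finds.items()}


# Hypothetical inputs: police only ever search in area A.
searches = {"A": 1000, "B": 0}
finds = recorded_finds(searches, true_rate=0.05)
scores = naive_hotspot_scores(finds)

print(scores)  # area A gets the entire score, area B gets none
```

Area B scores zero not because it has no contraband, but because nobody looked there. A model trained on `finds` would send the next round of patrols to A, which is a prediction about the searches, not about the people.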
You don't have to be an ML specialist to understand why bad data makes bad predictions. "Garbage In, Garbage Out" (GIGO) may have been coined in 1957, but it's been a conceptual iron law of computing since "computers" were human beings who tabulated data by hand.
But good data is hard to find, and "when all you've got is a hammer, everything looks like a nail" is an iron law of human scientific malpractice that's even older than GIGO. When "data scientists" can't find data, they sometimes just wing it.
This can be lethal. I published a Snowden leak that detailed the statistical modeling the NSA used to figure out whom to kill with drones. In subsequent analysis, Patrick Ball demonstrated that NSA statisticians' methods were "completely bullshit."
Their gravest statistical sin was recycling their training data to validate their model. Whenever you create a statistical model, you hold back some of the "training data" (data the algorithm analyzes to find commonalities) for later testing.
So you might show an algorithm 10,000 faces, but hold back another 1,000, and then ask the algorithm to express its confidence that items in this withheld data-set were also faces.
However, if you are short on data (or just sloppy, or both), you might try a shortcut: training and testing on the same data.
There is a fundamental difference between evaluating a classifier by showing it new data and evaluating it by showing it data it's already ingested and modeled.
It's the difference between asking "Is this like something you've already seen?" and "Is this something you've already seen?" The latter tests whether the system can recall its training data; the former tests whether the system can generalize based on that data.
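The gap between recall and generalization is easy to demonstrate. The sketch below is a toy, not anyone's real model: the "classifier" is a lookup table that memorizes its training examples, and the data is hypothetical, with coin-flip labels so there is nothing real to learn. The names (`train_memorizer`, `accuracy` and so on) are made up for this example.

```python
import random


def train_memorizer(examples):
    """'Train' by memorizing every (features, label) pair verbatim."""
    return dict(examples)


def predict(model, features, default=0):
    """Return the memorized label, or a fixed guess for unseen features."""
    return model.get(features, default)


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(model, x) == y for x, y in examples)
    return correct / len(examples)


rng = random.Random(0)

# Hypothetical data: 11,000 unique feature tuples with coin-flip labels.
data = [((i, rng.random()), rng.randint(0, 1)) for i in range(11_000)]

# Hold back 1,000 examples that the model never sees during training.
train, held_out = data[:10_000], data[10_000:]
model = train_memorizer(train)

recall_score = accuracy(model, train)             # tested on its own training data
generalization_score = accuracy(model, held_out)  # tested on withheld data

print(f"same data: {recall_score:.2f}, held-out data: {generalization_score:.2f}")
```

Scored against its own training data, the memorizer is perfect. Scored against the held-out examples, it does about as well as a coin toss, which is the honest answer for data with no pattern in it.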
ML models are pretty good recall engines! The NSA was training its terrorism detector with data from the tiny number of known terrorists it held. That data was so sparse that it was then evaluating the model's accuracy by feeding it back some of its training data.
When the model recognized its own training data ("I have 100% confidence this data is from a terrorist") they concluded that it was accurate. But the NSA was only demonstrating the model's ability to recognize known terrorists – not its ability to accurately identify unknown ones.
And then they killed people with drones based on the algorithm's conclusions.
Bad data kills.
Which brings me to the covid models raced into production during the height of the pandemic, hundreds of which have since been analyzed.
There's a pair of new, damning reports on these ML covid models. The first, "Data science and AI in the age of COVID-19," comes from the UK's Alan Turing Institute.
The second, "Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans," comes from a team at Cambridge.
Both are summarized in an excellent MIT Tech Review article by Will Douglas Heaven, who discusses the role GIGO played in the failure of every one of these models to produce useful results.
Fundamentally, the early days of covid were chaotic and produced bad and fragmentary data. The ML teams "solved" that problem by committing a series of grave statistical sins so they could produce models, and the models, trained on garbage, produced garbage. GIGO.
The datasets used for the models were "Frankenstein data," stitched together from multiple sources. The specifics of how that went wrong are a kind of grim tour through ML's greatest methodological misses.
- Some Frankenstein sets had duplicate data, leading to models being tested on the same data they were trained on
- A data-set of healthy children's chest X-rays was used to train a model to spot healthy chests – instead it learned to spot children's chests
- One set mixed X-rays of supine and erect patients, without noting that only the sickest patients were X-rayed while lying down. The model learned to predict that people were sick if they were on their backs
- A hospital in a hot-spot used a different font from other hospitals to label X-rays. The model learned to predict that people whose X-rays used that font were sick
- Hospitals that didn't have access to PCR tests or couldn't integrate them with radiology data labeled X-rays based on a radiologist's conclusions, not test data, incorporating radiologists' idiosyncratic judgements into a "ground truth" about what covid looked like
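Two of the failures in that list, duplicated records and labels that track incidental metadata, can be screened for with very little code. What follows is a rough sketch under stated assumptions, not the method used in either report: the record layout (`image`, `label`, `position`, `font`) and the helper names are hypothetical, and an exact byte hash only catches verbatim duplicates, not re-encoded or cropped copies.

```python
import hashlib
from collections import Counter, defaultdict


def fingerprint(image_bytes):
    """Content hash of an image; identical bytes give identical fingerprints."""
    return hashlib.sha256(image_bytes).hexdigest()


def leaked_items(train, test):
    """Return test records whose image bytes also appear in the training set."""
    seen = {fingerprint(r["image"]) for r in train}
    return [r for r in test if fingerprint(r["image"]) in seen]


def shortcut_accuracy(records, field):
    """Accuracy of predicting the label from ONE metadata field alone, by
    guessing the most common label for each value of that field."""
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[field]][r["label"]] += 1
    hits = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return hits / len(records)


# Hypothetical toy records standing in for a stitched-together data-set.
train = [
    {"image": b"scan-1", "label": "covid", "position": "supine", "font": "X"},
    {"image": b"scan-2", "label": "covid", "position": "supine", "font": "X"},
    {"image": b"scan-3", "label": "healthy", "position": "erect", "font": "Y"},
    {"image": b"scan-4", "label": "healthy", "position": "erect", "font": "Y"},
]
test = [
    {"image": b"scan-2", "label": "covid", "position": "supine", "font": "X"},
    {"image": b"scan-5", "label": "healthy", "position": "erect", "font": "Y"},
]

leaks = leaked_items(train, test)
position_score = shortcut_accuracy(train, "position")
font_score = shortcut_accuracy(train, "font")

print(len(leaks), position_score, font_score)
```

If a field like patient position or label font predicts the diagnosis nearly perfectly on its own, that's a warning that a model can score well without learning anything about lungs.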
All of this was compounded by secrecy: the data and methods were often covered by nondisclosure agreements with medical "AI" companies. This foreclosed on the kind of independent scrutiny that might have caught these errors.
It also pitted research teams against one another, rather than setting them up for collaboration, a phenomenon exacerbated by scientific career advancement, which structurally preferences independent work.
Making mistakes is human. The scientific method doesn't deny this – it compensates for it, with disclosure, peer-review and replication as a check against the fallibility of all of us.
The combination of bad incentives, bad practices, and bad data made bad models.
The researchers involved likely had the purest intentions, but without the discipline of good science, they produced flawed outcomes – outcomes that were pressed into service in the field, to no benefit, and possibly to patients' detriment.
There are statistical techniques for compensating for fragmentary and heterogeneous data – they are difficult and labor-intensive, and work best through collaboration and disclosure, not secrecy and competition.
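One widely used safeguard when data comes from many sources is to hold out an entire source at a time, so the model is always tested on a hospital it has never seen. The sketch below shows that splitting step only. It's an illustration, not something prescribed by either report, and the `hospital` field and function name are assumptions.

```python
from collections import defaultdict


def leave_one_source_out(records, source_field="hospital"):
    """Yield (held_out_source, train, test) splits in which each test set
    comes from a single source that contributed nothing to the training set."""
    by_source = defaultdict(list)
    for r in records:
        by_source[r[source_field]].append(r)
    for source in sorted(by_source):
        test = by_source[source]
        train = [r for s, rs in by_source.items() if s != source for r in rs]
        yield source, train, test


# Hypothetical records from three hospitals.
records = [
    {"hospital": "A", "id": 1},
    {"hospital": "A", "id": 2},
    {"hospital": "B", "id": 3},
    {"hospital": "C", "id": 4},
]

splits = list(leave_one_source_out(records))
for source, train_set, test_set in splits:
    print(source, len(train_set), len(test_set))
```

A model that only works on the hospitals it trained on will fail these splits, which is exactly the kind of failure you want to find before the model reaches a patient.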
Podcasting "Are We Having Fun Yet?" (permalink)
This week on my podcast, I read the first three parts of "Are We Having Fun Yet?" my ongoing Medium series on what we can learn about aggregate demand management and scarcity from the history of queues at Disney theme parks.
Part I: Are We Having Fun Yet?
Part II: Boredom and its discontents
Part III: Now you’ve got two problems
I also published a fourth item in the series yesterday, "Managing aggregate demand," about auctions, queuing, fairness and antitrust.
New installments will go up on Sundays – next is "Expectations management."
The episode is here:
Here's a direct link to the MP3 (hosting courtesy of the Internet Archive; they'll host your stuff for free, forever):
And here's a direct RSS link to subscribe to the podcast:
This day in history (permalink)
#20yrsago Vernor Vinge in the NYT https://www.nytimes.com/2001/08/02/technology/a-scientist-s-art-computer-fiction.html
#20yrsago Bruce Perens on Dmitry Sklyarov https://web.archive.org/web/20011023092940/https://www.zdnet.com/zdnn/stories/comment/0,5859,2800985,00.html
#15yrsago Five things about blogs that no one ever needs to say again https://stevenberlinjohnson.com/five-things-all-sane-people-agree-on-about-blogs-and-mainstream-journalism-so-can-we-stop-talking-ea0f75b3c163
#10yrsago Missouri State business-school professor leads successful campaign to ban Slaughterhouse-Five from local schools https://www.theguardian.com/books/2011/jul/29/slaughterhouse-five-banned-us-school
#10yrsago Gingrich’s million Twitter followers: “80% dummy accounts, 10% paid followers” https://www.gawker.com/5826645/most-of-newt-gingrichs-twitter-followers-are-fake
#10yrsago Wisconsin Democratic voters targeted with Koch-funded absentee ballot notices advising them to vote 2 days after the recall election https://www.politico.com/blogs/david-catanese/2011/08/afp-wisconsin-ballots-have-late-return-date-037977
#10yrsago Castles made from human hair https://inhabitat.com/artist-uses-human-hair-to-construct-a-castle-of-3000-bricks/
#5yrsago Australian media accessibility group raises red flag about DRM in web standards https://web.archive.org/web/20160809062641/www.accessiq.org/news/news/2016/08/concerns-raised-for-assistive-technology-development-as-w3c-debates-encrypted
#5yrsago Ireland (finally) jails three bankers for role in 2008 crisis https://www.reuters.com/article/us-ireland-banking-court-idUSKCN10912E
#5yrsago Reminder: the GOP has been attacking veterans and their families for years https://crookedtimber.org/2016/08/02/trumps-indecent-proposal/
#1yrago Galaksija https://pluralistic.net/2020/08/02/ventilator-202/#Galaksija
Today's top sources: Slashdot (https://slashdot.org/).
- Spill, a Little Brother short story about pipeline protests. Friday's progress: 251 words (12411 words total)
- A Little Brother short story about remote invigilation. PLANNING
- A nonfiction book about excessive buyer-power in the arts, co-written with Rebecca Giblin, "The Shakedown." FINAL EDITS
- A post-GND utopian novel, "The Lost Cause." FINISHED
- A cyberpunk noir thriller novel, "Red Team Blues." FINISHED
Currently reading: Analogia by George Dyson.
Latest podcast: Tech Monopolies and the Insufficient Necessity of Interoperability https://craphound.com/news/2021/07/12/tech-monopolies-and-the-insufficient-necessity-of-interoperability/
- The Shakedown, with Rebecca Giblin, nonfiction/business/politics, Beacon Press 2022
This work licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.
Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.
How to get Pluralistic:
Blog (no ads, tracking, or data-collection):
Newsletter (no ads, tracking, or data-collection):
Mastodon (no ads, tracking, or data-collection):
Medium (no ads, paywalled):
(Latest Medium column: "Managing aggregate demand," part four of a series on theme park design, queuing theory, immersive entertainment, and load-balancing. https://doctorow.medium.com/managing-aggregate-demand-part-iv-8d2022a5125b)
Twitter (mass-scale, unrestricted, third-party surveillance and advertising):
Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):
"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla