The era of blind faith in big data must end

This is a summary by Irene Guidarelli of a talk delivered by Cathy O’Neil at a TED conference in April 2017. Cathy is a mathematician; she runs the blog mathbabe.org and is the author of several books, including her latest, “Weapons of Math Destruction”.

Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don’t even get an interview, or they pay more for insurance. We’re being scored with secret formulas that we don’t understand and that often have no system of appeal. That raises the question: what if the algorithms are wrong?

To build an algorithm you need two things: data, meaning what happened in the past, and a definition of success, the thing you’re looking for and often hoping for. You train an algorithm by looking at that past data: the algorithm figures out what is associated with success, what situation leads to success. Actually, everyone uses algorithms. They just don’t formalize them in written code.

Algorithms are opinions embedded in code. People think that algorithms are objective and true and scientific. That’s a marketing trick. It’s also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

Kiri Soares is a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the “value-added model.” I told her, “Well, figure out what the formula is, show it to me. I’m going to explain it to you.” She said, “Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn’t understand it.” It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers’ names and all their scores and published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn’t. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it. Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data who actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them, each dot representing one teacher and their two scores, and the two scores for the same teacher turned out to be barely correlated. That model should never have been used for individual assessment. It’s almost a random number generator.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You’re thinking, “Well, I would never make an algorithm that inconsistent.” But algorithms can go wrong; even algorithms built with good intentions can have deeply destructive effects. And whereas an airplane that’s designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

What would happen if we replaced the hiring process with a machine-learning algorithm? What would the data be? A reasonable choice would be the last 21 years of applications. What about the definition of success? A reasonable choice would be someone who stayed for four years and was promoted at least once. The algorithm would then be trained: it would learn what led to success, what kind of applications historically led to success by that definition. Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like the people who were successful in the past.
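To make that filtering effect concrete, here is a minimal sketch, not from the talk, assuming scikit-learn and a toy applicant table. The feature names, the promotion label, and all the numbers are invented for illustration only.

```python
# Toy sketch (invented data): a hiring model trained on historical outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Historical applications: gender (1 = male, 0 = female) plus a skill score.
gender = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)

# "Success" = stayed four years and was promoted at least once.
# In this toy history, men were promoted more often at equal skill,
# so the label itself encodes past bias.
success = (skill + 1.5 * gender + rng.normal(0, 1, n)) > 1.5

model = LogisticRegression().fit(np.column_stack([gender, skill]), success)

# Score two current applicants with identical skill.
applicants = np.array([[1, 0.5],   # man,   skill 0.5
                       [0, 0.5]])  # woman, skill 0.5
print(model.predict_proba(applicants)[:, 1])
# The woman gets a lower predicted "success" score for identical skill:
# the model has simply automated the status quo.
```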

Algorithms don’t make things fair if you just blithely, blindly apply algorithms. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don’t. And I’ll add that most companies don’t have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society, racially segregated in all towns and all neighborhoods, where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found data scientists and paid them to predict where the next crime would occur? Minority neighborhood. The data scientists would brag about how great and how accurate their model would be, and they’d be right.
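A tiny simulation, entirely invented, can make the thought experiment concrete: the true crime rate is identical everywhere, but only patrolled neighborhoods generate arrest records, so a model trained on arrests just points back at the patrols.

```python
# Toy simulation of the thought experiment (all numbers are made up):
# crime happens at the same rate everywhere, but police patrol only
# the minority neighborhoods, so only crimes there produce arrests.
import random
from collections import Counter

random.seed(1)
neighborhoods = ["minority_A", "minority_B", "majority_C", "majority_D"]
patrolled = {"minority_A", "minority_B"}          # where police are sent
crime_rate = {n: 0.05 for n in neighborhoods}     # identical true rate

arrests = Counter()
for _ in range(365):
    for n in neighborhoods:
        crime_occurred = random.random() < crime_rate[n]
        if crime_occurred and n in patrolled:     # only patrolled crime is recorded
            arrests[n] += 1

# A "predictive" model trained on these arrests would name a patrolled
# neighborhood as the hotspot, and be "accurate" on its own biased data.
print(arrests)
print(arrests.most_common(1)[0][0])
```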

Now, reality isn’t that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do predict, in fact, individual criminality, the criminality of individuals. The news organization ProPublica recently looked into one of those “recidivism risk” algorithms, as they’re called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10, high risk. Dylan, on the right, 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn’t. This matters, because the higher your score, the more likely you are to be given a longer sentence. What’s going on? Data laundering: a process by which technologists hide ugly truths inside black box algorithms and call them objective, call them meritocratic. When they’re secret, important and destructive, I’ve coined a term for these algorithms: “weapons of math destruction.”

These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police were built by private companies and sold to government institutions. They call it their “secret sauce”: that’s why they can’t tell us about it. It’s also private power. They are profiting from wielding the authority of the inscrutable. Now you might think, since all this stuff is private and there’s competition, maybe the free market will solve this problem. It won’t. There’s a lot of money to be made in unfairness.
The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit.

First, a data integrity check. Second, we should think about the definition of success. Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?
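As one concrete example of the “for whom does this model fail?” check, here is a hedged sketch in the spirit of ProPublica’s recidivism analysis. The scores, outcomes, groups, and the high-risk threshold are all invented for illustration.

```python
# One step of an algorithmic audit (sketch with invented data):
# measure how often the model is wrong, and for whom.
import numpy as np

# y_true: what actually happened (1 = reoffended); y_score: the model's
# risk score (0-10); group: a protected attribute. All made up.
y_true  = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
y_score = np.array([9, 8, 7, 2, 3, 8, 9, 2, 1, 7])
group   = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

predicted_high_risk = y_score >= 7   # illustrative "high risk" threshold

for g in np.unique(group):
    # False positive rate: labelled high risk but did not reoffend.
    negatives = (group == g) & (y_true == 0)
    fpr = predicted_high_risk[negatives].mean()
    print(f"group {g}: false positive rate = {fpr:.2f}")
# A large gap between groups is exactly the kind of failure an audit
# should surface before such a score is used in sentencing.
```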

And finally, we have to consider the long-term effects of algorithms, the feedback loops they engender. I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of the ethical discussions that happen in larger society. And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability from our algorithmic overlords. The era of blind faith in big data must end.

Watch the original speech by Cathy O’Neil on the TED website.