Algorithmic bias has been a major topic over the last few years, but for many decision makers, the problem seems remote, confusing, or not a top concern. Even if you’ve been working at removing the bias from your business or technology, decades of bias in the data that got us here means we still have a ways to go, and we’re going to need everyone on board. In this episode, my guests and I explore what algorithmic bias is, where it comes from, how we may be unintentionally making the problem worse, and most importantly, what we can do about it.
Guests this week include Dr. Dorothea Baur, Ana Milicevic, Dr. Chris Gilliard, Renee Cummings, Dr. Safiya U. Noble, Abhishek Gupta, Dr. Rumman Chowdhury, Calli Schroeder, and Giselle Mota.
The Tech Humanist Show is a multi-media-format program exploring how data and technology shape the human experience. Hosted by Kate O’Neill. Produced and edited by Chloe Skye, with research by Ashley Robinson and Erin Daugherty at Interrobang.
To watch full interviews with past and future guests, or for updates on what Kate O’Neill is doing next, subscribe to the “The Tech Humanist Show hosted by Kate O’Neill” channel on YouTube, or head to KOinsights.com.
Kate O’Neill: Perhaps it’s easy to hear “algorithmic bias” and assume we’re only talking about racism or sexism encoded into our data. “Only…” Those certainly are two very real and very relevant implications of the term, but they don’t tell the whole story. In fact—everyone who interacts with technology in any way is either affected by or directly affecting algorithmic bias on a daily basis, and most of it is invisible to us. Some people, of course, are affected more profoundly than others. Depending on your circumstances, that bias might even sometimes help you: if you happen to be in a demographic that trends toward higher credit scores, you may be more likely to get approved for a loan than someone from a different demographic with an identical credit history. But sometimes, and not all that rarely, the bias in algorithms can hurt you, and in surprising ways: the car insurance calculation algorithm, for example, is likely to quote you a higher premium if you’re a man.
The problem overall is that, algorithmically, the individual doesn’t matter—it’s all about the aggregate. Decisions are made for you based on the average of all the available data across all of your demographic attributes, often whether these attributes are relevant or not to the decision at hand. Which would be one thing if we could see and impact the way these algorithms function, but as individuals, we don’t have that ability. We have no way of knowing what decisions they’re making for us, or how. Dorothea Baur, a leading expert & advisor on ethics, responsibility, and sustainability in finance, technology, and beyond, has some strong feelings about that.
Dorothea Baur: One of the biggest achievements is two-hundred and forty years ago when the Enlightenment set in in Europe. Immanuel Kant said, ‘hey, people! Dare to use your own mind!’ It was like a wake-up call, because up ‘til then…we didn’t really make an effort to explore the world because we thought everything was determined by God. By stepping out and using our own brains, we liberated ourselves from the shackles of religion or other authorities. And so now, are we just taking it too far? Have we used our brains so far that we are eventually creating machines that are smarter than us and kind of imposing their decisions, again, upon us,…that are equally as intransparent as God’s decisions, if you look at certain algorithms.
Kate: Even if the algorithms weren’t biased, they would still impact our lives on a daily basis. But biased they indeed are. Which means more and more, the status quo—even the ugly parts of it—is being made permanent. The bright side is that people are talking more about the risks inherent in algorithmic decision-making, which means the bias baked into our data is at least an addressable problem.
So that’s what I’m talking about today: algorithmic bias, where it comes from, how it affects you and your business, and how we can use strategic big-picture thinking to mitigate and erase the harm it causes.
Ana Milicevic: We talk about algorithms so much, but we talk about them as if they’re some magical entity that does something, and it’s not! Human programmers put in inputs and put in constraints and reflect their perspective of the world, and then machine language interprets that and channels it back to us. I just want everybody who has said something along the lines of ‘well, the algorithm does this,’ to replace that with, ‘the people who programmed this algorithm have done so and so.’ Because I think it removes responsibility from us as authors.
Kate: That was Ana Milicevic, a longtime digital media executive who is principal and co-founder of Sparrow Digital Holdings, discussing a trend she’s seen in the way we talk about the algorithms that run our apps, technology, and AI systems. We act like they are unknowable and beyond our control, when in truth, they’re just codifications of data models.
The bias in these systems often stems from bias within the original data that was used to code them. This is what’s known as ‘data bias,’ one of the two types of bias in AI, where algorithms are trained using data that is itself already biased. This type of bias is frequently caused by the other type, ‘societal bias,’ which deals with how that data is collected, based on our assumptions and norms as a society. Faulty assumptions create blind spots or expectations that may not reflect reality, which leads to us gathering excess information that doesn’t matter (but that we believe does), or failing to collect data about things we believe we already understand.
What that means is the data isn’t always the problem—it’s the top-level business decisions made before data is collected that have a massive impact on where it comes from and how it’s used. Solving the problem means challenging assumptions, which isn’t something a person—or team of people—unaware of their biases is capable of doing. Part of the issue is that the tech industry is still struggling to become genuinely diverse in ways that would make these underlying biases less hidden. Dr. Chris Gilliard, a writer, professor and speaker whose scholarship concentrates on digital privacy and the intersections of race, class, and technology, explains what that looks like at a company like Facebook.
Dr. Chris Gilliard: The headline said something like, ‘Facebook inching towards their diversity goals.’ I don’t know the exact number, and they had gone from 4% to 5%, or something like that…They’ve offered many different excuses for why their diversity numbers are what they are, including blaming it on like a pipeline issue and things like that. These companies, on the one hand want us to believe that they can achieve the impossible through code, but on the other hand, it’s like super impossible to find black people who can work here.
Kate: Beyond race, there’s a lack of diversity in class, thought, and ability. Ana Milicevic explains further.
Ana Milicevic: Technology is still very homogenous. The people who get to build it, overwhelmingly, across the world, come from a certain background and think very similarly to one another. And unless we can figure out how to democratize this, we’re always gonna have an inherent bias reflected in the algorithms, even if the authors of software have the most noble intentions ever.
Kate: Because even “nobly-intentioned” coders have to take shortcuts to get products out on time. Rather than attempt to collect new data, it’s easier—and faster—to borrow data from old sources. If there’s no one around with a different perspective or life experience, the coders may believe they’ve removed any bias or harm from their new system—because they weren’t able to recognize the bias that was already there. They may not have thought to check whether the data was collected fairly—or even be able to consider the motivations and intentions of those who initially collected it.
Renee Cummings, AI ethicist, criminal psychologist, and Data Activist at UVA’s School of Data Science, explained one issue that keeps cropping up in the criminal justice system.
Renee Cummings: So if you’re using data that has been gathered from communities that have been over-policed, of course the data is not going to be as accurate as you want that data to be. When you look at the history in the country of enslavement—when you think about things like the slave codes, and the Fugitive Slave Act, or you think of the 13th Amendment, or the one-drop rule, or the Three-Fifths Compromise, and Jim Crow, and segregation… how do you remove that from the data? How do you take that history of systemic racism out of that data? And if that history is baked into your data sets, what are you going to produce with that?
Kate: It turns out… a lot of biased and harmful decisions.
Renee Cummings: Risk-assessment tools were being used to give a score as to whether or not someone was going to re-offend, attaching risk-assessment tools to recidivism rates. And what we saw with this criminal justice is that many of the high-impact decisions happened there—when it comes to life or liberty, when it comes to life or death. And if you’re planning to use an AI or an algorithmic decision-making tool in the criminal justice system, you cannot be using a tool that is so opaque, or a tool that’s providing predictions that are overestimating recidivism. And I felt that what I was seeing was not a conscience when it came to the use of artificial intelligence, because too many of the tools were being designed from a place of biased data.
Kate: And this is happening in every industry, anywhere processes or decisions are being digitized, which is pretty much everywhere. Once biased data has been used, the resulting output often becomes the building blocks of new code. With each generation of development, these layers upon layers of legacy bias and harm become more entrenched in the way our technology functions, and become more baked in, more permanent. We’ve been doing this so long it’s all but impossible to excavate the generations-removed biases layered underneath today’s code.
One way this is happening is in the arena of crime, where we’ve been conditioned to think of crime in a particular way for so long, our technology evolved to do the same thing.
Renee Cummings: I did a lot of work on investigating white-collar crime, and profiling white-collar criminals, and this is why I’m saying that so many of the tools that are being designed are all focused on the street. Let’s use the technology to really look at other spaces where crime is happening.
Kate: The social bias that ‘street-crime’ is the only crime that needs policing is centuries old, but that assumption was rarely questioned, and now we have algorithms deciding where to deploy police officers. Lo and behold, they’re being dispatched to low-income, minority neighborhoods, because that’s where older crime data tended to come from, and where it tended to focus. And because people who aren’t living in those neighborhoods aren’t seeing the problems that ensue, they have a hard time believing or even imagining the consequences. Here’s Dr. Gilliard again to elaborate.
Dr. Chris Gilliard: A big part of the problem is that a lot of these effects are invisible, they disproportionately or disparately impact marginalized populations. So, I’ve talked, spoken, and written about Amazon Ring a lot. Part of the problem with talking about it is that most people who would invest in a Ring only think of law enforcement in terms of an institution or body that works for them, never as one that’s going to work against them. So it’s very hard; they often place the safety of their packages as more important than the safety of black lives. Even if you explicitly ask them, they would still say, ‘well, I need to get, y’know, whatever product it is.’ And so until that changes, some of these probably aren’t going to change.
Kate: Basically, we have a lot of work to do. Part of that work might start with reframing how we talk about these issues. Dr. Safiya Noble, Professor of Gender and African American Studies at UCLA and author of Algorithms of Oppression: How Search Engines Reinforce Racism, asserts that using the word ‘bias’ exacerbates the problem by removing responsibility.
Safiya U. Noble: The real critique of the implications of the work is getting defanged and depoliticized. How you see that now is instead of talking about algorithmic discrimination or oppression, which are kinds of words I use, people are talking about things like ‘bias.’ One of the things that does is it neutralizes the power of the critique by devolving it into a set of arguments that, y’know, everybody is biased, everything is biased, and that’s not helpful! These things are structural, they’re not just living at the individual level of how a coder or how a programming team thinks. All forms of classification and categorization have implications. Most of the ways the technology is oriented is around, like, binary classification systems, and that already is a problem. It certainly is a problem around gender, it’s a problem around race… And you’re also creating social structure through those categories. And what we know is that those categories have always existed, at least in a Western context, as hierarchical. So if you have a racial classification system like we have in the United States, where White is the highest valued, most resourced, and most powerful, and Black is the antithesis of that, and everything in between is vying for its relationship to power or powerlessness, those systems become real! So the question is, how do we create systems that are not hierarchical, and where power is not distributed along those lines of classification or categorization? Instead, we are reinforcing those systems of power over and over and over again.
Kate: Regardless of whether the intent is to harm, the fact is that these algorithms do cause it. Too often, decisions around AI, tech, and algorithms are based on cost-efficiency and timeliness of product release, or worse, a vacuum of ignorance, which is the case when the top-level decision-makers hear about the importance of a new technology and decide to integrate it into their product launch without any practical purpose, prioritizing ‘popularity’ over practicality. In either case, so long as there is inequality, algorithmic bias and its oppressive consequences are inevitable, and the technical debt increases yet again.
But bias and oppression aren’t only coded into our lives by tech developers and coders. As individuals, we encode bias into technology as well. When it comes to search engines, tons of data goes into aggregating the best results for a search query, using information on which links past users clicked upon seeing similar results.
Dr. Noble helps teach her students how this information is biased by having them search the Internet for an identity of their choosing. The results help them realize that algorithmic bias affects everyone—which was made all too clear at the University of Illinois…
Safiya U. Noble: I had a disproportionately high number of white women in predominantly white sororities. I think almost all of them had done a search on ‘sorority girl,’ and were pissed.
Kate: Because what turned up?
Safiya U. Noble: Porn.
Kate: Because porn was most commonly associated algorithmically with their identity. But those biases didn’t suddenly start their existence with the advent of the Internet. These are cultural biases, which Dr. Noble reinforces another way.
Safiya U. Noble: I don’t think, like, searches on Black girls, and Latina girls and Asian girls that would surface porn meant that much to them. When they looked for their own identity, they were disgusted. I also have them go look for the same people and communities in the library. This is a place where they see the subjectivity of knowledge. Where they start to realize, y’know, ‘I was looking up my sexuality, and it was in a cluster of books about sexual deviance, and I’m not feeling that.’ And so then they start to see that knowledge is subjective, and it’s political…Those experiences help students feel the impact of the work that you’re talking about.
Kate: Even if you’re aware of these cultural biases and make efforts to control for them, there may be more bias in your data than you think. Some people input bias into new systems intentionally, through a process known as data poisoning. Abhishek Gupta, machine learning engineer, founder of the Montreal AI Ethics Institute, and board member of Microsoft’s CSE Responsible AI board, warned about data poisoning when we spoke.
Abhishek Gupta: If you in all earnestness put bias mitigation efforts in place as you were developing an ML system, if I was to compromise the model through an attack like data poisoning, I could negate the effect of that bias mitigation measure, and basically render useless that effort that you put in, and create this situation where I’m slowly able to compromise all of these different aspects of the ethical considerations that you put in place for that system.
Kate: Which is how a well-known example like Microsoft’s Tay Twitter account happens. Tay was an artificially intelligent social chatbot designed to interact on Twitter in 2016 and learn from its interactions with the denizens of that platform. In very short order, it had become corrupted with racist, Nazi, anti-Semitic hate speech.
But this phenomenon goes well beyond Tay or chatbots. Any algorithmic system can become corrupted through abuse, misuse, and the limits of inadequate guardrails. Dr. Chris Gilliard explained how platforms like Facebook’s advertising model make that easy to do.
Dr. Chris Gilliard: ProPublica essentially found that if you wanted to, that it was possible to target a housing ad and have certain groups not see that ad. One of the most pernicious parts of this is that people don’t know what they’re not seeing! Someone using Facebook wouldn’t know that they’re not seeing an ad for housing. Facebook would call it a ‘mistake,’ but I think it’s incumbent upon us to realize that these are the products of decisions that Facebook made.
Kate: And it isn’t just Facebook’s data that can be poisoned. It’s everyone’s data. And if you use data for any part of your business, whether you generate it yourself or buy it from another company, some percentage of it may be poisoned, and there’s probably no way to figure out how much. Even if you never wrote another line of code that discriminated against your users or customers based on their demographics, it’s probably happening anyway. There’s also the question of bias showing up due to the origin of a technology.
Dr. Rumman Chowdhury: The origin of technology absolutely does matter. Things that are built for military use, even if it is moved into the commercial space—which is, by the way, a lot of technology—will still hold with it the vestiges of, let’s say, surveillance, or monitoring, because it is ultimately built assuming the world is a particular way. In other words, there are ‘good people’ and ‘bad people.’ There’s the people I’m protecting, and the people I’m fighting, ‘cause that’s just how the military is structured.
Kate: That’s Dr. Rumman Chowdhury, currently the Director of the Machine Learning Ethics, Transparency, and Accountability team at Twitter. Our tech today is coded with vestigial data collected by people with a specific understanding of the world, which leads to repeating issues that could have been prevented.
Dr. Rumman Chowdhury: Fundamentally, your view of the world will impact the technology that you build…And this has kind of been some of the critiques of the way some of these research firms have been trying to arrive at sentient AI, is by having them play games. And they have them play combative games, rather than collaborative games. And your objective function matters! If my function is to win a game where I have to kill everybody, or it’s a zero-sum world in which I have to have the most amount of points to win, that sets up a very different system than one in which I’m training it to play a game where we have to be collaborative and collectively succeed.
Kate: So let’s bring it back to you! Collective success is a worthy goal, so how do we get there? How can you make efforts to lessen or eradicate bias within your company? What choices can you make to reduce your business’s negative social and global impacts?
First, don’t collect data just for the sake of collecting data. Start by asking yourself the big picture question: what is the purpose of our business? What matters to our customers or users, and to how we operate in alignment with that? For more advice on that front, check out our recent episode “Why Human Experience?”
Once you’ve determined what matters, you can determine what data you actually need, rather than just collecting and storing data for the sake of it. Calli Schroeder, an attorney specializing in privacy, security, and technology law and member of the Global Privacy Council at EPIC Privacy, elaborates.
Calli Schroeder: People collect so much just because they can. You should evaluate what this costs you in effort, versus what you’re getting from it. ‘Collect it just so you have it just in case’: that’s such an irresponsible perspective. The more information you have, the more risk you have of a breach, and the more liability you have. If you have a data breach and all you’ve collected about your clients is name and mailing address for the services, that’s hugely different than if you’ve also collected race, and gender, and income, and financial account information, and sexual orientation, and… why do you need that? Why do you want that?
Kate: The best way not to discriminate against your employees, customers, or users on the basis of their identifying characteristics? Don’t have that information about them. However, there are industries where that information can be valuable—namely, healthcare. Here’s Dorothea Baur again.
Dorothea Baur: So, you talk about precision medicine, you get the right treatment based on all your specific characteristics—with the help of AI, patterns can be evaluated better. So there it’s like a reverting of the ethical problem, it’s like, ‘please discriminate me!’ I want you to take into account all my characteristics. I want you to know how many veggie burgers I eat, I want you to know how much I sleep, whatever I do, all my risk factors, etc., because if this helps me to get the right treatment, I’m willing to lay it bare.
Kate: Beyond medicine, there’s the environment.
Dorothea Baur: The whole problem of pattern recognition in machine learning, where if it’s applied to humans, it is full of biases, and it kind of confuses correlation and causation, and it’s violating privacy, etc. There are a lot of issues that you don’t have when you use the same kind of technology in a natural science context, you know? Where you just observe patterns of oceans and clouds and whatever, or when you try to control the extinction of species. I mean, animals don’t have a need for or a right to privacy, so why not use AI in contexts where it doesn’t violate anyone’s moral rights?
Kate: If you’re in an industry where you can provide better service by collecting additional information, there are still actions you can take to reduce bias. Most notably, hire employees and bring on experts and consultants that are every bit as diverse as and reflective of the stakeholders and communities you are serving, and listen when they explain any ramifications or consequences affecting their communities that you may have overlooked. Giselle Mota, Principal Consultant on the Future of Work at ADP, joined me to discuss exactly that.
Giselle Mota: Diverse minds are needed at the table. So we need designers, policymakers, data scientists, that don’t only look the same, and are not the traditional people who’ve always been building the same stuff—we need to address the data itself. You have to be accountable to what it is that you’re doing and how you’re using people’s data, what data’s going into how you’re training your algorithms, from when you’re allowing that algorithm to determine candidate relevancy, if you’re gonna hire someone, if you’re gonna determine somebody’s credit, or image classification, standards that we apply on people… like, you have to mix up that data with a mix of representation.
Kate: Renee Cummings echoed that sentiment, reiterating that simply talking about mitigating bias isn’t enough, because again—we don’t know what our biases are. We can’t know what we don’t know.
Renee Cummings: I think there are many companies right now, particularly at this moment, where systemic racism seems to be on the front burner—there are many designers who are presenting tools that are detecting and monitoring and managing risks, and they have bias and discrimination in there, and racism—but it’s bigger than that. You’ve got to go back to the subconscious. What we really need are not only data scientists designing tools for criminal justice, but criminologists, and criminalists, and criminal psychologists, and other individuals involved in the criminal justice system working alongside those data scientists. Why use new technology to amplify historic biases that have kept us trapped? It’s going to take a combination of thinking, a new type of consciousness that we’ve got to develop, vigilance—because you’ve got to be constantly aware, and we all have biases, we all have prejudices—but it also calls for diversity, and equity, and inclusion, as a risk-management strategy. Diversity in itself just doesn’t create an AI system that’s perfect, but at least we will know that certain checks and balances have been applied.
Kate: There’s also no shame in just calling for a do-over. Certain technologies have so much bias embedded within them, and allow further bias to cause harm to people and communities in the real world, that it might make sense to just… get rid of them and try again. Dr. Gilliard thinks that’s what we should do with Facial Recognition, where bias has been responsible for immense harm in the lives of many, and especially people of color.
Dr. Chris Gilliard: Many people for years have been saying that Facial Recognition should either not exist, or should be heavily regulated. And those people have been told, ‘the horse is already out of the barn,’ like, once a tech is out in society, like, you can’t take it back. But we’re seeing different cities, townships, municipalities ban facial recognition. We’re seeing companies step back in one form or another. We’re seeing cracks, and having some victories. As a society, we’re not just stuck, like, once some clown puts something out there. One of the things I’ve seen that does give me a little bit of hope is there are more and more people not only saying that we have to do that work, but being inside these companies and holding them accountable for doing it. Not after the harm is done, right, not after the toxin’s been released, but before it’s released.
Kate: And that’s what we need to do. Wherever possible, decisions need to be made to mitigate bias before products are released. In the short term, it may seem more cost effective or efficient to get your product out as soon as possible, but if it’s released and ends up causing more harm than it’s worth, it might also cost you more money in the long run. It’s a lot harder to recall a product or make changes to an operational system than it is to do that work ahead of time.
So whether you’re a CEO or you have the ear of one, know that there are things you don’t know. Bring in intelligent people to collaborate and elevate your product or service. Be aware of your own biases—at least as aware as you can be—and be open to hearing from people about things you might have missed. Be aware of data poisoning methods and make efforts to counteract or prevent them. Decide on a big-picture level what data is essential to your business, and don’t collect anything beyond that. This will make you less likely to suffer a data breach and save you money in both data storage and processing power.
The more problems you catch ahead of time, the less bias will be present in your final product, and the more future-proofed your business becomes. An added bonus is that it will likely attract more customers and users as well. If we can do all of these things, our future technologies might do better than ‘reduce harm.’ They might even move society in a more positive direction.
I’m Kate O’Neill. Thank you for joining me on this deep dive into algorithmic bias, harm, and oppression, and what we can do to change it.