Got Science? Podcast | Episode 54 It’s Just Code. How Can It Be Biased?

March 12, 2019

Author and professor Dr. Safiya Noble talks about racial bias in search results and discusses her book Algorithms of Oppression: How Search Engines Reinforce Racism.

In this episode Safiya talks about:

  • Why search engines aren't as objective as you might think
  • How bias can be introduced into algorithms
  • The implications and impacts of our reliance on search engines for information

Timing and cues:

  • Opener (0:00-1:02)
  • Intro (1:02-3:14)
  • Interview Part 1 (3:14-12:50)
  • Break (12:50-13:26)
  • Interview Part 2 (13:26-24:46)
  • This Week in Science History Throw (24:46-24:53)
  • This Week in Science History (24:53-27:33)
  • Outro (27:33-28:30)

Full Transcript

Colleen: Safiya, thank you for joining me on the "Got Science? Podcast."

Safiya: Happy to be here.

Colleen: So you've written a book called "Algorithms of Oppression: How Search Engines Reinforce Racism." So how can an algorithm, a piece of computer code, reinforce racism?

Safiya: One of the things people fail to remember is that algorithms represent expressions of values. You know, math is a language. We call it a language because it's subjective. There are many ways that we can come to an answer, and there are equally many ways that we can determine the output that a search engine might generate for us. And those are often predicated upon in the kind of commercial search space, advertising quite frankly. Advertising is the primary goal, and that is what's optimized for. We want advertising to be the most visible more than maybe other forms of knowledge, or science. And so it's important that people understand that algorithms are optimized around certain values, and those values are not always apparent to us. And that's what I really try to do in this book is to help us understand in the commercial search engine space that our reliance upon these advertising-driven algorithms are really a set of values that are complicated and have a whole host of consequences for us.

Colleen: So I think most people consider Google to be the best search engine to serve up legitimate information. Why are we so trusting?

Safiya: Well, a lot of us have been on the internet since before search engines, right, and we remember what it was like going through complex decision trees, and going into chat rooms, and looking for experts to help us answer certain kinds of questions. People also remember the early days before libraries had digital databases, and we went through library card catalogs to find knowledge, or information. And so those were time-consuming, intensive kinds of other search kind of strategies that we used. And then, Google and other search engines that even predated it, I remember, you know, Lycos and AltaVista, and even Yahoo for a long time played a really important role in our lives. To some degree they also, you know, used this like decision tree type of expertise that was part of their platform. But Google came along with a different kind of model which was this kind of blank white screen with a simple box in the middle, and, you know, resocialized us to think that finding information was just clean and easy. You could just trust that this kind of AI or algorithm would do all the thinking for you, or do all the labor for you. And we've now got a whole generation of people who are quite acculturated to the idea that knowledge can be accessed in 0.03 seconds and that a Google search engine is gonna vet it, and find the most credible thing for you, when that may or may not, in fact, be the truth.

Colleen: Talk to me a little bit about technological redlining. You use that term in the book. Can you explain it?

Safiya: Sure. So we previously have had other forms of redlining. And redlining is really the process of kind of using a formula, or a set of parameters to legally discriminate, quite frankly, against people in our society. So, for example, insurance industries for a long time used zip codes as part of the formula for determining who would get a mortgage, or how lending would happen. And it would always kind of coincide that zip codes where there was racial residential segregation meant that if you were a person of color, and you lived in that zip code you probably were not going to get a loan. What we have now is that process almost like on steroids where in the digital space a whole lot of information might be collected about you, far beyond your zip code. But maybe demographic information about you, maybe your history of search, things that you've posted on the internet. All kinds of disparate bits of information can be collected, and your data profile now is something that you may not even know what it is. You certainly can't see it, but decisions about your ability to have access to kind of goods and services, or resources in our society are increasingly dependent upon that data profile. And that's what I think of as kind of this digital, or technological type of redlining which is foreclosing opportunities to people based on these data profiles.

Colleen: How do search engines work? I mean, I know the Google algorithm is top secret, but we do have some information about how it works because we can optimize our web content.

Safiya: Well, what we can kind of discern is that there's a major driver in the kind of content that we see especially on the first page, which I have to say, I'm mostly interested in my work and what shows up on the first page because the majority of people who use commercial search engines don't go past the first page. So that's the most important real estate so to speak in search. What we know is that, for example, Google has a program called AdWords where people can participate in a live auction 24/7 where they can optimize their content, or certain keywords and pay a premium to have their content live in association with that...with those keywords. When we think of this in kind of more benign ways, you know, J.C. Penney, for example, was busted a few years ago because they had optimized the word dresses. And any time you did a search on the word "dresses," you always got J.C. Penney as, you know, dot-com as the first hit. So they were willing to pay, right, to make sure that they were always visible in relationship to those...to that content, and those keywords. And one of the things that I try to show in the book is that certain people, communities, and ideas often get optimized by big industries that have a lot of resources far more than the communities that are being misrepresented by those industries.

For example, for many years if you did searches on the word...the keywords "black girls" or "Latina girls" or "Asian girls," I mean, these are, you know, people in our society who are generally vulnerable. They're children or teenagers, they don't have a lot of money. Well, those identities became synonymous for many years with porn, and the porn industry because as you know the porn industry has more money than just about any other industry. So this is kind of this phenomenon. I mean, you didn't have to add the words "porn" or "sex." Those identities just became synonymous because the industry was able to optimize around those keywords, and those are the kinds of things. There are many, many, many examples that I give in the book to kind of demonstrate how optimization, and co-optation of certain ideas, and certain identities has, you know, I think a really negative broader impact on those people and certainly on our broad understandings of who those people might be. And, of course, this is not disconnected from other forms of media where girls of color are hypersexualized, and the kind of racist and sexist narratives and tropes around that.

Colleen: And I think you mentioned that somehow, pressure was put on Google, and things changed a little bit.

Safiya: Yes. For sure, Google has responded to the criticisms. In about the fall of 2012 was when I first noticed that they had changed the algorithm after years of having the sexually explicit content. And one of the things that I think this tells us is that we know that large visual media platforms are responsive, and they do quietly tweak their algorithms, and they do make changes. Certainly, Black Girls Code comes up now first and has for a couple of years, and that is, of course, in no small part related to the fact that Google has made a tremendous financial investment in Black Girls Code, all the way to the fact of moving them into their, you know, offices in New York, right? And so this is where other scholars like people like Helen Nissenbaum and Lucas Introna have written about kind of Google's bias towards its own properties, or its own investments.

And we know, for example, that, you know, it's always likely to serve up its own properties, or its own investments. If you're looking for a video, you're going to always get YouTube before you get Vimeo, or some other competitor, right? So that effect I think is what's happened with respect to black girls as they certainly responded to the criticism, which is positive. And, you know, we need to do a lot more around Latina and Asian girls, but also there have been certain kinds of other pressures certainly in the case of the way that certain kinds of racist red herrings have been served up into the search results, whether it's antisemitism that gets served up when you're looking for...on the keywords "Jews." And they've, you know, Google has faced a lot of criticism and now for many years it's been putting up a disclaimer recognizing that white nationalists have also played a significant role in propagating disinformation, antisemitism, and hate speech in their platform too.

[Break]

Colleen: So I hear this question a lot, and I'm sure it came up in your research. How do you respond when somebody says, "It's not the algorithm. It comes up because it's what people are searching on. It's the most searched term."?

Safiya: That really is the way the majority of people in the public I think that I encounter around these conversations think. They think that that search is just a direct mapping of what the majority think and, again, I think this is because they don't think of commercial search engines as advertising platforms. They think of them as information retrieval platforms or as fact checkers, and they're not that in a lot of ways although they also can be used for that depending on the kinds of facts you wanna check. For example, if you want to know where the closest coffee shop is, you're probably going to get a list of the closest Starbucks because Starbucks is a large corporate advertiser in Google search. You might not get the mom and pop shop that's just, you know, around the corner from it because they might not be a large advertiser, right? So, yes, Starbucks is popular, but there are also other implications about what does it mean to only serve up, and make visible the most well-heeled, or the most moneyed companies and ideas. And this is where, again, when you talk about ideas, and people, and communities who are in the minority, or who are not well capitalized, those voices, those ideas, those forms of knowledge often are obscured. Maybe they're on page, you know, 10,530. No one is going to that page.

Colleen: Even if they're on page three, you're probably not gonna go there.

Safiya: We're gonna miss it. Right. So I think this is, again, part of the challenge is this misnomer about what these search engines are, how they work, and what the implications of them will be. And I'm particularly concerned about that in the case of like disinformation, or knowledge, or other kinds of evidence that we, you know, that scholars and scientists are really interested in because we study society and all kinds of natural phenomena as a way to be of benefit to all of us, to the public.

Colleen: How did you first get interested in this field?

Safiya: Well, you know, I spent my first career in advertising and marketing for big Fortune 500 companies.

Colleen: So you know what's...what this is about.

Safiya: I know what's going on. I mean, I have lived experience and expertise or professional expertise in influencing consumer behavior, and making certain kinds of content visible, specifically advertising, and public relations messaging. So that was my first career for many years. And then I went to graduate school. I, you know, I joke with my students, you know, to atone for my sins of being in advertising, let's say. But when I got to graduate school, I went to the information school at the library. At the time it was called The Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. And this was during the height of the Google Book digitization project where everyone was excited about Google somehow making all the world's knowledge free and accessible by digitizing all of it which, of course, many academic libraries stepped back from that project, and, of course, many copyright lawsuits came forward. But I was so surprised that so many people were talking about Google like it was the new public library. And I had just left the advertising industry where we were trying to figure out how to game Google to make our clients more visible. So this was just, you know, on the face of it a contradiction that I didn't understand and I felt like I needed to have a way to talk about. And it was there, you know, through kind of a series of conversations, and my own interests in seeing how people were represented that the example of the pornification of girls of color became kind of like a symbol of, a way of talking about this phenomenon of money, and influence over our information landscape. And that's really kind of how I came to this work.

Colleen: And it's happening more and more. We're not going to stop digitizing things, libraries. I hope public libraries survive but...

Safiya: Me too.

Colleen: Who knows? But it's that sense of there's no one at the front desk when you go to Google. I'm sure a lot of people can discern what's there, and maybe pick up some of the information that's not, I guess, it's not vetted. No one is...

Safiya: Well, no one is doing that kind of curatorial work that maybe a reference librarian or a cataloger is doing in a library, certainly. You know, one of the things that I often like to do with my students is I, because some of them have not grown up in public libraries like I grew up. I mean, I'm from the '70s so that's what we did is we lived in the library in the weekend...on the weekends and after school. And so a lot of students are not socialized to the value of public libraries in the way that other generations have been. Many of my students say that they could never have gotten through college without using Google to help them write every single one of their papers, or to figure anything out. And so I'll often have...give them an assignment where they go to the library, where they've done searches online and see, you know, to see what they might get on something they care about, a subject they care about. And then I send them to the library, and I don't just have them stop at the library database and going through kind of doing those searches for that information there which, of course, opens up a whole different world of scholarly information and science. It's really valuable in a completely different orientation mostly fact-based. And then I actually have them go to the stacks, some of whom don't even know what the stacks are until they get to these, you know, huge rows of books.

And I say, "Okay. Look for that book but look for everything around it. Where is it in the library? What are the books that are around it? What does that tell you about the context of that specific book that you might be going to look for?" And all of a sudden they realize really new and interesting things. Like, you know, "It was really interesting to me when I was searching for something about LGBTQ communities and I saw a book that looked interesting to me in the library database. And then I went to the stacks, and I saw that all of these books that I was interested in were being kind of classified, or cataloged around sexual deviance, and other kinds of criminality and that was really concerning to me." And it also helps them understand how knowledge is made, and how knowledge is organized, and who has power to define the landscape of legitimacy. And these are things scholars are also still trying to work on, and work through. So they get a bigger sense of kind of what's at stake around ideas, and information. And to me, the library is such a powerful and important cultural institution. They're our national treasures in this country, and around the world, and we need to protect them because they help us think differently about information versus knowledge.

Colleen: I love that assignment. I hope you keep doing that assignment forever. That's great.

Safiya: Thanks.

Colleen: So how would you go about creating a better sort of more credible search algorithm? Are there things that can be done?

Safiya: I am not really interested in a technical solution. I don't think... I mean, people will always make, you know, something digital, and technical, and algorithmic that they think is slightly better than the last thing that we had. Some of these questions about disinformation, knowledge, public education, the accessibility of that, and its importance, its critically important role in democracy, are values questions that get exercised at the polls in mid-term and, you know, kind of, you know, presidential electoral years. These are values that we're gonna have to make some hard decisions around in terms of what does it mean to have a well-educated public? What happens when we have an increasingly less literate public, right, or illiterate public that in my opinion is wholly reliant upon a search engine instead of reading books, and writing long-form analysis and being able to read investigative news, and do the sense-making of that? And, you know, one of the things that we know is I, you know, I work in a university which to me is the antithesis of a search engine, because in a university we have many disciplines. We might look at let's say this microphone here that's between us. You know, if you looked at this microphone from an engineering point of view, you might be thinking about the components, and all of the ways that it needs to be optimized for the best quality sound. If you look at the same object in art and design, you may say, "Wow, you know, the base of this isn't very strong. You know, it could really tip over pretty easily." Maybe we would think about this microphone in different terms. If you're in my field, you might think about, "Well what's gonna happen to this microphone, these electronics when we're done with them, and they get loaded up on a barge, and they become e-waste, and they make a toxic, cancerous city of electronic waste and trash? What's the lifespan of this? And, you know, is it being planned for obsolescence already, or is it gonna last us 100 years?" Those are the kinds of things, right?

Colleen: And maybe what is the person who's going to do the interview be talking about and thinking about.

Safiya: That's right. "How are you gonna represent me and my ideas as I speak into this microphone, right?" So the university, or school, education is about many points of view that sometimes are contesting, and are, right, pushed up against each other that give us a fuller picture, and way of thinking about something. A search engine is the antithesis of that. It actually gives us a ranking. We think it's vetted. We think the first thing that we get is the most valid, and the most credible which it likely probably isn't. And it doesn't lead us to more complex robust ways of thinking. And to me, you know, let's not do away with the universities. You know, I tell you we're in a moment where we see a massive defunding of public education, massive defunding of higher education, and a lot of excitement about the internet. And I think we need to put those things in tension with each other, and say what's lost when we don't have time, and make time for deeper inquiries.

Colleen: Safiya, thank you so much for joining me.

Safiya: Thank you so much for the invitation. I really appreciate it.

Credits: 

This Week in Science History: Katy Love
Editing: Omari Spears
Music: Brian Middleton
Research and writing: Pamela Worth
Executive producer: Rich Hayes
Host: Colleen MacDonald