ABOUT THE SPEAKER
Cathy O'Neil - Mathematician, data scientist
Data skeptic Cathy O’Neil uncovers the dark secrets of big data, showing how our "objective" algorithms could in fact reinforce human bias.

Why you should listen

In 2008, as a hedge-fund quant, mathematician Cathy O’Neil saw firsthand how really, really bad math could lead to financial disaster. Disillusioned, O’Neil became a data scientist and eventually joined Occupy Wall Street’s Alternative Banking Group.

With her popular blog mathbabe.org, O’Neil emerged as an investigative journalist. Her acclaimed book Weapons of Math Destruction details how opaque, black-box algorithms rely on biased historical data to do everything from sentencing defendants to hiring workers. In 2017, O’Neil founded consulting firm ORCAA to audit algorithms for racial, gender and economic inequality.

TED2017

Cathy O'Neil: The era of blind faith in big data must end


Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden agendas behind the formulas.


00:12
Algorithms are everywhere.
00:16
They sort and separate
the winners from the losers.
00:20
The winners get the job
00:22
or a good credit card offer.
00:24
The losers don't even get an interview
00:27
or they pay more for insurance.
00:30
We're being scored with secret formulas
that we don't understand
00:34
that often don't have systems of appeal.
00:39
That begs the question:
00:40
What if the algorithms are wrong?
00:45
To build an algorithm you need two things:
00:47
you need data, what happened in the past,
00:49
and a definition of success,
00:50
the thing you're looking for
and often hoping for.
00:53
You train an algorithm
by looking, figuring out.
00:58
The algorithm figures out
what is associated with success.
01:01
What situation leads to success?
01:04
Actually, everyone uses algorithms.
01:06
They just don't formalize them
in written code.
01:09
Let me give you an example.
01:10
I use an algorithm every day
to make a meal for my family.
01:14
The data I use
01:16
is the ingredients in my kitchen,
01:18
the time I have,
01:19
the ambition I have,
01:20
and I curate that data.
01:22
I don't count those little packages
of ramen noodles as food.
01:26
(Laughter)
01:28
My definition of success is:
01:30
a meal is successful
if my kids eat vegetables.
01:34
It's very different
from if my youngest son were in charge.
01:37
He'd say success is if
he gets to eat lots of Nutella.
01:41
But I get to choose success.
01:43
I am in charge. My opinion matters.
01:46
That's the first rule of algorithms.
01:48
Algorithms are opinions embedded in code.
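
A minimal Python sketch of the meal example, assuming a handful of invented past dinners. The names `past_meals` and `train` are hypothetical and not from the talk. "Training" here only means measuring how often each ingredient showed up in meals that counted as a success, and the definition of success is passed in as an opinion:

```python
from collections import defaultdict

# Hypothetical past dinners: (ingredients, kids ate vegetables?, son ate lots of Nutella?)
past_meals = [
    ({"broccoli", "pasta"}, True, False),
    ({"carrots", "rice"}, True, False),
    ({"pasta", "nutella"}, False, True),
    ({"bread", "nutella"}, False, True),
    ({"broccoli", "rice"}, True, False),
]

def train(meals, success):
    """For each ingredient, return the share of past meals containing it that were a success."""
    seen = defaultdict(int)
    wins = defaultdict(int)
    for meal in meals:
        for item in meal[0]:
            seen[item] += 1
            if success(meal):
                wins[item] += 1
    return {item: wins[item] / seen[item] for item in seen}

# Same data, two different opinions about what success means.
parent_model = train(past_meals, success=lambda meal: meal[1])
son_model = train(past_meals, success=lambda meal: meal[2])

print(parent_model["broccoli"], parent_model["nutella"])
print(son_model["broccoli"], son_model["nutella"])
```

The data never changes between the two calls. Only the chosen definition of success does, and the two "models" end up recommending opposite ingredients.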
01:53
It's really different from what you think
most people think of algorithms.
01:57
They think algorithms are objective
and true and scientific.
02:02
That's a marketing trick.
02:05
It's also a marketing trick
02:07
to intimidate you with algorithms,
02:10
to make you trust and fear algorithms
02:14
because you trust and fear mathematics.
02:17
A lot can go wrong when we put
blind faith in big data.
02:23
This is Kiri Soares.
She's a high school principal in Brooklyn.
02:27
In 2011, she told me
her teachers were being scored
02:29
with a complex, secret algorithm
02:32
called the "value-added model."
02:34
I told her, "Well, figure out
what the formula is, show it to me.
02:37
I'm going to explain it to you."
02:39
She said, "Well, I tried
to get the formula,
02:41
but my Department of Education contact
told me it was math
02:44
and I wouldn't understand it."
02:47
It gets worse.
02:48
The New York Post filed
a Freedom of Information Act request,
02:52
got all the teachers' names
and all their scores
02:55
and they published them
as an act of teacher-shaming.
02:59
When I tried to get the formulas,
the source code, through the same means,
03:02
I was told I couldn't.
03:05
I was denied.
03:06
I later found out
03:07
that nobody in New York City
had access to that formula.
03:10
No one understood it.
03:13
Then someone really smart
got involved, Gary Rubenstein.
03:17
He found 665 teachers
from that New York Post data
03:20
that actually had two scores.
03:22
That could happen if they were teaching
03:24
seventh grade math and eighth grade math.
03:27
He decided to plot them.
03:28
Each dot represents a teacher.
03:31
(Laughter)
03:33
What is that?
03:35
(Laughter)
03:36
That should never have been used
for individual assessment.
03:39
It's almost a random number generator.
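
One way to put a number on what that plot shows is to correlate the two scores each teacher received. The sketch below is only an illustration: the score pairs are invented placeholders, not the New York Post data, and the helper `pearson` is a hypothetical name. A consistent measure would give a correlation near 1, while something close to a random number generator gives a value near 0:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Invented scores for five teachers who each taught two grades (placeholder values).
seventh_grade_scores = [10, 30, 50, 70, 90]
eighth_grade_scores = [70, 10, 50, 90, 30]

consistency = pearson(seventh_grade_scores, eighth_grade_scores)
print(f"correlation between the two scores: {consistency:.2f}")
```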
03:41
(Applause)
03:44
But it was.
03:45
This is Sarah Wysocki.
03:47
She got fired, along
with 205 other teachers,
03:49
from the Washington, DC school district,
03:52
even though she had great
recommendations from her principal
03:54
and the parents of her kids.
03:57
I know what a lot
of you guys are thinking,
03:59
especially the data scientists,
the AI experts here.
04:01
You're thinking, "Well, I would never make
an algorithm that inconsistent."
04:06
But algorithms can go wrong,
04:08
even have deeply destructive effects
with good intentions.
04:14
And whereas an airplane
that's designed badly
04:16
crashes to the earth and everyone sees it,
04:18
an algorithm designed badly
04:22
can go on for a long time,
silently wreaking havoc.
04:27
This is Roger Ailes.
04:29
(Laughter)
04:32
He founded Fox News in 1996.
04:35
More than 20 women complained
about sexual harassment.
04:38
They said they weren't allowed
to succeed at Fox News.
04:41
He was ousted last year,
but we've seen recently
04:43
that the problems have persisted.
04:47
That begs the question:
04:49
What should Fox News do
to turn over another leaf?
04:53
Well, what if they replaced
their hiring process
04:56
with a machine-learning algorithm?
04:57
That sounds good, right?
04:59
Think about it.
05:00
The data, what would the data be?
05:03
A reasonable choice would be the last
21 years of applications to Fox News.
05:08
Reasonable.
05:09
What about the definition of success?
05:11
Reasonable choice would be,
05:13
well, who is successful at Fox News?
05:15
I guess someone who, say,
stayed there for four years
05:18
and was promoted at least once.
05:20
Sounds reasonable.
05:22
And then the algorithm would be trained.
05:24
It would be trained to look for people
to learn what led to success,
05:29
what kind of applications
historically led to success
05:33
by that definition.
05:36
Now think about what would happen
05:37
if we applied that
to a current pool of applicants.
05:41
It would filter out women
05:43
because they do not look like people
who were successful in the past.
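
The hiring thought experiment can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the records are invented, `train` and `screen` are hypothetical names, and the "model" is just a per-group success rate rather than a real machine-learning method. It also uses gender directly to keep the sketch short, whereas a real system could pick up the same pattern through correlated features:

```python
# Invented historical records: (gender, years_stayed, promotions).
# The values mimic a workplace where women were not allowed to succeed.
history = [
    ("man", 6, 2), ("man", 5, 1), ("man", 4, 1), ("man", 2, 0),
    ("woman", 1, 0), ("woman", 2, 0), ("woman", 3, 0), ("woman", 5, 1),
]

def was_successful(years_stayed, promotions):
    """The definition from the talk: stayed four years and was promoted at least once."""
    return years_stayed >= 4 and promotions >= 1

def train(records):
    """Learn the historical success rate of each group."""
    totals, wins = {}, {}
    for gender, years, promotions in records:
        totals[gender] = totals.get(gender, 0) + 1
        wins[gender] = wins.get(gender, 0) + was_successful(years, promotions)
    return {gender: wins[gender] / totals[gender] for gender in totals}

def screen(applicants, model, threshold=0.5):
    """Keep only applicants who look like people who succeeded in the past."""
    return [a for a in applicants if model[a["gender"]] >= threshold]

model = train(history)
applicants = [{"name": "A", "gender": "woman"}, {"name": "B", "gender": "man"}]
shortlist = screen(applicants, model)

print(model)
print([a["name"] for a in shortlist])
```

Nothing in the code mentions harassment or discrimination. The filter falls out of the historical outcomes and the chosen definition of success.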
05:51
Algorithms don't make things fair
05:54
if you just blithely,
blindly apply algorithms.
05:57
They don't make things fair.
05:58
They repeat our past practices,
06:00
our patterns.
06:01
They automate the status quo.
06:04
That would be great
if we had a perfect world,
06:07
but we don't.
06:09
And I'll add that most companies
don't have embarrassing lawsuits,
06:14
but the data scientists in those companies
06:17
are told to follow the data,
06:19
to focus on accuracy.
06:22
Think about what that means.
06:23
Because we all have bias,
it means they could be codifying sexism
06:27
or any other kind of bigotry.
06:31
Thought experiment,
06:32
because I like them:
06:35
an entirely segregated society --
06:40
racially segregated, all towns,
all neighborhoods
06:43
and where we send the police
only to the minority neighborhoods
06:46
to look for crime.
06:48
The arrest data would be very biased.
06:51
What if, on top of that,
we found the data scientists
06:54
and paid the data scientists to predict
where the next crime would occur?
06:59
Minority neighborhood.
07:01
Or to predict who the next
criminal would be?
07:04
A minority.
07:07
The data scientists would brag
about how great and how accurate
07:11
their model would be,
07:12
and they'd be right.
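
A small Python sketch of this thought experiment, under loudly simplified assumptions: two kinds of neighborhood with the same invented amount of underlying crime, police sent to only one of them, and every crime in a patrolled neighborhood recorded as an arrest. The names and counts are hypothetical:

```python
# Assumption: the same amount of underlying crime in both kinds of neighborhood.
TRUE_CRIMES = {"minority": 100, "majority": 100}
# Assumption: police are only sent to the minority neighborhoods.
PATROLLED = {"minority": True, "majority": False}

# An arrest can only be recorded where police are looking.
arrests = {n: (TRUE_CRIMES[n] if PATROLLED[n] else 0) for n in TRUE_CRIMES}

def predict_next_crime(arrest_counts):
    """Predict the neighborhood with the most recorded arrests."""
    return max(arrest_counts, key=arrest_counts.get)

prediction = predict_next_crime(arrests)

# "Accuracy" is scored against the arrest data, not against the true crimes.
accuracy_on_arrest_data = arrests[prediction] / sum(arrests.values())
share_of_true_crime = TRUE_CRIMES[prediction] / sum(TRUE_CRIMES.values())

print(prediction, accuracy_on_arrest_data, share_of_true_crime)
```

Measured against the only data anyone collected, the prediction looks perfect, even though by construction half of the crime happens somewhere else.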
07:15
Now, reality isn't that drastic,
but we do have severe segregations
07:20
in many cities and towns,
07:21
and we have plenty of evidence
07:23
of biased policing
and justice system data.
07:27
And we actually do predict hotspots,
07:30
places where crimes will occur.
07:32
And we do predict, in fact,
the individual criminality,
07:36
the criminality of individuals.
07:38
The news organization ProPublica
recently looked into
07:42
one of those "recidivism risk" algorithms,
07:45
as they're called,
07:46
being used in Florida
during sentencing by judges.
07:50
Bernard, on the left, the black man,
was scored a 10 out of 10.
07:55
Dylan, on the right, 3 out of 10.
07:57
10 out of 10, high risk.
3 out of 10, low risk.
08:00
They were both brought in
for drug possession.
08:03
They both had records,
08:04
but Dylan had a felony
08:07
but Bernard didn't.
08:09
This matters, because
the higher score you are,
08:12
the more likely you're being given
a longer sentence.
08:18
What's going on?
08:20
Data laundering.
08:22
It's a process by which
technologists hide ugly truths
08:27
inside black box algorithms
08:29
and call them objective;
08:31
call them meritocratic.
08:35
When they're secret,
important and destructive,
08:37
I've coined a term for these algorithms:
08:40
"weapons of math destruction."
08:42
(Laughter)
08:43
(Applause)
08:46
They're everywhere,
and it's not a mistake.
08:49
These are private companies
building private algorithms
08:53
for private ends.
08:55
Even the ones I talked about
for teachers and the public police,
08:58
those were built by private companies
09:00
and sold to the government institutions.
09:02
They call it their "secret sauce" --
09:04
that's why they can't tell us about it.
09:06
It's also private power.
09:09
They are profiting for wielding
the authority of the inscrutable.
09:17
Now you might think,
since all this stuff is private
09:20
and there's competition,
09:21
maybe the free market
will solve this problem.
09:23
It won't.
09:24
There's a lot of money
to be made in unfairness.
09:29
Also, we're not economic rational agents.
09:33
We all are biased.
09:34
We're all racist and bigoted
in ways that we wish we weren't,
09:38
in ways that we don't even know.
09:41
We know this, though, in aggregate,
09:44
because sociologists
have consistently demonstrated this
09:47
with these experiments they build,
09:49
where they send a bunch
of applications to jobs out,
09:51
equally qualified but some
have white-sounding names
09:54
and some have black-sounding names,
09:56
and it's always disappointing,
the results -- always.
09:59
So we are the ones that are biased,
10:01
and we are injecting those biases
into the algorithms
10:04
by choosing what data to collect,
10:06
like I chose not to think
about ramen noodles --
10:09
I decided it was irrelevant.
10:11
But by trusting the data that's actually
picking up on past practices
10:16
and by choosing the definition of success,
10:18
how can we expect the algorithms
to emerge unscathed?
10:22
We can't. We have to check them.
10:26
We have to check them for fairness.
10:27
The good news is,
we can check them for fairness.
10:30
Algorithms can be interrogated,
10:34
and they will tell us
the truth every time.
10:36
And we can fix them.
We can make them better.
10:38
I call this an algorithmic audit,
10:40
and I'll walk you through it.
10:42
First, data integrity check.
10:46
For the recidivism risk
algorithm I talked about,
10:49
a data integrity check would mean
we'd have to come to terms with the fact
10:53
that in the US, whites and blacks
smoke pot at the same rate
10:56
but blacks are far more likely
to be arrested --
10:59
four or five times more likely,
depending on the area.
11:03
What is that bias looking like
in other crime categories,
11:06
and how do we account for it?
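
As a rough illustration of what such a data integrity check might compute, the Python sketch below compares how often two groups are arrested per person who does the same thing. The per-1,000 counts are placeholders chosen only to reproduce the "same rate, four times the arrests" pattern described above, and `arrest_disparity` is a hypothetical helper name:

```python
def arrest_disparity(users_per_1000, arrests_per_1000, group, reference):
    """How many times more likely `group` is to be arrested than `reference`,
    per person who actually engages in the behavior."""
    # Cross-multiplied form of (arrests/users for group) / (arrests/users for reference).
    numerator = arrests_per_1000[group] * users_per_1000[reference]
    denominator = arrests_per_1000[reference] * users_per_1000[group]
    return numerator / denominator

# Placeholder counts per 1,000 people (assumptions, not real statistics).
users_per_1000 = {"black": 100, "white": 100}   # same rate of use
arrests_per_1000 = {"black": 8, "white": 2}     # very different arrest counts

ratio = arrest_disparity(users_per_1000, arrests_per_1000, "black", "white")
print(f"arrest disparity: {ratio:.1f}x")
```

A ratio far from 1 is a sign that the arrest column measures policing as much as behavior, so a model trained on it inherits that gap.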
11:08
Second, we should think about
the definition of success,
11:11
audit that.
11:12
Remember -- with the hiring
algorithm? We talked about it.
11:15
Someone who stays for four years
and is promoted once?
11:18
Well, that is a successful employee,
11:20
but it's also an employee
that is supported by their culture.
11:24
That said, also it can be quite biased.
11:26
We need to separate those two things.
11:28
We should look to
the blind orchestra audition
11:30
as an example.
11:31
That's where the people auditioning
are behind a sheet.
11:34
What I want to think about there
11:36
is the people who are listening
have decided what's important
11:40
and they've decided what's not important,
11:42
and they're not getting
distracted by that.
11:44
When the blind orchestra
auditions started,
11:47
the number of women in orchestras
went up by a factor of five.
11:52
Next, we have to consider accuracy.
11:55
This is where the value-added model
for teachers would fail immediately.
11:59
No algorithm is perfect, of course,
12:02
so we have to consider
the errors of every algorithm.
12:06
How often are there errors,
and for whom does this model fail?
12:11
What is the cost of that failure?
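
The first two questions can be sketched in Python as an overall accuracy plus a per-group error rate. This is a minimal illustration under stated assumptions: the records are invented, the groups "A" and "B" are placeholders, and the helper names are hypothetical. The cost of a failure is not modeled here, since it depends on what the score is used for:

```python
def make(group, flagged, reoffended, count):
    """Build `count` invented audit records with the same values."""
    return [{"group": group, "flagged": flagged, "reoffended": reoffended}] * count

# "flagged" = the model called the person high risk; "reoffended" = what actually happened.
records = (
    make("A", True, True, 3) + make("A", True, False, 2)
    + make("A", False, False, 2) + make("A", False, True, 1)
    + make("B", True, True, 3) + make("B", True, False, 1)
    + make("B", False, False, 3) + make("B", False, True, 1)
)

def accuracy(rows):
    """How often the model's flag matched what actually happened."""
    return sum(r["flagged"] == r["reoffended"] for r in rows) / len(rows)

def false_positive_rate(rows, group):
    """Share of people in `group` who did not reoffend but were flagged anyway."""
    negatives = [r for r in rows if r["group"] == group and not r["reoffended"]]
    if not negatives:
        return None
    return sum(r["flagged"] for r in negatives) / len(negatives)

overall = accuracy(records)
fpr_a = false_positive_rate(records, "A")
fpr_b = false_positive_rate(records, "B")

print(f"overall accuracy: {overall:.2f}")
print(f"false positive rate, group A: {fpr_a:.2f}; group B: {fpr_b:.2f}")
```

A single accuracy figure hides the second question. Splitting the errors by group shows who bears them.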
12:14
And finally, we have to consider
12:17
the long-term effects of algorithms,
12:20
the feedback loops they are engendering.
12:23
That sounds abstract,
12:24
but imagine if Facebook engineers
had considered that
12:28
before they decided to show us
only things that our friends had posted.
12:33
I have two more messages,
one for the data scientists out there.
12:37
Data scientists: we should
not be the arbiters of truth.
12:41
We should be translators
of ethical discussions that happen
12:45
in larger society.
12:47
(Applause)
12:49
And the rest of you,
12:52
the non-data scientists:
12:53
this is not a math test.
12:55
This is a political fight.
12:58
We need to demand accountability
for our algorithmic overlords.
13:04
(Applause)
13:05
The era of blind faith
in big data must end.
273
773641
4225
13:09
Thank you very much.
13:11
(Applause)
