ABOUT THE SPEAKER

Cathy O'Neil - Mathematician, data scientist
Data skeptic Cathy O’Neil uncovers the dark secrets of big data, showing how our "objective" algorithms could in fact reinforce human bias.

Why you should listen

In 2008, as a hedge-fund quant, mathematician Cathy O’Neil saw firsthand how really, really bad math could lead to financial disaster. Disillusioned, O’Neil became a data scientist and eventually joined Occupy Wall Street’s Alternative Banking Group.

With her popular blog mathbabe.org, O’Neil emerged as an investigative journalist. Her acclaimed book Weapons of Math Destruction details how opaque, black-box algorithms rely on biased historical data to do everything from sentencing defendants to hiring workers. In 2017, O’Neil founded the consulting firm ORCAA to audit algorithms for racial, gender and economic inequality.

TED2017

Cathy O'Neil: The era of blind faith in big data must end

Filmed: 2017-04-24

Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden agendas behind the formulas.

00:12

Algorithms are everywhere.

00:16

They sort and separate
the winners from the losers.

00:20

The winners get the job

00:22

or a good credit card offer.

00:24

The losers don't even get an interview

00:27

or they pay more for insurance.

00:30

We're being scored with secret formulas
that we don't understand

00:34

that often don't have systems of appeal.

00:39

That begs the question:

00:40

What if the algorithms are wrong?

00:45

To build an algorithm you need two things:

00:47

you need data, what happened in the past,

00:49

and a definition of success,

00:50

the thing you're looking for
and often hoping for.

00:53

You train an algorithm
by looking, figuring out.

00:58

The algorithm figures out
what is associated with success.

01:01

What situation leads to success?

01:04

Actually, everyone uses algorithms.

01:06

They just don't formalize them
in written code.

01:09

Let me give you an example.

01:10

I use an algorithm every day
to make a meal for my family.

01:14

The data I use

01:16

is the ingredients in my kitchen,

01:18

the time I have,

01:19

the ambition I have,

01:20

and I curate that data.

01:22

I don't count those little packages
of ramen noodles as food.

01:26

(Laughter)

01:28

My definition of success is:

01:30

a meal is successful
if my kids eat vegetables.

01:34

It's very different
from if my youngest son were in charge.

01:37

He'd say success is if
he gets to eat lots of Nutella.

01:41

But I get to choose success.

01:43

I am in charge. My opinion matters.

01:46

That's the first rule of algorithms.

01:48

Algorithms are opinions embedded in code.
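
A minimal Python sketch of the meal example, using invented records: the same curated data, judged by two different definitions of success, picks out two different "successful" meals. The dish names, fields and thresholds are hypothetical; the filtering step mirrors the decision not to count ramen as food.

```python
# Hypothetical meal records; dishes and numbers are invented for illustration.
meals = [
    {"dish": "stir-fry", "vegetable_servings": 3, "nutella_spoons": 0},
    {"dish": "pasta", "vegetable_servings": 1, "nutella_spoons": 0},
    {"dish": "crepes", "vegetable_servings": 0, "nutella_spoons": 4},
    {"dish": "instant ramen", "vegetable_servings": 0, "nutella_spoons": 0},
]

# Curating the data: the cook decides ramen packets don't count as food.
curated = [m for m in meals if m["dish"] != "instant ramen"]

def parent_success(meal):
    """The parent's definition: the kids eat vegetables."""
    return meal["vegetable_servings"] > 0

def son_success(meal):
    """The youngest son's definition: lots of Nutella."""
    return meal["nutella_spoons"] >= 3

def successful_dishes(records, is_success):
    """Return the dishes that count as successes under a given definition."""
    return [m["dish"] for m in records if is_success(m)]

print(successful_dishes(curated, parent_success))  # ['stir-fry', 'pasta']
print(successful_dishes(curated, son_success))     # ['crepes']
```

The data never changes between the two calls; only the person choosing `is_success` does.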

01:53

It's really different from what you think
most people think of algorithms.

01:57

They think algorithms are objective
and true and scientific.

02:02

That's a marketing trick.

02:05

It's also a marketing trick

02:07

to intimidate you with algorithms,

02:10

to make you trust and fear algorithms

02:14

because you trust and fear mathematics.

02:17

A lot can go wrong when we put
blind faith in big data.

02:23

This is Kiri Soares.
She's a high school principal in Brooklyn.

02:27

In 2011, she told me
her teachers were being scored

02:29

with a complex, secret algorithm

02:32

called the "value-added model."

02:34

I told her, "Well, figure out
what the formula is, show it to me.

02:37

I'm going to explain it to you."

02:39

She said, "Well, I tried
to get the formula,

02:41

but my Department of Education contact
told me it was math

02:44

and I wouldn't understand it."

02:47

It gets worse.

02:48

The New York Post filed
a Freedom of Information Act request,

02:52

got all the teachers' names
and all their scores

02:55

and they published them
as an act of teacher-shaming.

02:59

When I tried to get the formulas,
the source code, through the same means,

03:02

I was told I couldn't.

03:05

I was denied.

03:06

I later found out

03:07

that nobody in New York City
had access to that formula.

03:10

No one understood it.

03:13

Then someone really smart
got involved, Gary Rubenstein.

03:17

He found 665 teachers
from that New York Post data

03:20

that actually had two scores.

03:22

That could happen if they were teaching

03:24

seventh grade math and eighth grade math.

03:27

He decided to plot them.

03:28

Each dot represents a teacher.

03:31

(Laughter)

03:33

What is that?

03:35

(Laughter)

03:36

That should never have been used
for individual assessment.

03:39

It's almost a random number generator.

03:41

(Applause)

03:44

But it was.

03:45

This is Sarah Wysocki.

03:47

She got fired, along
with 205 other teachers,

03:49

from the Washington, DC school district,

03:52

even though she had great
recommendations from her principal

03:54

and the parents of her kids.

03:57

I know what a lot
of you guys are thinking,

03:59

especially the data scientists,
the AI experts here.

04:01

You're thinking, "Well, I would never make
an algorithm that inconsistent."

04:06

But algorithms can go wrong,

04:08

even have deeply destructive effects
with good intentions.

04:14

And whereas an airplane
that's designed badly

04:16

crashes to the earth and everyone sees it,

04:18

an algorithm designed badly

04:22

can go on for a long time,
silently wreaking havoc.

04:27

This is Roger Ailes.

04:29

(Laughter)

04:32

He founded Fox News in 1996.

04:35

More than 20 women complained
about sexual harassment.

04:38

They said they weren't allowed
to succeed at Fox News.

04:41

He was ousted last year,
but we've seen recently

04:43

that the problems have persisted.

04:47

That begs the question:

04:49

What should Fox News do
to turn over another leaf?

04:53

Well, what if they replaced
their hiring process

04:56

with a machine-learning algorithm?

04:57

That sounds good, right?

04:59

Think about it.

05:00

The data, what would the data be?

05:03

A reasonable choice would be the last
21 years of applications to Fox News.

05:08

Reasonable.

05:09

What about the definition of success?

05:11

Reasonable choice would be,

05:13

well, who is successful at Fox News?

05:15

I guess someone who, say,
stayed there for four years

05:18

and was promoted at least once.

05:20

Sounds reasonable.

05:22

And then the algorithm would be trained.

05:24

It would be trained to look for people
to learn what led to success,

05:29

what kind of applications
historically led to success

05:33

by that definition.

05:36

Now think about what would happen

05:37

if we applied that
to a current pool of applicants.

05:41

It would filter out women

05:43

because they do not look like people
who were successful in the past.
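
The hiring thought experiment can be sketched in a few lines of Python. Everything here is hypothetical: the records are invented, the "model" simply learns the historical success rate for each group, and `train` and `screen` are illustrative names, not any real hiring system. Gender is used directly only to keep the sketch short; a real model could pick up the same pattern through correlated features.

```python
from collections import defaultdict

# Hypothetical past applications: (gender, met_success_definition), where
# "success" means stayed four years and was promoted at least once.
history = [
    ("man", True), ("man", True), ("man", False), ("man", True),
    ("woman", False), ("woman", False), ("woman", True), ("woman", False),
]

def train(records):
    """Learn the historical success rate for each feature value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [successes, total]
    for value, succeeded in records:
        counts[value][0] += int(succeeded)
        counts[value][1] += 1
    return {value: wins / total for value, (wins, total) in counts.items()}

def screen(model, applicants, threshold=0.5):
    """Keep applicants whose learned success rate meets the threshold."""
    return [a for a in applicants if model.get(a["gender"], 0.0) >= threshold]

model = train(history)
pool = [{"name": "A", "gender": "woman"}, {"name": "B", "gender": "man"}]
print(model)                                     # {'man': 0.75, 'woman': 0.25}
print([a["name"] for a in screen(model, pool)])  # ['B']
```

Nothing in the code mentions women; the filtering comes entirely from the training history and the chosen definition of success.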

05:51

Algorithms don't make things fair

05:54

if you just blithely,
blindly apply algorithms.

05:57

They don't make things fair.

05:58

They repeat our past practices,

06:00

our patterns.

06:01

They automate the status quo.

06:04

That would be great
if we had a perfect world,

06:07

but we don't.

06:09

And I'll add that most companies
don't have embarrassing lawsuits,

06:14

but the data scientists in those companies

06:17

are told to follow the data,

06:19

to focus on accuracy.

06:22

Think about what that means.

06:23

Because we all have bias,
it means they could be codifying sexism

06:27

or any other kind of bigotry.

06:31

Thought experiment,

06:32

because I like them:

06:35

an entirely segregated society --

06:40

racially segregated, all towns,
all neighborhoods

06:43

and where we send the police
only to the minority neighborhoods

06:46

to look for crime.

06:48

The arrest data would be very biased.

06:51

What if, on top of that,
we found the data scientists

06:54

and paid the data scientists to predict
where the next crime would occur?

06:59

Minority neighborhood.

07:01

Or to predict who the next
criminal would be?

07:04

A minority.

07:07

The data scientists would brag
about how great and how accurate

07:11

their model would be,

07:12

and they'd be right.

07:15

Now, reality isn't that drastic,
but we do have severe segregations

07:20

in many cities and towns,

07:21

and we have plenty of evidence

07:23

of biased policing
and justice system data.

07:27

And we actually do predict hotspots,

07:30

places where crimes will occur.

07:32

And we do predict, in fact,
the individual criminality,

07:36

the criminality of individuals.

07:38

The news organization ProPublica
recently looked into

07:42

one of those "recidivism risk" algorithms,

07:45

as they're called,

07:46

being used in Florida
during sentencing by judges.

07:50

Bernard, on the left, the black man,
was scored a 10 out of 10.

07:55

Dylan, on the right, 3 out of 10.

07:57

10 out of 10, high risk.
3 out of 10, low risk.

08:00

They were both brought in
for drug possession.

08:03

They both had records,

08:04

but Dylan had a felony

08:07

but Bernard didn't.

08:09

This matters, because
the higher score you are,

08:12

the more likely you're being given
a longer sentence.

08:18

What's going on?

08:20

Data laundering.

08:22

It's a process by which
technologists hide ugly truths

08:27

inside black box algorithms

08:29

and call them objective;

08:31

call them meritocratic.

08:35

When they're secret,
important and destructive,

08:37

I've coined a term for these algorithms:

08:40

"weapons of math destruction."

08:42

(Laughter)

08:43

(Applause)

08:46

They're everywhere,
and it's not a mistake.

08:49

These are private companies
building private algorithms

08:53

for private ends.

08:55

Even the ones I talked about
for teachers and the public police,

08:58

those were built by private companies

09:00

and sold to the government institutions.

09:02

They call it their "secret sauce" --

09:04

that's why they can't tell us about it.

09:06

It's also private power.

09:09

They are profiting for wielding
the authority of the inscrutable.

09:17

Now you might think,
since all this stuff is private

09:20

and there's competition,

09:21

maybe the free market
will solve this problem.

09:23

It won't.

09:24

There's a lot of money
to be made in unfairness.

09:29

Also, we're not economic rational agents.

09:33

We all are biased.

09:34

We're all racist and bigoted
in ways that we wish we weren't,

09:38

in ways that we don't even know.

09:41

We know this, though, in aggregate,

09:44

because sociologists
have consistently demonstrated this

09:47

with these experiments they build,

09:49

where they send a bunch
of applications to jobs out,

09:51

equally qualified but some
have white-sounding names

09:54

and some have black-sounding names,

09:56

and it's always disappointing,
the results -- always.

09:59

So we are the ones that are biased,

10:01

and we are injecting those biases
into the algorithms

10:04

by choosing what data to collect,

10:06

like I chose not to think
about ramen noodles --

10:09

I decided it was irrelevant.

10:11

But by trusting the data that's actually
picking up on past practices

10:16

and by choosing the definition of success,

10:18

how can we expect the algorithms
to emerge unscathed?

10:22

We can't. We have to check them.

10:26

We have to check them for fairness.

10:27

The good news is,
we can check them for fairness.

10:30

Algorithms can be interrogated,

10:34

and they will tell us
the truth every time.

10:36

And we can fix them.
We can make them better.

10:38

I call this an algorithmic audit,

10:40

and I'll walk you through it.

10:42

First, data integrity check.

10:46

For the recidivism risk
algorithm I talked about,

10:49

a data integrity check would mean
we'd have to come to terms with the fact

10:53

that in the US, whites and blacks
smoke pot at the same rate

10:56

but blacks are far more likely
to be arrested --

10:59

four or five times more likely,
depending on the area.

11:03

What is that bias looking like
in other crime categories,

11:06

and how do we account for it?
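
A back-of-the-envelope Python sketch of why the data integrity check matters. The rates below are made-up numbers chosen only to fall in the talk's "four or five times" range (four here): equal rates of use, unequal chances of arrest. The correction at the end assumes the per-group chance of arrest is known, which is a strong assumption.

```python
# Made-up rates: identical use, but four times the chance of arrest per user.
use_rate = {"white": 0.12, "black": 0.12}
arrest_prob_given_use = {"white": 0.01, "black": 0.04}

def arrests_per_1000(group):
    """Recorded arrests per 1,000 people = rate of use x chance of arrest."""
    return 1000 * use_rate[group] * arrest_prob_given_use[group]

def estimated_use_rate(group):
    """Crude correction, ASSUMING the per-group chance of arrest is known."""
    return arrests_per_1000(group) / 1000 / arrest_prob_given_use[group]

ratio = arrests_per_1000("black") / arrests_per_1000("white")
for group in use_rate:
    print(f"{group}: {arrests_per_1000(group):.1f} arrests per 1,000, "
          f"estimated use rate {estimated_use_rate(group):.2f}")
print(f"arrest-data ratio: {ratio:.1f}x (actual use ratio: 1.0x)")
```

A model trained on the raw arrest counts would see four times the "criminality" in one group for identical behavior.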

11:08

Second, we should think about
the definition of success,

11:11

audit that.

11:12

Remember -- with the hiring
algorithm? We talked about it.

11:15

Someone who stays for four years
and is promoted once?

11:18

Well, that is a successful employee,

11:20

but it's also an employee
that is supported by their culture.

11:24

That said, also it can be quite biased.

11:26

We need to separate those two things.

11:28

We should look to
the blind orchestra audition

11:30

as an example.

11:31

That's where the people auditioning
are behind a sheet.

11:34

What I want to think about there

11:36

is the people who are listening
have decided what's important

11:40

and they've decided what's not important,

11:42

and they're not getting
distracted by that.

11:44

When the blind orchestra
auditions started,

11:47

the number of women in orchestras
went up by a factor of five.

11:52

Next, we have to consider accuracy.

11:55

This is where the value-added model
for teachers would fail immediately.

11:59

No algorithm is perfect, of course,

12:02

so we have to consider
the errors of every algorithm.

12:06

How often are there errors,
and for whom does this model fail?

12:11

What is the cost of that failure?
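
The accuracy questions above ("How often are there errors, and for whom does this model fail?") can be sketched as a per-group error breakdown, similar in spirit to the kind of analysis ProPublica reported for recidivism scores. The records and the group labels "A" and "B" below are invented for illustration.

```python
# Hypothetical audit records: (group, labeled_high_risk, actually_reoffended).
records = [
    ("A", True, False), ("A", True, False), ("A", True, True), ("A", False, False),
    ("B", False, False), ("B", True, True), ("B", False, True), ("B", False, False),
]

def false_positive_rate(rows, group):
    """Share of people in `group` who did NOT reoffend but were labeled high risk."""
    labels = [label for g, label, reoffended in rows if g == group and not reoffended]
    return sum(labels) / len(labels) if labels else float("nan")

def false_negative_rate(rows, group):
    """Share of people in `group` who DID reoffend but were labeled low risk."""
    labels = [label for g, label, reoffended in rows if g == group and reoffended]
    return sum(not label for label in labels) / len(labels) if labels else float("nan")

def accuracy(rows):
    """Overall share of correct labels, ignoring who the errors fall on."""
    return sum(label == reoffended for _, label, reoffended in rows) / len(rows)

print(f"overall accuracy: {accuracy(records):.2f}")
for group in ("A", "B"):
    fpr = false_positive_rate(records, group)
    fnr = false_negative_rate(records, group)
    print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

In this toy data group A absorbs all the false alarms and group B all the misses, something a single accuracy number would not reveal. Which error costs more (an unjustly long sentence or a missed risk) is the judgment the audit has to make explicit.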

12:14

And finally, we have to consider

12:17

the long-term effects of algorithms,

12:20

the feedback loops that are engendering.

12:23

That sounds abstract,

12:24

but imagine if Facebook engineers
had considered that

12:28

before they decided to show us
only things that our friends had posted.
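
The feedback loops mentioned above can be made concrete with the policing thought experiment from earlier in the talk. This hypothetical Python sketch assumes two neighborhoods with identical true crime rates and a naive rule that sends every patrol to whichever place has the most recorded arrests. Real predictive-policing systems are more elaborate; the sketch only shows how a small initial skew in the records can lock itself in.

```python
# Hypothetical setup: two neighborhoods with the SAME underlying crime rate.
TRUE_CRIME_RATE = {"north": 0.05, "south": 0.05}

def simulate(rounds, patrols_per_round=100, initial_arrests=None):
    """Each round, send every patrol to the neighborhood with the most recorded
    arrests so far. Crime is only recorded where police are sent, so the
    records feed back into the next round's "prediction"."""
    arrests = dict(initial_arrests or {"north": 1, "south": 0})
    for _ in range(rounds):
        hotspot = max(arrests, key=arrests.get)  # the model's "prediction"
        arrests[hotspot] += patrols_per_round * TRUE_CRIME_RATE[hotspot]
    return arrests

print(simulate(10))  # {'north': 51.0, 'south': 0}
```

A one-arrest head start in the north is enough: the south is never patrolled, so its record stays at zero even though its true rate is identical.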

12:33

I have two more messages,
one for the data scientists out there.

12:37

Data scientists: we should
not be the arbiters of truth.

12:41

We should be translators
of ethical discussions that happen

12:45

in larger society.

12:47

(Applause)

12:49

And the rest of you,

12:52

the non-data scientists:

12:53

this is not a math test.

12:55

This is a political fight.

12:58

We need to demand accountability
for our algorithmic overlords.

13:04

(Applause)

13:05

The era of blind faith
in big data must end.

13:09

Thank you very much.

13:11

(Applause)
