ABOUT THE SPEAKER

Cathy O'Neil - Mathematician, data scientist
Data skeptic Cathy O’Neil uncovers the dark secrets of big data, showing how our "objective" algorithms could in fact reinforce human bias.

Why you should listen

In 2008, as a hedge-fund quant, mathematician Cathy O’Neil saw firsthand how really really bad math could lead to financial disaster. Disillusioned, O’Neil became a data scientist and eventually joined Occupy Wall Street’s Alternative Banking Group.

With her popular blog mathbabe.org, O’Neil emerged as an investigative journalist. Her acclaimed book Weapons of Math Destruction details how opaque, black-box algorithms rely on biased historical data to do everything from sentence defendants to hire workers. In 2017, O’Neil founded consulting firm ORCAA to audit algorithms for racial, gender and economic inequality.

TED2017

Cathy O'Neil: The era of blind faith in big data must end

Filmed: 2017-04-24

1,391,460 views

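Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more, but they don't automatically make things fair. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden workings behind these formulas.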

00:12

Algorithms are everywhere.

00:16

They sort and separate the winners from the losers.

00:20

The winners get the job

00:22

or a good credit card offer.

00:24

The losers don't even get an interview

00:27

or they pay more for insurance.

00:30

We're being scored with secret formulas that we don't understand

00:34

that often don't have systems of appeal.

00:39

That begs the question:

00:40

What if the algorithms are wrong?

00:45

To build an algorithm you need two things:

00:47

you need data, what happened in the past,

00:49

and a definition of success,

00:50

the thing you're looking for and often hoping for.

00:53

You train an algorithm by looking, figuring out.

00:58

The algorithm figures out what is associated with success.

01:01

What situation leads to success?

01:04

Actually, everyone uses algorithms.

01:06

They just don't formalize them in written code.

01:09

Let me give you an example.

01:10

I use an algorithm every day to make a meal for my family.

01:14

The data I use

01:16

is the ingredients in my kitchen,

01:18

the time I have,

01:19

the ambition I have,

01:20

and I curate that data.

01:22

I don't count those little packages of ramen noodles as food.

01:26

(Laughter)

01:28

My definition of success is:

01:30

a meal is successful if my kids eat vegetables.

01:34

It's very different from if my youngest son were in charge.

01:37

He'd say success is if he gets to eat lots of Nutella.

01:41

But I get to choose success.

01:43

I am in charge. My opinion matters.

01:46

That's the first rule of algorithms.

01:48

Algorithms are opinions embedded in code.
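The meal example can be sketched in Python as a rough illustration; nothing like this was shown in the talk. The `Meal` fields, the sample dinners and the function names are all hypothetical. The point it tries to make concrete is that the same curated data gives a different "best" answer once the definition of success changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meal:
    name: str
    vegetable_servings: int  # vegetables the kids actually ate
    nutella_spoonfuls: int   # Nutella served with the meal


# Made-up records of past dinners: the "data".
RAW_MEALS = [
    Meal("stir-fry", vegetable_servings=3, nutella_spoonfuls=0),
    Meal("crepes", vegetable_servings=0, nutella_spoonfuls=5),
    Meal("pasta", vegetable_servings=1, nutella_spoonfuls=1),
    Meal("instant ramen", vegetable_servings=0, nutella_spoonfuls=0),
]

# Curating the data is itself a choice: ramen is not counted as food.
CURATED_MEALS = [m for m in RAW_MEALS if m.name != "instant ramen"]


def parent_success(meal: Meal) -> int:
    """The speaker's definition: the kids eat vegetables."""
    return meal.vegetable_servings


def son_success(meal: Meal) -> int:
    """The youngest son's definition: lots of Nutella."""
    return meal.nutella_spoonfuls


def choose_meal(meals, success):
    """Pick the meal that scores highest under the given definition of success."""
    return max(meals, key=success)


if __name__ == "__main__":
    print(choose_meal(CURATED_MEALS, parent_success).name)
    print(choose_meal(CURATED_MEALS, son_success).name)
```

Whoever passes in the `success` function decides the outcome, which is one way to read "opinions embedded in code."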

01:53

It's really different from what you think most people think of algorithms.

01:57

They think algorithms are objective and true and scientific.

02:02

That's a marketing trick.

02:05

It's also a marketing trick

02:07

to intimidate you with algorithms,

02:10

to make you trust and fear algorithms

02:14

because you trust and fear mathematics.

02:17

A lot can go wrong when we put blind faith in big data.

02:23

This is Kiri Soares. She's a high school principal in Brooklyn.

02:27

In 2011, she told me her teachers were being scored

02:29

with a complex, secret algorithm

02:32

called the "value-added model."

02:34

I told her, "Well, figure out what the formula is, show it to me.

02:37

I'm going to explain it to you."

02:39

She said, "Well, I tried to get the formula,

02:41

but my Department of Education contact told me it was math

02:44

and I wouldn't understand it."

02:47

It gets worse.

02:48

The New York Post filed a Freedom of Information Act request,

02:52

got all the teachers' names and all their scores

02:55

and they published them as an act of teacher-shaming.

02:59

When I tried to get the formulas, the source code, through the same means,

03:02

I was told I couldn't.

03:05

I was denied.

03:06

I later found out

03:07

that nobody in New York City had access to that formula.

03:10

No one understood it.

03:13

Then someone really smart got involved, Gary Rubenstein.

03:17

He found 665 teachers from that New York Post data

03:20

that actually had two scores.

03:22

That could happen if they were teaching

03:24

seventh grade math and eighth grade math.

03:27

He decided to plot them.

03:28

Each dot represents a teacher.

03:31

(Laughter)

03:33

What is that?

03:35

(Laughter)

03:36

That should never have been used for individual assessment.

03:39

It's almost a random number generator.
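One way to put a number on "almost a random number generator" is to correlate the two scores each teacher received. The sketch below is an assumption-laden illustration in Python: it uses synthetic scores, not the New York Post data, and only borrows the count of 665 teachers from the talk. It contrasts a hypothetical consistent scoring method with a purely random one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
N_TEACHERS = 665        # the number of two-score teachers mentioned in the talk

# Case A (hypothetical): both scores track the same underlying quality plus small noise.
quality = [rng.uniform(0, 100) for _ in range(N_TEACHERS)]
consistent_7th = [q + rng.uniform(-5, 5) for q in quality]
consistent_8th = [q + rng.uniform(-5, 5) for q in quality]

# Case B (hypothetical): the two scores are unrelated random draws.
random_7th = [rng.uniform(0, 100) for _ in range(N_TEACHERS)]
random_8th = [rng.uniform(0, 100) for _ in range(N_TEACHERS)]

if __name__ == "__main__":
    print("consistent scoring:", round(pearson(consistent_7th, consistent_8th), 3))
    print("random scoring:    ", round(pearson(random_7th, random_8th), 3))
```

A scoring method fit for individual assessment should look like case A when the same teacher is scored twice; a scatter plot that looks like case B suggests the score carries little information about the individual.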

03:41

(Applause)

03:44

But it was.

03:45

This is Sarah Wysocki.

03:47

She got fired, along with 205 other teachers,

03:49

from the Washington, DC school district,

03:52

even though she had great recommendations from her principal

03:54

and the parents of her kids.

03:57

I know what a lot of you guys are thinking,

03:59

especially the data scientists, the AI experts here.

04:01

You're thinking, "Well, I would never make an algorithm that inconsistent."

04:06

But algorithms can go wrong,

04:08

even have deeply destructive effects with good intentions.

04:14

And whereas an airplane that's designed badly

04:16

crashes to the earth and everyone sees it,

04:18

an algorithm designed badly

04:22

can go on for a long time, silently wreaking havoc.

04:27

This is Roger Ailes.

04:29

(Laughter)

04:32

He founded Fox News in 1996.

04:35

More than 20 women complained about sexual harassment.

04:38

They said they weren't allowed to succeed at Fox News.

04:41

He was ousted last year, but we've seen recently

04:43

that the problems have persisted.

04:47

That begs the question:

04:49

What should Fox News do to turn over another leaf?

04:53

Well, what if they replaced their hiring process

04:56

with a machine-learning algorithm?

04:57

That sounds good, right?

04:59

Think about it.

05:00

The data, what would the data be?

05:03

A reasonable choice would be the last 21 years of applications to Fox News.

05:08

Reasonable.

05:09

What about the definition of success?

05:11

Reasonable choice would be,

05:13

well, who is successful at Fox News?

05:15

I guess someone who, say, stayed there for four years

05:18

and was promoted at least once.

05:20

Sounds reasonable.

05:22

And then the algorithm would be trained.

05:24

It would be trained to look for people to learn what led to success,

05:29

what kind of applications historically led to success

05:33

by that definition.

05:36

Now think about what would happen

05:37

if we applied that to a current pool of applicants.

05:41

It would filter out women

05:43

because they do not look like people who were successful in the past.
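The hiring thought experiment can be sketched as a toy screening model in Python. Everything here is hypothetical: the history is invented, the group labels stand in for whatever an application reveals, and the frequency-count "model" is far simpler than any real machine-learning system. It only illustrates how a model trained on a biased history reproduces that history.

```python
from collections import defaultdict

# Invented history of past applicants: (group, met_success_definition).
# "Success" means stayed four years and was promoted at least once.
HISTORY = (
    [("men", True)] * 40
    + [("men", False)] * 60
    + [("women", True)] * 2
    + [("women", False)] * 98
)


def train(history):
    """Learn the historical success rate for each group."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in history:
        totals[group] += 1
        successes[group] += int(succeeded)
    return {group: successes[group] / totals[group] for group in totals}


def screen(applicants, model, threshold=0.2):
    """Keep applicants whose group historically met the success threshold."""
    return [a for a in applicants if model.get(a["group"], 0.0) >= threshold]


if __name__ == "__main__":
    model = train(HISTORY)
    pool = [
        {"name": "applicant 1", "group": "women"},
        {"name": "applicant 2", "group": "men"},
    ]
    print(model)
    print(screen(pool, model))
```

A real system would rarely use gender as an explicit input, but features that correlate with it can play the same role, so removing the column alone would not necessarily change the result.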

05:51

Algorithms don't make things fair

05:54

if you just blithely, blindly apply algorithms.

05:57

They don't make things fair.

05:58

They repeat our past practices,

06:00

our patterns.

06:01

They automate the status quo.

06:04

That would be great if we had a perfect world,

06:07

but we don't.

06:09

And I'll add that most companies don't have embarrassing lawsuits,

06:14

but the data scientists in those companies

06:17

are told to follow the data,

06:19

to focus on accuracy.

06:22

Think about what that means.

06:23

Because we all have bias, it means they could be codifying sexism

06:27

or any other kind of bigotry.

06:31

Thought experiment,

06:32

because I like them:

06:35

an entirely segregated society --

06:40

racially segregated, all towns, all neighborhoods

06:43

and where we send the police only to the minority neighborhoods

06:46

to look for crime.

06:48

The arrest data would be very biased.

06:51

What if, on top of that, we found the data scientists

06:54

and paid the data scientists to predict where the next crime would occur?

06:59

Minority neighborhood.

07:01

Or to predict who the next criminal would be?

07:04

A minority.

07:07

The data scientists would brag about how great and how accurate

07:11

their model would be,

07:12

and they'd be right.
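The segregated-society thought experiment can be written down as a few lines of Python. This is a deliberately extreme sketch with made-up numbers: both neighborhoods are given the same number of offenses, and the only difference is where police are sent. All names and counts are hypothetical.

```python
# Identical underlying offense counts, by construction.
TRUE_OFFENSES = {"majority": 100, "minority": 100}

# In the thought experiment, police are sent only to minority neighborhoods.
PATROLLED = {"majority": False, "minority": True}


def recorded_arrests(offenses, patrolled):
    """An offense only becomes an arrest record where police are looking."""
    return {
        place: (count if patrolled[place] else 0)
        for place, count in offenses.items()
    }


def predict_hotspot(arrests):
    """Predict the next crime location as the place with the most recorded arrests."""
    return max(arrests, key=arrests.get)


def accuracy_against_arrests(prediction, arrests):
    """Share of recorded arrests that fall in the predicted place."""
    total = sum(arrests.values())
    return arrests[prediction] / total if total else 0.0


if __name__ == "__main__":
    arrests = recorded_arrests(TRUE_OFFENSES, PATROLLED)
    hotspot = predict_hotspot(arrests)
    print(arrests, hotspot, accuracy_against_arrests(hotspot, arrests))
```

Measured against the arrest records, the prediction looks perfectly accurate, even though the underlying offense counts were set equal. The accuracy is real, but it is accuracy about where police looked.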

07:15

Now, reality isn't that drastic, but we do have severe segregations

07:20

in many cities and towns,

07:21

and we have plenty of evidence

07:23

of biased policing and justice system data.

07:27

And we actually do predict hotspots,

07:30

places where crimes will occur.

07:32

And we do predict, in fact, the individual criminality,

07:36

the criminality of individuals.

07:38

The news organization ProPublica recently looked into

07:42

one of those "recidivism risk" algorithms,

07:45

as they're called,

07:46

being used in Florida during sentencing by judges.

07:50

Bernard, on the left, the black man, was scored a 10 out of 10.

07:55

Dylan, on the right, 3 out of 10.

07:57

10 out of 10, high risk. 3 out of 10, low risk.

08:00

They were both brought in for drug possession.

08:03

They both had records,

08:04

but Dylan had a felony

08:07

but Bernard didn't.

08:09

This matters, because the higher score you are,

08:12

the more likely you're being given a longer sentence.

08:18

What's going on?

08:20

Data laundering.

08:22

It's a process by which technologists hide ugly truths

08:27

inside black box algorithms

08:29

and call them objective;

08:31

call them meritocratic.

08:35

When they're secret, important and destructive,

08:37

I've coined a term for these algorithms:

08:40

"weapons of math destruction."

08:42

(Laughter)

08:43

(Applause)

08:46

They're everywhere, and it's not a mistake.

08:49

These are private companies building private algorithms

08:53

for private ends.

08:55

Even the ones I talked about for teachers and the public police,

08:58

those were built by private companies

09:00

and sold to the government institutions.

09:02

They call it their "secret sauce" --

09:04

that's why they can't tell us about it.

09:06

It's also private power.

09:09

They are profiting for wielding the authority of the inscrutable.

09:17

Now you might think, since all this stuff is private

09:20

and there's competition,

09:21

maybe the free market will solve this problem.

09:23

It won't.

09:24

There's a lot of money to be made in unfairness.

09:29

Also, we're not economic rational agents.

09:33

We all are biased.

09:34

We're all racist and bigoted in ways that we wish we weren't,

09:38

in ways that we don't even know.

09:41

We know this, though, in aggregate,

09:44

because sociologists have consistently demonstrated this

09:47

with these experiments they build,

09:49

where they send a bunch of applications to jobs out,

09:51

equally qualified but some have white-sounding names

09:54

and some have black-sounding names,

09:56

and it's always disappointing, the results -- always.

09:59

So we are the ones that are biased,

10:01

and we are injecting those biases into the algorithms

10:04

by choosing what data to collect,

10:06

like I chose not to think about ramen noodles --

10:09

I decided it was irrelevant.

10:11

But by trusting the data that's actually picking up on past practices

10:16

and by choosing the definition of success,

10:18

how can we expect the algorithms to emerge unscathed?

10:22

We can't. We have to check them.

10:26

We have to check them for fairness.

10:27

The good news is, we can check them for fairness.

10:30

Algorithms can be interrogated,

10:34

and they will tell us the truth every time.

10:36

And we can fix them. We can make them better.

10:38

I call this an algorithmic audit,

10:40

and I'll walk you through it.

10:42

First, data integrity check.

10:46

For the recidivism risk algorithm I talked about,

10:49

a data integrity check would mean we'd have to come to terms with the fact

10:53

that in the US, whites and blacks smoke pot at the same rate

10:56

but blacks are far more likely to be arrested --

10:59

four or five times more likely, depending on the area.

11:03

What is that bias looking like in other crime categories,

11:06

and how do we account for it?
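The arithmetic behind this data integrity check can be worked through with a small Python sketch. Only the "four or five times" multiplier comes from the talk (the lower bound is used); the usage rate and base arrest chance are invented for illustration, and the adjustment shown is just one conceivable way to account for a known enforcement gap, not a method the talk proposes.

```python
from fractions import Fraction

# Illustrative numbers only. The same underlying usage rate for both groups
# reflects the talk's premise; the specific values are made up.
USE_RATE = Fraction(1, 10)
BASE_ARREST_CHANCE = Fraction(1, 100)  # hypothetical chance that use leads to arrest

# Relative likelihood of arrest; 4 is the lower bound quoted in the talk.
ARREST_MULTIPLIER = {"white": 1, "black": 4}


def observed_arrest_rate(group: str) -> Fraction:
    """Arrest rate that ends up in the data for this group."""
    return USE_RATE * BASE_ARREST_CHANCE * ARREST_MULTIPLIER[group]


def adjusted_rate(group: str) -> Fraction:
    """Divide out the known enforcement multiplier (an assumed correction)."""
    return observed_arrest_rate(group) / ARREST_MULTIPLIER[group]


if __name__ == "__main__":
    for group in ARREST_MULTIPLIER:
        print(group, observed_arrest_rate(group), adjusted_rate(group))
```

Under these assumptions the arrest data shows a fourfold gap even though behavior is identical, so a model that reads arrests as behavior inherits the gap. The harder question raised in the talk, what the multiplier is for other crime categories, is not answered by this sketch.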

11:08

Second, we should think about the definition of success,

11:11

audit that.

11:12

Remember -- with the hiring algorithm? We talked about it.

11:15

Someone who stays for four years and is promoted once?

11:18

Well, that is a successful employee,

11:20

but it's also an employee that is supported by their culture.

11:24

That said, also it can be quite biased.

11:26

We need to separate those two things.

11:28

We should look to the blind orchestra audition

11:30

as an example.

11:31

That's where the people auditioning are behind a sheet.

11:34

What I want to think about there

11:36

is the people who are listening have decided what's important

11:40

and they've decided what's not important,

11:42

and they're not getting distracted by that.

11:44

When the blind orchestra auditions started,

11:47

the number of women in orchestras went up by a factor of five.

11:52

Next, we have to consider accuracy.

11:55

This is where the value-added model for teachers would fail immediately.

11:59

No algorithm is perfect, of course,

12:02

so we have to consider the errors of every algorithm.

12:06

How often are there errors, and for whom does this model fail?

12:11

What is the cost of that failure?
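The question "for whom does this model fail?" can be made concrete by breaking errors down by group. The Python sketch below uses invented records of the form (group, predicted high risk, actually reoffended); the groups "A" and "B", the counts and the function names are all hypothetical. It is set up so that overall accuracy is identical for both groups while the kind of error differs.

```python
from collections import defaultdict


def make_records():
    """Build invented (group, predicted_high_risk, reoffended) records."""
    records = []
    # Group A: 4 false positives, 6 true negatives, 10 true positives.
    records += [("A", True, False)] * 4 + [("A", False, False)] * 6
    records += [("A", True, True)] * 10
    # Group B: 1 false positive, 9 true negatives, 7 true positives, 3 false negatives.
    records += [("B", True, False)] * 1 + [("B", False, False)] * 9
    records += [("B", True, True)] * 7 + [("B", False, True)] * 3
    return records


def accuracy_by_group(records):
    """Fraction of correct predictions within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def false_positive_rate_by_group(records):
    """Among people who did not reoffend, the share wrongly flagged high risk."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            flagged[group] += int(predicted)
    return {group: flagged[group] / negatives[group] for group in negatives}


if __name__ == "__main__":
    records = make_records()
    print(accuracy_by_group(records))
    print(false_positive_rate_by_group(records))
```

A single accuracy figure would hide the difference here: one group bears most of the false alarms. The cost of each kind of error, such as a longer sentence for a false positive, still has to be weighed separately.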

12:14

And finally, we have to consider

12:17

the long-term effects of algorithms,

12:20

the feedback loops that are engendering.

12:23

That sounds abstract,

12:24

but imagine if Facebook engineers had considered that

12:28

before they decided to show us only things that our friends had posted.

12:33

I have two more messages, one for the data scientists out there.

12:37

Data scientists: we should not be the arbiters of truth.

12:41

We should be translators of ethical discussions that happen

12:45

in larger society.

12:47

(Applause)

12:49

And the rest of you,

12:52

the non-data scientists:

12:53

this is not a math test.

12:55

This is a political fight.

12:58

We need to demand accountability for our algorithmic overlords.

13:04

(Applause)

13:05

The era of blind faith in big data must end.

13:09

Thank you very much.

13:11

(Applause)

Translated by Lin Zhang

THE ORIGINAL VIDEO ON TED.COM

Cathy O'Neil: The era of blind faith in big data must end | TED Talk | TED.com