ABOUT THE SPEAKER

Cathy O'Neil - Mathematician, data scientist
Data skeptic Cathy O’Neil uncovers the dark secrets of big data, showing how our "objective" algorithms could in fact reinforce human bias.

Why you should listen

In 2008, as a hedge-fund quant, mathematician Cathy O’Neil saw firsthand how really really bad math could lead to financial disaster. Disillusioned, O’Neil became a data scientist and eventually joined Occupy Wall Street’s Alternative Banking Group.

With her popular blog mathbabe.org, O’Neil emerged as an investigative journalist. Her acclaimed book Weapons of Math Destruction details how opaque, black-box algorithms rely on biased historical data to do everything from sentence defendants to hire workers. In 2017, O’Neil founded consulting firm ORCAA to audit algorithms for racial, gender and economic inequality.

More profile about the speaker
Cathy O'Neil | Speaker | TED.com

TED2017

Cathy O'Neil: The era of blind faith in big data must end

Filmed: 2017-04-24

1,391,460 views

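Algorithms decide who gets a loan, who gets a job interview, who gets insurance, and more. But they don't automatically make things fair, and they aren't even particularly scientific. Mathematician and data scientist Cathy O'Neil has coined a name for algorithms that are secret, important, and destructive: "weapons of math destruction." Find out what is hidden behind these supposedly objective formulas, and why we need to build better algorithms.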

00:12

Algorithms are everywhere.

00:16

They sort and separate the winners from the losers.

00:20

The winners get the job

00:22

or a good credit card offer.

00:24

The losers don't even get an interview

00:27

or they pay more for insurance.

00:30

We're being scored with secret formulas that we don't understand

00:34

that often don't have systems of appeal.

00:39

That begs the question:

00:40

What if the algorithms are wrong?

00:45

To build an algorithm you need two things:

00:47

you need data, what happened in the past,

00:49

and a definition of success,

00:50

the thing you're looking for and often hoping for.

00:53

You train an algorithm by looking, figuring out.

00:58

The algorithm figures out what is associated with success.

01:01

What situation leads to success?
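
The two ingredients named here, past data and a definition of success, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `situation` and `outcome` fields, and the `train` helper are hypothetical and do not come from the talk.

```python
from collections import defaultdict

def train(records, feature, is_success):
    """For each value of `feature`, estimate how often past records met `is_success`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        value = record[feature]
        totals[value] += 1
        if is_success(record):
            hits[value] += 1
    return {value: hits[value] / totals[value] for value in totals}

# Ingredient 1 (hypothetical data): what happened in the past.
past = [
    {"situation": "A", "outcome": 9},
    {"situation": "A", "outcome": 8},
    {"situation": "B", "outcome": 3},
    {"situation": "B", "outcome": 7},
]

# Ingredient 2: a definition of success, chosen by whoever builds the algorithm.
def success(record):
    return record["outcome"] >= 7

# "Training": work out which situations were associated with success.
model = train(past, "situation", success)
print(model)  # {'A': 1.0, 'B': 0.5}
```

Changing the `success` function changes what the model learns from the same data, which is why the choice of definition matters so much.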

01:04

Actually, everyone uses algorithms.

01:06

They just don't formalize them in written code.

01:09

Let me give you an example.

01:10

I use an algorithm every day to make a meal for my family.

01:14

The data I use

01:16

is the ingredients in my kitchen,

01:18

the time I have,

01:19

the ambition I have,

01:20

and I curate that data.

01:22

I don't count those little packages of ramen noodles as food.

01:26

(Laughter)

01:28

My definition of success is:

01:30

a meal is successful if my kids eat vegetables.

01:34

It's very different from if my youngest son were in charge.

01:37

He'd say success is if he gets to eat lots of Nutella.

01:41

But I get to choose success.

01:43

I am in charge. My opinion matters.

01:46

That's the first rule of algorithms.

01:48

Algorithms are opinions embedded in code.

01:53

It's really different from what you think most people think of algorithms.

01:57

They think algorithms are objective and true and scientific.

02:02

That's a marketing trick.

02:05

It's also a marketing trick

02:07

to intimidate you with algorithms,

02:10

to make you trust and fear algorithms

02:14

because you trust and fear mathematics.

02:17

A lot can go wrong when we put blind faith in big data.

02:23

This is Kiri Soares. She's a high school principal in Brooklyn.

02:27

In 2011, she told me her teachers were being scored

02:29

with a complex, secret algorithm

02:32

called the "value-added model."

02:34

I told her, "Well, figure out what the formula is, show it to me.

02:37

I'm going to explain it to you."

02:39

She said, "Well, I tried to get the formula,

02:41

but my Department of Education contact told me it was math

02:44

and I wouldn't understand it."

02:47

It gets worse.

02:48

The New York Post filed a Freedom of Information Act request,

02:52

got all the teachers' names and all their scores

02:55

and they published them as an act of teacher-shaming.

02:59

When I tried to get the formulas, the source code, through the same means,

03:02

I was told I couldn't.

03:05

I was denied.

03:06

I later found out

03:07

that nobody in New York City had access to that formula.

03:10

No one understood it.

03:13

Then someone really smart got involved, Gary Rubenstein.

03:17

He found 665 teachers from that New York Post data

03:20

that actually had two scores.

03:22

That could happen if they were teaching

03:24

seventh grade math and eighth grade math.

03:27

He decided to plot them.

03:28

Each dot represents a teacher.

03:31

(Laughter)

03:33

What is that?

03:35

(Laughter)

03:36

That should never have been used for individual assessment.

03:39

It's almost a random number generator.

03:41

(Applause)

03:44

But it was.
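
One way to put a number on the consistency check described above is to correlate the two scores each teacher received. The sketch below uses synthetic data only: the real scores are not in this transcript, so the `seventh` and `eighth` lists are randomly generated stand-ins, and `pearson` is a hand-written helper. It shows what "almost a random number generator" would look like next to a scoring rule that is consistent.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in: 665 teachers, each with two unrelated scores from 0 to 100.
seventh = [rng.uniform(0, 100) for _ in range(665)]
eighth = [rng.uniform(0, 100) for _ in range(665)]
noise_r = pearson(seventh, eighth)

# For contrast: a second score that is a fixed function of the first.
consistent_r = pearson(seventh, [0.9 * s + 5 for s in seventh])

print(f"unrelated scores: r = {noise_r:.2f}")
print(f"consistent scores: r = {consistent_r:.2f}")
```

A correlation near zero means a teacher's score in one class says almost nothing about the same teacher's score in the other, while a consistent measure would give a value close to one.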

03:45

This is Sarah Wysocki.

03:47

She got fired, along with 205 other teachers,

03:49

from the Washington, DC school district,

03:52

even though she had great recommendations from her principal

03:54

and the parents of her kids.

03:57

I know what a lot of you guys are thinking,

03:59

especially the data scientists, the AI experts here.

04:01

You're thinking, "Well, I would never make an algorithm that inconsistent."

04:06

But algorithms can go wrong,

04:08

even have deeply destructive effects with good intentions.

04:14

And whereas an airplane that's designed badly

04:16

crashes to the earth and everyone sees it,

04:18

an algorithm designed badly

04:22

can go on for a long time, silently wreaking havoc.

04:27

This is Roger Ailes.

04:29

(Laughter)

04:32

He founded Fox News in 1996.

04:35

More than 20 women complained about sexual harassment.

04:38

They said they weren't allowed to succeed at Fox News.

04:41

He was ousted last year, but we've seen recently

04:43

that the problems have persisted.

04:47

That begs the question:

04:49

What should Fox News do to turn over another leaf?

04:53

Well, what if they replaced their hiring process

04:56

with a machine-learning algorithm?

04:57

That sounds good, right?

04:59

Think about it.

05:00

The data, what would the data be?

05:03

A reasonable choice would be the last 21 years of applications to Fox News.

05:08

Reasonable.

05:09

What about the definition of success?

05:11

Reasonable choice would be,

05:13

well, who is successful at Fox News?

05:15

I guess someone who, say, stayed there for four years

05:18

and was promoted at least once.

05:20

Sounds reasonable.

05:22

And then the algorithm would be trained.

05:24

It would be trained to look for people to learn what led to success,

05:29

what kind of applications historically led to success

05:33

by that definition.

05:36

Now think about what would happen

05:37

if we applied that to a current pool of applicants.

05:41

It would filter out women

05:43

because they do not look like people who were successful in the past.
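
The thought experiment can be sketched as a toy screening model. Everything here is hypothetical: the six historical records, the `group` field, and the 0.5 threshold are invented for illustration, and a real system would more likely pick up proxies for gender than an explicit gender field. Only the definition of success (four years and at least one promotion) is taken from the talk.

```python
def is_success(employee):
    """Success as defined in the thought experiment: four years and one promotion."""
    return employee["years"] >= 4 and employee["promotions"] >= 1

def learn_rates(history):
    """Historical success rate for each group in the (hypothetical) records."""
    totals, wins = {}, {}
    for employee in history:
        group = employee["group"]
        totals[group] = totals.get(group, 0) + 1
        wins[group] = wins.get(group, 0) + (1 if is_success(employee) else 0)
    return {group: wins[group] / totals[group] for group in totals}

def screen(applicants, rates, threshold=0.5):
    """Keep applicants who 'look like' people who succeeded in the past."""
    return [a for a in applicants if rates.get(a["group"], 0.0) >= threshold]

# Hypothetical history from a workplace where women were not allowed to succeed.
history = [
    {"group": "men", "years": 6, "promotions": 2},
    {"group": "men", "years": 5, "promotions": 1},
    {"group": "men", "years": 2, "promotions": 0},
    {"group": "women", "years": 1, "promotions": 0},
    {"group": "women", "years": 2, "promotions": 0},
    {"group": "women", "years": 5, "promotions": 0},
]

# Four hypothetical applicants, assumed equally qualified.
applicants = [
    {"name": "applicant_1", "group": "men"},
    {"name": "applicant_2", "group": "women"},
    {"name": "applicant_3", "group": "men"},
    {"name": "applicant_4", "group": "women"},
]

rates = learn_rates(history)
shortlist = screen(applicants, rates)
print([a["name"] for a in shortlist])  # ['applicant_1', 'applicant_3']
```

The model never judges the applicants themselves. It only reproduces who was permitted to succeed in the historical data.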

05:51

Algorithms don't make things fair

05:54

if you just blithely, blindly apply algorithms.

05:57

They don't make things fair.

05:58

They repeat our past practices,

06:00

our patterns.

06:01

They automate the status quo.

06:04

That would be great if we had a perfect world,

06:07

but we don't.

06:09

And I'll add that most companies don't have embarrassing lawsuits,

06:14

but the data scientists in those companies

06:17

are told to follow the data,

06:19

to focus on accuracy.

06:22

Think about what that means.

06:23

Because we all have bias, it means they could be codifying sexism

06:27

or any other kind of bigotry.

06:31

Thought experiment,

06:32

because I like them:

06:35

an entirely segregated society --

06:40

racially segregated, all towns, all neighborhoods

06:43

and where we send the police only to the minority neighborhoods

06:46

to look for crime.

06:48

The arrest data would be very biased.

06:51

What if, on top of that, we found the data scientists

06:54

and paid the data scientists to predict where the next crime would occur?

06:59

Minority neighborhood.

07:01

Or to predict who the next criminal would be?

07:04

A minority.

07:07

The data scientists would brag about how great and how accurate

07:11

their model would be,

07:12

and they'd be right.
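
A minimal sketch of why the model in this thought experiment would look accurate. The two areas, the crime counts, and the helper names are all hypothetical. The key assumption is that a crime only becomes a recorded arrest where police are sent, and that the model is scored against recorded arrests rather than against crime itself.

```python
# Hypothetical setup: both areas have the same amount of underlying crime.
true_crimes = {"majority_area": 10, "minority_area": 10}

# Police are sent only to the minority area.
patrolled = {"majority_area": False, "minority_area": True}

def recorded_arrests(crimes, patrols):
    """Crime only shows up in the data where police are present to record it."""
    return {area: (count if patrols[area] else 0) for area, count in crimes.items()}

def predict_hotspot(arrests):
    """Naive model: the next crime happens where the most arrests were recorded."""
    return max(arrests, key=arrests.get)

past = recorded_arrests(true_crimes, patrolled)
prediction = predict_hotspot(past)

# Next period, same patrol policy, so the "ground truth" is biased the same way.
future = recorded_arrests(true_crimes, patrolled)
accuracy = future[prediction] / sum(future.values())

print(prediction, accuracy)  # minority_area 1.0
```

The model scores perfectly against the recorded data even though the underlying crime was identical in both areas, so accuracy alone cannot reveal the bias.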

07:15

Now, reality isn't that drastic, but we do have severe segregations

07:20

in many cities and towns,

07:21

and we have plenty of evidence

07:23

of biased policing and justice system data.

07:27

And we actually do predict hotspots,

07:30

places where crimes will occur.

07:32

And we do predict, in fact, the individual criminality,

07:36

the criminality of individuals.

07:38

The news organization ProPublica recently looked into

07:42

one of those "recidivism risk" algorithms,

07:45

as they're called,

07:46

being used in Florida during sentencing by judges.

07:50

Bernard, on the left, the black man, was scored a 10 out of 10.

07:55

Dylan, on the right, 3 out of 10.

07:57

10 out of 10, high risk. 3 out of 10, low risk.

08:00

They were both brought in for drug possession.

08:03

They both had records,

08:04

but Dylan had a felony

08:07

but Bernard didn't.

08:09

This matters, because the higher score you are,

08:12

the more likely you're being given a longer sentence.

08:18

What's going on?

08:20

Data laundering.

08:22

It's a process by which technologists hide ugly truths

08:27

inside black box algorithms

08:29

and call them objective;

08:31

call them meritocratic.

08:35

When they're secret, important and destructive,

08:37

I've coined a term for these algorithms:

08:40

"weapons of math destruction."

08:42

(Laughter)

08:43

(Applause)

08:46

They're everywhere, and it's not a mistake.

08:49

These are private companies building private algorithms

08:53

for private ends.

08:55

Even the ones I talked about for teachers and the public police,

08:58

those were built by private companies

09:00

and sold to the government institutions.

09:02

They call it their "secret sauce" --

09:04

that's why they can't tell us about it.

09:06

It's also private power.

09:09

They are profiting for wielding the authority of the inscrutable.

09:17

Now you might think, since all this stuff is private

09:20

and there's competition,

09:21

maybe the free market will solve this problem.

09:23

It won't.

09:24

There's a lot of money to be made in unfairness.

09:29

Also, we're not economic rational agents.

09:33

We all are biased.

09:34

We're all racist and bigoted in ways that we wish we weren't,

09:38

in ways that we don't even know.

09:41

We know this, though, in aggregate,

09:44

because sociologists have consistently demonstrated this

09:47

with these experiments they build,

09:49

where they send a bunch of applications to jobs out,

09:51

equally qualified but some have white-sounding names

09:54

and some have black-sounding names,

09:56

and it's always disappointing, the results -- always.

09:59

So we are the ones that are biased,

10:01

and we are injecting those biases into the algorithms

10:04

by choosing what data to collect,

10:06

like I chose not to think about ramen noodles --

10:09

I decided it was irrelevant.

10:11

But by trusting the data that's actually picking up on past practices

10:16

and by choosing the definition of success,

10:18

how can we expect the algorithms to emerge unscathed?

10:22

We can't. We have to check them.

10:26

We have to check them for fairness.

10:27

The good news is, we can check them for fairness.

10:30

Algorithms can be interrogated,

10:34

and they will tell us the truth every time.

10:36

And we can fix them. We can make them better.

10:38

I call this an algorithmic audit,

10:40

and I'll walk you through it.

10:42

First, data integrity check.

10:46

For the recidivism risk algorithm I talked about,

10:49

a data integrity check would mean we'd have to come to terms with the fact

10:53

that in the US, whites and blacks smoke pot at the same rate

10:56

but blacks are far more likely to be arrested --

10:59

four or five times more likely, depending on the area.

11:03

What is that bias looking like in other crime categories,

11:06

and how do we account for it?

11:08

Second, we should think about the definition of success,

11:11

audit that.

11:12

Remember -- with the hiring algorithm? We talked about it.

11:15

Someone who stays for four years and is promoted once?

11:18

Well, that is a successful employee,

11:20

but it's also an employee that is supported by their culture.

11:24

That said, also it can be quite biased.

11:26

We need to separate those two things.

11:28

We should look to the blind orchestra audition

11:30

as an example.

11:31

That's where the people auditioning are behind a sheet.

11:34

What I want to think about there

11:36

is the people who are listening have decided what's important

11:40

and they've decided what's not important,

11:42

and they're not getting distracted by that.

11:44

When the blind orchestra auditions started,

11:47

the number of women in orchestras went up by a factor of five.

11:52

Next, we have to consider accuracy.

11:55

This is where the value-added model for teachers would fail immediately.

11:59

No algorithm is perfect, of course,

12:02

so we have to consider the errors of every algorithm.

12:06

How often are there errors, and for whom does this model fail?

12:11

What is the cost of that failure?
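
The question "for whom does this model fail?" can be made concrete by splitting a model's errors by group. The sketch below is one possible check, not the audit procedure from the talk: the rows, the field names `flagged_high_risk` and `reoffended`, and the two groups are hypothetical, and the false positive rate is only one of several error measures an audit might compare.

```python
def false_positive_rate(rows):
    """Share of people who did NOT reoffend but were still flagged as high risk."""
    negatives = [row for row in rows if not row["reoffended"]]
    if not negatives:
        return None  # the rate is undefined without any non-reoffenders
    return sum(row["flagged_high_risk"] for row in negatives) / len(negatives)

def audit_by_group(rows, group_key):
    """Compute the false positive rate separately for each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    return {group: false_positive_rate(members) for group, members in groups.items()}

def make_rows(group, outcomes):
    """Build hypothetical rows from (flagged_high_risk, reoffended) pairs."""
    return [
        {"group": group, "flagged_high_risk": flagged, "reoffended": reoffended}
        for flagged, reoffended in outcomes
    ]

# Hypothetical scored cases for two groups.
rows = (
    make_rows("group_x", [(True, False), (True, False), (False, False),
                          (False, False), (True, True)])
    + make_rows("group_y", [(True, False), (False, False), (False, False),
                            (False, False), (False, True)])
)

rates = audit_by_group(rows, "group")
print(rates)  # {'group_x': 0.5, 'group_y': 0.25}
```

In this made-up data the model wrongly flags one group twice as often as the other, a gap that a single overall accuracy figure would hide.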

12:14

And finally, we have to consider

12:17

the long-term effects of algorithms,

12:20

the feedback loops that are engendering.

12:23

That sounds abstract,

12:24

but imagine if Facebook engineers had considered that

12:28

before they decided to show us only things that our friends had posted.

12:33

I have two more messages, one for the data scientists out there.

12:37

Data scientists: we should not be the arbiters of truth.

12:41

We should be translators of ethical discussions that happen

12:45

in larger society.

12:47

(Applause)

12:49

And the rest of you,

12:52

the non-data scientists:

12:53

this is not a math test.

12:55

This is a political fight.

12:58

We need to demand accountability for our algorithmic overlords.

13:04

(Applause)

13:05

The era of blind faith in big data must end.

13:09

Thank you very much.

13:11

(Applause)

Translated by Lilian Chiu
Reviewed by Nan-Kun Wu

THE ORIGINAL VIDEO ON TED.COM

Cathy O'Neil: The era of blind faith in big data must end | TED Talk | TED.com