ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed like science fiction only a few years ago. The algorithm looks like the simple face detection of a camera app but with the level of complexity of systems like Google's Deep Mind Cloud Vision, using convolutional deep neural networks to crunch object detection in real time. It's the kind of technology that will be embedded on all smartphones in the next few years.

Redmon is also internet-famous for his resume.

TED2017

Joseph Redmon: How computers learn to recognize objects instantly

Filmed: 2017-04-24

Readability: 4.5

2,471,805 views

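Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on YOLO (You Only Look Once), an open-source method of object detection that can identify objects in images and video, from zebras to stop signs, at lightning speed. In this remarkable live demo, Redmon shows off this important step forward for applications such as self-driving cars, robotics and even cancer detection.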

00:12

Ten years ago,

00:14

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.
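
The classification step described here (give it an image, get back a label, down to the breed) can be pictured with a small Python sketch. It is an illustration only, not Darknet's code: the labels and raw scores below are made up, and the softmax-then-rank step is an assumption about how a typical classifier turns network outputs into ranked predictions.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels, top_k=3):
    """Return the top_k (label, probability) pairs, most likely first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical raw scores a trained network might emit for one image.
labels = ["malamute", "husky", "tabby cat", "book"]
scores = [4.1, 2.3, 0.2, -1.0]

for label, prob in classify(scores, labels):
    print(f"{label}: {prob:.2f}")
```

The whole image gets one ranked list of labels, which is why a classifier alone cannot say where anything is.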

01:09

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.
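
To make the difference from a single label concrete, here is a minimal Python sketch of what a detector's output might look like and how location and size fall out of it. The `Detection` class, the pixel coordinates and the confidence values are all hypothetical, chosen only to mirror the cat, dog and book in the demo.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a label, a confidence, and a bounding box in pixels."""
    label: str
    confidence: float
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def area(self):
        return self.w * self.h

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def left_of(a, b):
    """True if the center of a lies to the left of the center of b."""
    return a.center[0] < b.center[0]


# Made-up detections for an image containing a dog, a cat and a book.
dog = Detection("dog", 0.92, 40, 80, 300, 260)
cat = Detection("cat", 0.88, 380, 150, 180, 170)
book = Detection("book", 0.55, 500, 20, 90, 60)

print("dog is left of cat:", left_of(dog, cat))
print("largest object:", max([dog, cat, book], key=lambda d: d.area).label)
```

Relative position and size come from simple arithmetic on the boxes, which is the kind of information a robot or vehicle can act on.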

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:16

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:25

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system like this driving my car.

03:09

This is our detection system running in real time on my laptop.

03:13

So it smoothly tracks me as I move around the frame,

03:16

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image, a thousand times faster.
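
The figures quoted in the talk (20 seconds, 2 seconds, five frames per second, 20 milliseconds) line up as successive factors of ten. The short Python check below converts each per-image time to frames per second; the function names are illustrative, and the 200-millisecond entry is simply the five-frames-per-second detector expressed as a time.

```python
def frames_per_second(ms_per_image):
    """Convert a per-image processing time in milliseconds to frames per second."""
    return 1000 / ms_per_image


def speedup(old_ms, new_ms):
    """How many times faster the new per-image time is than the old one."""
    return old_ms / new_ms


# Per-image times mentioned in the talk, in milliseconds.
stages = [20_000, 2_000, 200, 20]

for ms in stages:
    print(f"{ms:>6} ms per image = {frames_per_second(ms):g} frames per second")

print("overall speedup:", speedup(stages[0], stages[-1]))
```

At 20 milliseconds per image the detector handles 50 frames per second, comfortably above typical video frame rates.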

03:44

How did we get there?

03:46

Well, in the past, object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.
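
The older region-based approach can be sketched in a few lines of Python. This is a simplified stand-in, not any particular system: it uses fixed-size sliding windows as the "regions", and `toy_classifier` is a hypothetical placeholder for a real neural network. The point is the evaluation count, which grows with every window.

```python
def sliding_windows(width, height, win, stride):
    """Yield (x, y, w, h) square regions that tile the image with overlap."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)


def detect_by_regions(image_size, classifier, win=64, stride=32, threshold=0.5):
    """Run the classifier on every region; keep the high-scoring ones."""
    width, height = image_size
    detections = []
    evaluations = 0
    for box in sliding_windows(width, height, win, stride):
        label, score = classifier(box)  # one full classifier run per region
        evaluations += 1
        if score >= threshold:
            detections.append((label, score, box))
    return detections, evaluations


def toy_classifier(box):
    """Hypothetical stand-in: 'sees' a dog only in the window at (128, 96)."""
    x, y, _, _ = box
    return ("dog", 0.9) if (x, y) == (128, 96) else ("background", 0.1)


found, count = detect_by_regions((640, 480), toy_classifier)
print(found)
print("classifier evaluations:", count)
```

A single window size at this stride already needs 266 classifier runs on a 640 by 480 image; repeating the scan at several window sizes and finer strides is how the count reaches the thousands mentioned in the talk.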

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
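
The talk does not spell out what the single network's output looks like. The Python sketch below assumes the grid layout described in the original YOLO paper: the image is divided into a grid, and each cell predicts a few boxes (x, y, w, h, confidence) plus one set of class probabilities. The `decode` function, the tiny 2 by 2 grid and all numbers are illustrative, and the sketch keeps only the most likely class per cell and omits the duplicate-removal step a real detector would add.

```python
def decode(output, num_boxes, num_classes, labels, threshold=0.25):
    """Turn one grid of network outputs into (label, score, box) detections.

    Each cell is a flat list: num_boxes * 5 box values
    (x, y, w, h, confidence) followed by num_classes class probabilities.
    """
    detections = []
    grid = len(output)
    for row in range(grid):
        for col in range(grid):
            cell = output[row][col]
            class_probs = cell[num_boxes * 5:]
            best = max(range(num_classes), key=lambda i: class_probs[i])
            for b in range(num_boxes):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                score = conf * class_probs[best]
                if score >= threshold:
                    # x and y are offsets inside the cell; convert them to
                    # coordinates relative to the whole image (0 to 1).
                    cx = (col + x) / grid
                    cy = (row + y) / grid
                    detections.append((labels[best], score, (cx, cy, w, h)))
    return detections


labels = ["cat", "dog"]
empty = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
output = [
    [[0.5, 0.5, 0.4, 0.4, 0.9, 0.1, 0.9], empty],
    [empty, empty],
]

print(decode(output, num_boxes=1, num_classes=2, labels=labels))
```

Everything in `output` comes from one forward pass, so the cost no longer depends on how many candidate regions the image contains.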

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
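
Turning down the detection threshold simply means accepting predictions the network is less sure about. A minimal Python sketch, with made-up candidate detections and scores:

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence is at least the threshold."""
    return [d for d in detections if d[1] >= threshold]


# Hypothetical (label, confidence) pairs from one video frame.
candidates = [
    ("person", 0.91),
    ("teddy bear", 0.62),
    ("backpack", 0.34),
    ("stop sign", 0.27),
    ("person", 0.18),
]

for threshold in (0.5, 0.25):
    kept = filter_detections(candidates, threshold)
    print(f"threshold {threshold}: {len(kept)} detections")
```

A lower threshold finds more objects at the price of more false alarms, so the right value depends on the application.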

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.
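
The talk names network binarization without giving details. One common scheme, assumed here purely for illustration, replaces each real-valued weight with its sign and keeps a single scaling factor (the mean absolute weight) per group of weights, so that most multiplications become sign flips. The function names and numbers in this Python sketch are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha the mean absolute value."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def approx_dot(alpha, signs, inputs):
    """Dot product using only sign flips, additions and one final multiply."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))


weights = [0.5, -0.25, 0.75, -0.5]
inputs = [2.0, -1.0, 3.0, -2.0]

alpha, signs = binarize(weights)
exact = sum(w * x for w, x in zip(weights, inputs))

print("exact dot product:      ", exact)
print("binarized approximation:", approx_dot(alpha, signs, inputs))
```

The result is only approximate, but each weight now needs a single bit of storage, which is the kind of saving that helps a network fit on a phone.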

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Yi-Fan Yu
Reviewed by Wilde Luo

THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com