ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed science-fictional only a few years ago. The algorithm looks like the simple face detection of a camera app, but with the level of complexity of systems like Google's Cloud Vision, using deep convolutional neural networks to perform object detection in real time. It's the kind of technology that will be embedded in all smartphones in the next few years.

Redmon is also internet-famous for his resume.


TED2017

Joseph Redmon: How computers learn to recognize objects instantly


Filmed: 2017-04-24


2,471,805 views

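Just ten years ago, researchers thought it would be almost impossible for a computer to tell a cat from a dog. Today, computer vision systems do it with better than 99 percent accuracy. How? Joseph Redmon works on the YOLO system, an open-source object detection method that can identify things like zebras and stop signs in images and video at lightning speed. In a remarkable real-time demo, Redmon proudly shows off this important advance, which can be applied to self-driving cars, robotics and even cancer detection.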


00:12

Ten years ago,

00:14

computer vision researchers
thought that getting a computer

00:16

to tell the difference
between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance
in the state of artificial intelligence.

00:25

Now we can do it at a level
greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image,
put a label to that image --

00:34

and computers know
thousands of other categories as well.
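Here is a minimal sketch of what "put a label to that image" can look like in code. It assumes the classifier has already produced one raw score (logit) per category. The label names, scores and the helpers `softmax` and `top_k` are hypothetical illustrations, not Darknet's API. The scores become probabilities, and the most likely labels come back in order. This ranking is how a classifier can report a specific breed rather than just "dog".

```python
import math

def softmax(scores):
    """Convert raw per-class scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, scores, k=3):
    """Return the k most probable (label, probability) pairs, most likely first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and logits, for illustration only.
labels = ["malamute", "husky", "eskimo dog", "tabby cat"]
logits = [4.0, 2.5, 2.0, -1.0]
for label, p in top_k(labels, logits):
    print(f"{label}: {p:.2f}")
```

A real classifier would have one score per category it knows, often a thousand or more, but the ranking step is the same.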

00:38

I'm a graduate student
at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing
computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get
a prediction of dog or cat,

01:00

we actually get
specific breed predictions.

01:02

That's the level
of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.

01:09

So we've made amazing strides
in image classification,

01:13

but what happens
when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back
with a pretty similar prediction.

01:28

And it's correct,
there is a malamute in the image,

01:31

but just given this label,
we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem
called object detection,

01:41

where we look at an image
and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.
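A detector's output can be pictured as a list of records, each holding a label, a confidence and a bounding box. The sketch below shows one hypothetical representation. It assumes boxes are stored as normalized center coordinates plus width and height, which is a common convention but not necessarily Darknet's internal format. The `to_pixels` helper is also an illustrative assumption: it converts a box into pixel corners for drawing, which is what gives an application each object's location and size.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label, a confidence score and a box.

    The box uses a center (x, y) plus width w and height h, all as fractions
    of the image size (an assumed convention for this sketch).
    """
    label: str
    confidence: float
    x: float
    y: float
    w: float
    h: float

    def to_pixels(self, img_w, img_h):
        """Return (left, top, right, bottom) pixel corners for drawing the box."""
        left = (self.x - self.w / 2) * img_w
        top = (self.y - self.h / 2) * img_h
        right = (self.x + self.w / 2) * img_w
        bottom = (self.y + self.h / 2) * img_h
        return round(left), round(top), round(right), round(bottom)

# Hypothetical detections for a photo with a dog, a cat and a book.
detections = [
    Detection("dog", 0.91, 0.30, 0.60, 0.40, 0.60),
    Detection("cat", 0.88, 0.70, 0.65, 0.30, 0.50),
    Detection("book", 0.42, 0.85, 0.20, 0.10, 0.15),
]
for d in detections:
    print(d.label, d.confidence, d.to_pixels(640, 480))
```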

01:48

So here's what happens
when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more
with our computer vision algorithms.

01:58

We see that it knows
that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system
on top of computer vision,

02:12

say a self-driving vehicle
or a robotic system,

02:16

this is the kind
of information that you want.

02:18

You want something so that
you can interact with the physical world.

02:22

Now, when I started working
on object detection,

02:25

it took 20 seconds
to process a single image.

02:28

And to get a feel for why
speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds
to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time
it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up
by another factor of 10,

02:56

this is a detector running
at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system
like this driving my car.

03:09

This is our detection system
running in real time on my laptop.

03:13

So it smoothly tracks me
as I move around the frame,

03:16

and it's robust to a wide variety
of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems
on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image,
a thousand times faster.
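The speed figures in this part of the talk can be checked with simple arithmetic. A per-image latency converts to a frame rate as 1 / latency, so 0.2 s per image is 5 fps and 20 ms per image is 50 fps. Going from 20 s to 20 ms is a factor of 1,000. The sketch below tabulates the quoted latencies. The `fits_frame_budget` helper and the 30 fps video target used with it are illustrative assumptions, not figures from the talk.

```python
# Per-image processing times quoted in the talk, in seconds.
latencies = {
    "original detector": 20.0,
    "10x faster detector": 2.0,
    "another 10x (5 fps)": 0.2,
    "real-time YOLO demo": 0.020,
}

baseline = latencies["original detector"]
for name, seconds in latencies.items():
    fps = 1.0 / seconds           # frames per second if each frame takes `seconds`
    speedup = baseline / seconds  # how many times faster than the 20 s detector
    print(f"{name:>22}: {seconds * 1000:8.0f} ms/frame, {fps:6.2f} fps, {speedup:6.0f}x")

def fits_frame_budget(seconds_per_frame, target_fps):
    """True if a detector can keep up with video arriving at target_fps."""
    return seconds_per_frame <= 1.0 / target_fps

# Hypothetical 30 fps camera: only the 20 ms detector keeps up.
print(fits_frame_budget(0.020, 30), fits_frame_budget(0.2, 30))
```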

03:44

How did we get there?

03:46

Well, in the past,
object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier
on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered
detections in the image.

04:02

But this involved running a classifier
thousands of times over an image,

04:06

thousands of neural network evaluations
to produce detection.
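The older region-by-region approach can be sketched as a sliding window. The idea is to crop many overlapping regions at several sizes, run the classifier on each one, and keep the high-scoring regions as detections. This is a simplified illustration, not any specific published detector; many systems chose their regions with a separate proposal step instead. The window sizes, stride, threshold and `toy_classifier` are hypothetical.

```python
def sliding_windows(img_w, img_h, sizes, stride_frac=0.25):
    """Yield (left, top, size) square regions covering the image at several scales."""
    for size in sizes:
        stride = max(1, int(size * stride_frac))  # neighbouring windows overlap by 75%
        for top in range(0, img_h - size + 1, stride):
            for left in range(0, img_w - size + 1, stride):
                yield left, top, size

def detect_by_regions(img_w, img_h, classify, sizes, threshold=0.5):
    """Run `classify` once per region and keep regions scoring >= threshold."""
    detections, calls = [], 0
    for left, top, size in sliding_windows(img_w, img_h, sizes):
        label, score = classify(left, top, size)  # stands in for a full network evaluation
        calls += 1
        if score >= threshold:
            detections.append((label, score, left, top, size))
    return detections, calls

def toy_classifier(left, top, size):
    """Hypothetical stand-in: pretends a dog fills the 128-pixel window at (192, 96)."""
    if (left, top, size) == (192, 96, 128):
        return "dog", 0.9
    return "background", 0.1

dets, calls = detect_by_regions(640, 480, toy_classifier, sizes=(64, 128, 256))
print(calls, dets)
```

With these settings, finding a single object in a 640×480 image takes 1,231 classifier calls. In a real system, each of those calls would be a full neural-network evaluation.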

04:11

Instead, we trained a single network
to do all of detection for us.

04:15

It produces all of the bounding boxes
and class probabilities simultaneously.

04:20

With our system, instead of looking
at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it
the YOLO method of object detection.
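This sketch shows how one forward pass can yield every box and class probability at once. It follows the output layout described in the original YOLO paper (YOLOv1). The image is divided into an S×S grid. Each cell predicts B boxes, each as (x, y, w, h, confidence), plus C class probabilities shared by those boxes. Later YOLO versions use a different layout. `decode_yolo_v1` is a hypothetical helper, not Darknet code. For simplicity, it scores each box only against its cell's most likely class.

```python
def decode_yolo_v1(output, S, B, C, threshold=0.2):
    """Turn one network output into detections, YOLOv1-style.

    output[row][col] is a flat list of B*5 + C numbers for one grid cell:
    B boxes as (x, y, w, h, confidence), then C class probabilities.
    x, y are offsets inside the cell; w, h are fractions of the whole image.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row][col]
            class_probs = cell[B * 5:B * 5 + C]
            best = max(range(C), key=lambda c: class_probs[c])  # most likely class
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                score = conf * class_probs[best]  # class-specific confidence
                if score >= threshold:
                    # Convert the cell-relative center to image-relative coordinates.
                    cx, cy = (col + x) / S, (row + y) / S
                    detections.append((best, score, cx, cy, w, h))
    return detections

# Toy 2x2 grid, 1 box per cell, 3 classes; only the bottom-left cell sees an object.
S, B, C = 2, 1, 3
output = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
output[1][0] = [0.5, 0.5, 0.4, 0.6, 0.9, 0.1, 0.8, 0.1]
print(decode_yolo_v1(output, S, B, C))
```

All boxes come from a single `output` tensor, so the network runs once per image instead of once per region.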

04:31

So with this speed,
we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing
that cat and dog,

04:40

we can see them move around
and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things
like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go
out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down
our threshold for detection a little bit,

05:26

so we can find more of you guys
out in the audience.
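Lowering the detection threshold, as in the demo, simply lets lower-confidence boxes through. Below is a minimal sketch of that post-processing step. It assumes detections arrive as (label, score, box) tuples with boxes as (left, top, right, bottom). It first filters by score. It then applies greedy non-maximum suppression (NMS), a common step in detectors including YOLO, so that overlapping boxes for the same object collapse into one. The 0.5 IoU cutoff and the sample detections are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (left, top, right, bottom)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, threshold, nms_iou=0.5):
    """Keep detections scoring >= threshold, then drop same-class duplicates (greedy NMS)."""
    candidates = sorted((d for d in dets if d[1] >= threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for label, score, box in candidates:
        # Keep this box unless a higher-scoring box of the same class overlaps it heavily.
        if all(k[0] != label or iou(k[2], box) < nms_iou for k in kept):
            kept.append((label, score, box))
    return kept

# Hypothetical raw detections from one video frame.
dets = [
    ("person", 0.80, (0, 0, 10, 20)),
    ("person", 0.75, (1, 0, 11, 20)),    # near-duplicate of the first box
    ("person", 0.30, (50, 0, 60, 20)),   # someone further back, low confidence
    ("teddy bear", 0.25, (70, 10, 80, 20)),
]
print(len(filter_detections(dets, 0.5)), len(filter_detections(dets, 0.2)))
```

In this example, a threshold of 0.5 finds one person. Dropping the threshold to 0.2 also finds the person further back and the teddy bear, while the duplicate box is still suppressed.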

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing
is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general-purpose
object detection system,

05:53

so we can train this for any image domain.
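In a YOLOv1-style detection head, moving to a new image domain mainly changes the number of classes. That changes only the size of the final output, which is S × S × (B·5 + C) values. The sketch assumes the original paper's 7×7 grid with 2 boxes per cell. The one-class "cell detector" is a hypothetical example, not a system described in the talk, and `yolo_v1_output_size` is an illustrative helper name.

```python
def yolo_v1_output_size(num_classes, grid=7, boxes_per_cell=2):
    """Number of values the final layer must produce for a YOLOv1-style head."""
    # Each cell: boxes_per_cell boxes of (x, y, w, h, confidence), plus one probability per class.
    return grid * grid * (boxes_per_cell * 5 + num_classes)

# In this sketch, only the class count changes between domains.
for domain, n in [("PASCAL VOC", 20), ("COCO", 80), ("hypothetical cell detector", 1)]:
    print(domain, yolo_v1_output_size(n))
```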

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe
already using this technology

06:18

for advances in things
like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census
of animals in Nairobi National Park

06:27

with YOLO as part
of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain,
free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection
even more accessible and usable,

06:48

so through a combination
of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection
running on a phone.
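The talk does not say exactly how the phone model was binarized. One published scheme, XNOR-Net-style weight binarization, serves as an illustration here. It replaces each filter's real-valued weights W with α·sign(W), where α = mean(|W|). The weights then need one bit each plus a single scale, and the dot product reduces to additions and subtractions. The function names and numbers below are hypothetical.

```python
def binarize(weights):
    """Approximate real weights W by alpha * sign(W), with alpha = mean(|W|).

    Storing only the signs (1 bit each) plus one float per filter needs
    roughly 32x less memory than 32-bit float weights.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def dot(xs, ws):
    """Ordinary full-precision dot product, for comparison."""
    return sum(x * w for x, w in zip(xs, ws))

def binary_dot(xs, alpha, signs):
    """Dot product with binarized weights: only adds/subtracts, then one multiply."""
    return alpha * sum(x if s > 0 else -x for x, s in zip(xs, signs))

# Hypothetical filter weights and input activations.
weights = [0.5, -0.25, 0.75, -0.5]
inputs = [1.0, 2.0, 0.5, 1.0]
alpha, signs = binarize(weights)
print(alpha, signs, dot(inputs, weights), binary_dot(inputs, alpha, signs))
```

The binarized result only approximates the full-precision one. Schemes like this trade some accuracy for much smaller and cheaper models, which is why they are typically combined with retraining.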

07:04

(Applause)

07:10

And I'm really excited because
now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it
and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world
with access to this software,

07:28

and I can't wait to see what people
will build with this technology.

07:32

Thank you.

07:33

(Applause)

Korean translation by 혜련 장
Reviewed by Taz B K
