ABOUT THE SPEAKER

Joseph Redmon - Computer scientist
Joseph Redmon works on the YOLO algorithm, which combines the simple face detection of your phone camera with a cloud-based AI -- in real time.

Why you should listen

Computer scientist Joseph Redmon is working on the YOLO (You Only Look Once) algorithm, which has a simple goal: to deliver image recognition and object detection at a speed that would have seemed science-fictional only a few years ago. The algorithm looks like the simple face detection of a camera app but with the level of complexity of systems like Google's Deep Mind Cloud Vision, using convolutional deep neural networks to crunch object detection in real time. It's the kind of technology that will be embedded on all smartphones in the next few years.

Redmon is also internet-famous for his resume.

TED2017

Joseph Redmon: How computers learn to recognize objects instantly

Filmed: 2017-04-24

2,471,805 views

Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video -- from zebras to stop signs -- with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.

00:12

Ten years ago,

00:14

computer vision researchers
thought that getting a computer

00:16

to tell the difference
between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance
in the state of artificial intelligence.

00:25

Now we can do it at a level
greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image,
put a label to that image --

00:34

and computers know
thousands of other categories as well.

00:38

I'm a graduate student
at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing
computer vision models.

00:48

So let's just see what Darknet thinks

00:51

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:58

we see we don't just get
a prediction of dog or cat,

01:00

we actually get
specific breed predictions.

01:02

That's the level
of granularity we have now.

01:05

And it's correct.

01:06

My dog is in fact a malamute.

01:09

So we've made amazing strides
in image classification,

01:13

but what happens
when we run our classifier

01:15

on an image that looks like this?

01:19

Well ...

01:24

We see that the classifier comes back
with a pretty similar prediction.

01:28

And it's correct,
there is a malamute in the image,

01:31

but just given this label,
we don't actually know that much

01:35

about what's going on in the image.

01:37

We need something more powerful.

01:39

I work on a problem
called object detection,

01:41

where we look at an image
and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens
when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more
with our computer vision algorithms.

01:58

We see that it knows
that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system
on top of computer vision,

02:12

say a self-driving vehicle
or a robotic system,

02:16

this is the kind
of information that you want.

02:18

You want something so that
you can interact with the physical world.

02:22

Now, when I started working
on object detection,

02:25

it took 20 seconds
to process a single image.

02:28

And to get a feel for why
speed is so important in this domain,

02:33

here's an example of an object detector

02:35

that takes two seconds
to process an image.

02:38

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time
it makes predictions,

02:47

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up
by another factor of 10,

02:56

this is a detector running
at five frames per second.

02:59

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:05

I wouldn't want a system
like this driving my car.

03:09

This is our detection system
running in real time on my laptop.

03:13

So it smoothly tracks me
as I move around the frame,

03:16

and it's robust to a wide variety
of changes in size,

03:21

pose,

03:23

forward, backward.

03:25

This is great.

03:26

This is what we really need

03:28

if we're going to build systems
on top of computer vision.

03:31

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:41

to 20 milliseconds per image,
a thousand times faster.
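
The figures quoted in the talk can be checked with a little arithmetic. The following is a minimal sketch, assuming each detector handles one image at a time; the function names are illustrative and do not come from Darknet.

```python
def frames_per_second(ms_per_image: int) -> float:
    """Convert a per-image latency in milliseconds to a frame rate."""
    return 1000 / ms_per_image


def speedup(old_ms: int, new_ms: int) -> float:
    """How many times faster the new latency is than the old one."""
    return old_ms / new_ms


# Per-image latencies mentioned in the talk, in milliseconds.
latencies_ms = {
    "20 seconds per image": 20_000,
    "2 seconds per image": 2_000,
    "another factor of 10": 200,
    "20 milliseconds per image": 20,
}

for name, ms in latencies_ms.items():
    print(f"{name}: {frames_per_second(ms):g} frames/s, "
          f"{speedup(20_000, ms):g}x faster than the first detector")
```

At 200 milliseconds per image the rate works out to the five frames per second shown in the demo, and 20 milliseconds per image is the thousandfold speedup.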

03:44

How did we get there?

03:46

Well, in the past,
object detection systems

03:49

would take an image like this

03:51

and split it into a bunch of regions

03:53

and then run a classifier
on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered
detections in the image.

04:02

But this involved running a classifier
thousands of times over an image,

04:06

thousands of neural network evaluations
to produce detection.

04:11

Instead, we trained a single network
to do all of detection for us.

04:15

It produces all of the bounding boxes
and class probabilities simultaneously.

04:20

With our system, instead of looking
at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it
the YOLO method of object detection.
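
The contrast between the two approaches can be sketched by counting evaluations. This is a toy illustration under stated assumptions, not Darknet or YOLO code: the 416x416 image, the 64-pixel regions, the 8-pixel stride, and the `toy_classifier` and `toy_network` stubs are all hypothetical stand-ins for real neural networks.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]       # (x, y, width, height) in pixels
Detection = Tuple[Box, str, float]    # (box, label, score)


def sliding_regions(width: int, height: int, size: int, stride: int) -> List[Box]:
    """Enumerate square regions covering the image, as a region-based detector would."""
    return [(x, y, size, size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]


def region_based_detect(regions: List[Box],
                        classify: Callable[[Box], Tuple[str, float]],
                        min_score: float) -> Tuple[List[Detection], int]:
    """Older approach: one classifier evaluation per region; keep the high scores."""
    detections = []
    for box in regions:
        label, score = classify(box)
        if score >= min_score:
            detections.append((box, label, score))
    return detections, len(regions)


def single_pass_detect(network: Callable[[], List[Detection]]) -> Tuple[List[Detection], int]:
    """Single-network approach: one evaluation yields every box and label together."""
    return network(), 1


def toy_classifier(box: Box) -> Tuple[str, float]:
    # Hypothetical stub: pretends a dog fills the region at (64, 64).
    x, y, _, _ = box
    return ("dog", 0.9) if (x, y) == (64, 64) else ("background", 0.1)


def toy_network() -> List[Detection]:
    # Hypothetical stub: returns all detections for the image in one call.
    return [((64, 64, 64, 64), "dog", 0.9)]


regions = sliding_regions(416, 416, size=64, stride=8)
slow_detections, slow_evals = region_based_detect(regions, toy_classifier, min_score=0.5)
fast_detections, fast_evals = single_pass_detect(toy_network)
print(slow_evals, fast_evals)
```

With these assumed sizes the region-based loop makes 2,025 classifier evaluations to find the same box that the single pass returns in one.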

04:31

So with this speed,
we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing
that cat and dog,

04:40

we can see them move around
and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:53

in Microsoft's COCO dataset.

04:56

It has all sorts of things
like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go
out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:18

There are some teddy bears out there.

05:22

And we can turn down
our threshold for detection a little bit,

05:26

so we can find more of you guys
out in the audience.
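
Turning down the detection threshold, as in the demo, can be pictured as a simple filter over scored candidates. A minimal sketch follows, assuming each candidate carries a confidence between 0 and 1; the `Detection` type, the labels and the scores are made up for illustration and are not Darknet's API or output.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float  # assumed to lie between 0 and 1


def filter_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only the detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up candidates standing in for one frame of detector output.
candidates = [
    Detection("person", 0.92),
    Detection("teddy bear", 0.61),
    Detection("person", 0.34),
    Detection("backpack", 0.22),
]

strict = filter_detections(candidates, threshold=0.5)
relaxed = filter_detections(candidates, threshold=0.2)
print(len(strict), len(relaxed))
```

A lower threshold lets through more of the low-confidence candidates, which is why more people in the audience get boxes; the trade-off is that some of those extra boxes may be wrong.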

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing
is happening in real time

05:46

on the laptop.

05:49

And it's important to remember

05:50

that this is a general purpose
object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe
already using this technology

06:18

for advances in things
like medicine, robotics.

06:21

This morning, I read a paper

06:23

where they were taking a census
of animals in Nairobi National Park

06:27

with YOLO as part
of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain,
free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection
even more accessible and usable,

06:48

so through a combination
of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection
running on a phone.

07:04

(Applause)

07:10

And I'm really excited because
now we have a pretty powerful solution

07:16

to this low-level computer vision problem,

07:18

and anyone can take it
and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world
with access to this software,

07:28

and I can't wait to see what people
will build with this technology.

07:32

Thank you.

07:33

(Applause)

Translated by Ivana Setiadi
Reviewed by Rifkul Uswati

THE ORIGINAL VIDEO ON TED.COM

Joseph Redmon: How computers learn to recognize objects instantly | TED Talk | TED.com