ABOUT THE SPEAKERS

Jean-Baptiste Michel - Data researcher
Jean-Baptiste Michel looks at how we can use large volumes of data to better understand our world.

Why you should listen

Jean-Baptiste Michel holds joint academic appointments at Harvard (FQEB Fellow) and Google (Visiting Faculty). His research focusses on using large volumes of data as tools that help better understand the world around us -- from the way diseases progress in patients over years, to the way cultures change in human societies over centuries. With his colleague Erez Lieberman Aiden, Jean-Baptiste is a Founding Director of Harvard's Cultural Observatory, where their research team pioneers the use of quantitative methods for the study of human culture, language and history. His research was featured on the covers of Science and Nature, on the front pages of the New York Times and the Boston Globe, in The Economist, Wired and many other venues. The online tool he helped create -- ngrams.googlelabs.com -- was used millions of times to browse cultural trends. Jean-Baptiste is an Engineer from Ecole Polytechnique (Paris), and holds an MS in Applied Mathematics and a PhD in Systems Biology from Harvard.

Erez Lieberman Aiden - Researcher
Erez Lieberman Aiden pursues a broad range of research interests, spanning genomics, linguistics, mathematics ...

Why you should listen

Erez Lieberman Aiden is a fellow at the Harvard Society of Fellows and Visiting Faculty at Google. His research spans many disciplines and has won numerous awards, including recognition for one of the top 20 "Biotech Breakthroughs that will Change Medicine", by Popular Mechanics; the Lemelson-MIT prize for the best student inventor at MIT; the American Physical Society's Award for the Best Doctoral Dissertation in Biological Physics; and membership in Technology Review's 2009 TR35, recognizing the top 35 innovators under 35. His last three papers -- two with JB Michel -- have all appeared on the cover of Nature and Science.

TEDxBoston 2011

Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Filmed: 2011-07-24

Readability: 3.9

2,049,453 views

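Have you played with Google Labs' Ngram Viewer? It's an addicting tool that lets you search for words and ideas in a database of 5 million books from across centuries. Erez Lieberman Aiden and Jean-Baptiste Michel show us how it works, and a few of the surprising things we can learn from 500 billion words.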

00:15

Erez Lieberman Aiden: Everyone knows

00:17

that a picture is worth a thousand words.

00:22

But we at Harvard

00:24

were wondering if this was really true.

00:27

(Laughter)

00:29

So we assembled a team of experts,

00:33

spanning Harvard, MIT,

00:35

The American Heritage Dictionary, The Encyclopedia Britannica

00:38

and even our proud sponsors,

00:40

the Google.

00:43

And we cogitated about this

00:45

for about four years.

00:47

And we came to a startling conclusion.

00:52

Ladies and gentlemen, a picture is not worth a thousand words.

00:55

In fact, we found some pictures

00:57

that are worth 500 billion words.

01:02

Jean-Baptiste Michel: So how did we get to this conclusion?

01:04

So Erez and I were thinking about ways

01:06

to get a big picture of human culture

01:08

and human history: change over time.

01:11

So many books actually have been written over the years.

01:13

So we were thinking, well the best way to learn from them

01:15

is to read all of these millions of books.

01:17

Now of course, if there's a scale for how awesome that is,

01:20

that has to rank extremely, extremely high.

01:23

Now the problem is there's an X-axis for that,

01:25

which is the practical axis.

01:27

This is very, very low.

01:29

(Applause)

01:32

Now people tend to use an alternative approach,

01:35

which is to take a few sources and read them very carefully.

01:37

This is extremely practical, but not so awesome.

01:39

What you really want to do

01:42

is to get to the awesome yet practical part of this space.

01:45

So it turns out there was a company across the river called Google

01:48

who had started a digitization project a few years back

01:50

that might just enable this approach.

01:52

They have digitized millions of books.

01:54

So what that means is, one could use computational methods

01:57

to read all of the books in a click of a button.

01:59

That's very practical and extremely awesome.

02:03

ELA: Let me tell you a little bit about where books come from.

02:05

Since time immemorial, there have been authors.

02:08

These authors have been striving to write books.

02:11

And this became considerably easier

02:13

with the development of the printing press some centuries ago.

02:15

Since then, the authors have won

02:18

on 129 million distinct occasions,

02:20

publishing books.

02:22

Now if those books are not lost to history,

02:24

then they are somewhere in a library,

02:26

and many of those books have been getting retrieved from the libraries

02:29

and digitized by Google,

02:31

which has scanned 15 million books to date.

02:33

Now when Google digitizes a book, they put it into a really nice format.

02:36

Now we've got the data, plus we have metadata.

02:38

We have information about things like where was it published,

02:41

who was the author, when was it published.

02:43

And what we do is go through all of those records

02:46

and exclude everything that's not the highest quality data.

02:50

What we're left with

02:52

is a collection of five million books,

02:55

500 billion words,

02:58

a string of characters a thousand times longer

03:00

than the human genome --

03:03

a text which, when written out,

03:05

would stretch from here to the Moon and back

03:07

10 times over --

03:09

a veritable shard of our cultural genome.

03:13

Of course what we did

03:15

when faced with such outrageous hyperbole ...

03:18

(Laughter)

03:20

was what any self-respecting researchers

03:23

would have done.

03:26

We took a page out of XKCD,

03:28

and we said, "Stand back.

03:30

We're going to try science."

03:32

(Laughter)

03:34

JM: Now of course, we were thinking,

03:36

well let's just first put the data out there

03:38

for people to do science to it.

03:40

Now we're thinking, what data can we release?

03:42

Well of course, you want to take the books

03:44

and release the full text of these five million books.

03:46

Now Google, and Jon Orwant in particular,

03:48

told us a little equation that we should learn.

03:50

So you have five million, that is, five million authors

03:53

and five million plaintiffs is a massive lawsuit.

03:56

So, although that would be really, really awesome,

03:58

again, that's extremely, extremely impractical.

04:01

(Laughter)

04:03

Now again, we kind of caved in,

04:05

and we did the very practical approach, which was a bit less awesome.

04:08

We said, well instead of releasing the full text,

04:10

we're going to release statistics about the books.

04:12

So take for instance "A gleam of happiness."

04:14

It's four words; we call that a four-gram.

04:16

We're going to tell you how many times a particular four-gram

04:18

appeared in books in 1801, 1802, 1803,

04:20

all the way up to 2008.

04:22

That gives us a time series

04:24

of how frequently this particular sentence was used over time.

04:26

We do that for all the words and phrases that appear in those books,

04:29

and that gives us a big table of two billion lines

04:32

that tell us about the way culture has been changing.
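[Editor's illustration] The counting the speakers describe, one row per n-gram and one count per publication year, can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the pipeline the authors or Google actually used: the function names `ngrams` and `ngram_time_series` are hypothetical, the three-book corpus is invented, tokenization is a plain whitespace split, and the counts are raw rather than normalized by the number of words published each year.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_time_series(books, n):
    """Map each n-gram to a Counter of {year: occurrences}.

    `books` is an iterable of (publication_year, text) pairs.
    """
    table = defaultdict(Counter)
    for year, text in books:
        tokens = text.lower().split()  # naive tokenization (assumption)
        for gram in ngrams(tokens, n):
            table[" ".join(gram)][year] += 1
    return table


# Toy corpus, invented purely for illustration.
books = [
    (1801, "a gleam of happiness crossed her face"),
    (1802, "there was a gleam of happiness and a gleam of happiness again"),
    (1803, "no happiness here"),
]

series = ngram_time_series(books, 4)
print(dict(series["a gleam of happiness"]))  # {1801: 1, 1802: 2}
```

Each key of `series` plays the role of one line in the "big table"; the per-year counts are the time series for that phrase.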

04:34

ELA: So those two billion lines,

04:36

we call them two billion n-grams.

04:38

What do they tell us?

04:40

Well the individual n-grams measure cultural trends.

04:42

Let me give you an example.

04:44

Let's suppose that I am thriving,

04:46

then tomorrow I want to tell you about how well I did.

04:48

And so I might say, "Yesterday, I throve."

04:51

Alternatively, I could say, "Yesterday, I thrived."

04:54

Well which one should I use?

04:57

How to know?

04:59

As of about six months ago,

05:01

the state of the art in this field

05:03

is that you would, for instance,

05:05

go up to the following psychologist with fabulous hair,

05:07

and you'd say,

05:09

"Steve, you're an expert on the irregular verbs.

05:12

What should I do?"

05:14

And he'd tell you, "Well most people say thrived,

05:16

but some people say throve."

05:19

And you also knew, more or less,

05:21

that if you were to go back in time 200 years

05:24

and ask the following statesman with equally fabulous hair,

05:27

(Laughter)

05:30

"Tom, what should I say?"

05:32

He'd say, "Well, in my day, most people throve,

05:34

but some thrived."

05:37

So now what I'm just going to show you is raw data.

05:39

Two rows from this table of two billion entries.

05:43

What you're seeing is year by year frequency

05:45

of "thrived" and "throve" over time.

05:49

Now this is just two

05:51

out of two billion rows.

05:54

So the entire data set

05:56

is a billion times more awesome than this slide.

05:59

(Laughter)

06:01

(Applause)

06:05

JM: Now there are many other pictures that are worth 500 billion words.

06:07

For instance, this one.

06:09

If you just take influenza,

06:11

you will see peaks at the time where you knew

06:13

big flu epidemics were killing people around the globe.

06:16

ELA: If you were not yet convinced,

06:19

sea levels are rising,

06:21

so is atmospheric CO2 and global temperature.

06:24

JM: You might also want to have a look at this particular n-gram,

06:27

and that's to tell Nietzsche that God is not dead,

06:30

although you might agree that he might need a better publicist.

06:33

(Laughter)

06:35

ELA: You can get at some pretty abstract concepts with this sort of thing.

06:38

For instance, let me tell you the history

06:40

of the year 1950.

06:42

Pretty much for the vast majority of history,

06:44

no one gave a damn about 1950.

06:46

In 1700, in 1800, in 1900,

06:48

no one cared.

06:52

Through the 30s and 40s,

06:54

no one cared.

06:56

Suddenly, in the mid-40s,

06:58

there started to be a buzz.

07:00

People realized that 1950 was going to happen,

07:02

and it could be big.

07:04

(Laughter)

07:07

But nothing got people interested in 1950

07:10

like the year 1950.

07:13

(Laughter)

07:16

People were walking around obsessed.

07:18

They couldn't stop talking

07:20

about all the things they did in 1950,

07:23

all the things they were planning to do in 1950,

07:26

all the dreams of what they wanted to accomplish in 1950.

07:31

In fact, 1950 was so fascinating

07:33

that for years thereafter,

07:35

people just kept talking about all the amazing things that happened,

07:38

in '51, '52, '53.

07:40

Finally in 1954,

07:42

someone woke up and realized

07:44

that 1950 had gotten somewhat passé.

07:48

(Laughter)

07:50

And just like that, the bubble burst.

07:52

(Laughter)

07:54

And the story of 1950

07:56

is the story of every year that we have on record,

07:58

with a little twist, because now we've got these nice charts.

08:01

And because we have these nice charts, we can measure things.

08:04

We can say, "Well how fast does the bubble burst?"

08:06

And it turns out that we can measure that very precisely.

08:09

Equations were derived, graphs were produced,

08:12

and the net result

08:14

is that we find that the bubble bursts faster and faster

08:17

with each passing year.

08:19

We are losing interest in the past more rapidly.

08:24

JM: Now a little piece of career advice.

08:26

So for those of you who seek to be famous,

08:28

we can learn from the 25 most famous political figures,

08:30

authors, actors and so on.

08:32

So if you want to become famous early on, you should be an actor,

08:35

because then fame starts rising by the end of your 20s --

08:37

you're still young, it's really great.

08:39

Now if you can wait a little bit, you should be an author,

08:41

because then you rise to very great heights,

08:43

like Mark Twain, for instance: extremely famous.

08:45

But if you want to reach the very top,

08:47

you should delay gratification

08:49

and, of course, become a politician.

08:51

So here you will become famous by the end of your 50s,

08:53

and become very, very famous afterward.

08:55

So scientists also tend to get famous when they're much older.

08:58

Like for instance, biologists and physicists

09:00

tend to be almost as famous as actors.

09:02

One mistake you should not do is become a mathematician.

09:05

(Laughter)

09:07

If you do that,

09:09

you might think, "Oh great. I'm going to do my best work when I'm in my 20s."

09:12

But guess what, nobody will really care.

09:14

(Laughter)

09:17

ELA: There are more sobering notes

09:19

among the n-grams.

09:21

For instance, here's the trajectory of Marc Chagall,

09:23

an artist born in 1887.

09:25

And this looks like the normal trajectory of a famous person.

09:28

He gets more and more and more famous,

09:32

except if you look in German.

09:34

If you look in German, you see something completely bizarre,

09:36

something you pretty much never see,

09:38

which is he becomes extremely famous

09:40

and then all of a sudden plummets,

09:42

going through a nadir between 1933 and 1945,

09:45

before rebounding afterward.

09:48

And of course, what we're seeing

09:50

is the fact Marc Chagall was a Jewish artist

09:53

in Nazi Germany.

09:55

Now these signals

09:57

are actually so strong

09:59

that we don't need to know that someone was censored.

10:02

We can actually figure it out

10:04

using really basic signal processing.

10:06

Here's a simple way to do it.

10:08

Well, a reasonable expectation

10:10

is that somebody's fame in a given period of time

10:12

should be roughly the average of their fame before

10:14

and their fame after.

10:16

So that's sort of what we expect.

10:18

And we compare that to the fame that we observe.

10:21

And we just divide one by the other

10:23

to produce something we call a suppression index.

10:25

If the suppression index is very, very, very small,

10:28

then you very well might be being suppressed.

10:30

If it's very large, maybe you're benefiting from propaganda.
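[Editor's illustration] The suppression index as described here is a single ratio: the fame observed during a period divided by the fame expected from the periods before and after it. The Python sketch below assumes "fame" is one number per period (for example, a mean n-gram frequency) and takes the ratio as observed over expected, which matches the remark that a very small index suggests suppression. The function name and the sample values are hypothetical and are not taken from the authors' data.

```python
def suppression_index(before, during, after):
    """Observed fame during a period divided by the expected fame.

    Expected fame is taken to be the average of the fame before and
    the fame after the period, as described in the talk.
    """
    expected = (before + after) / 2
    if expected == 0:
        raise ValueError("expected fame is zero; the index is undefined")
    return during / expected


# Invented numbers: fame drops to a tenth of its expected level.
print(suppression_index(before=4.0, during=0.4, after=4.0))  # 0.1

# No change across the three periods gives an index of one.
print(suppression_index(before=2.0, during=2.0, after=2.0))  # 1.0
```

An index near one means the observed fame is about what the surrounding periods predict; values far below one point toward suppression, and values far above one toward promotion.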

10:34

JM: Now you can actually look at

10:36

the distribution of suppression indexes over whole populations.

10:39

So for instance, here --

10:41

this suppression index is for 5,000 people

10:43

picked in English books where there's no known suppression --

10:45

it would be like this, basically tightly centered on one.

10:47

What you expect is basically what you observe.

10:49

This is distribution as seen in Germany --

10:51

very different, it's shifted to the left.

10:53

People talked about it twice less as it should have been.

10:56

But much more importantly, the distribution is much wider.

10:58

There are many people who end up on the far left on this distribution

11:01

who are talked about 10 times fewer than they should have been.

11:04

But then also many people on the far right

11:06

who seem to benefit from propaganda.

11:08

This picture is the hallmark of censorship in the book record.

11:11

ELA: So culturomics

11:13

is what we call this method.

11:15

It's kind of like genomics.

11:17

Except genomics is a lens on biology

11:19

through the window of the sequence of bases in the human genome.

11:22

Culturomics is similar.

11:24

It's the application of massive-scale data collection analysis

11:27

to the study of human culture.

11:29

Here, instead of through the lens of a genome,

11:31

through the lens of digitized pieces of the historical record.

11:34

The great thing about culturomics

11:36

is that everyone can do it.

11:38

Why can everyone do it?

11:40

Everyone can do it because three guys,

11:42

Jon Orwant, Matt Gray and Will Brockman over at Google,

11:45

saw the prototype of the Ngram Viewer,

11:47

and they said, "This is so fun.

11:49

We have to make this available for people."

11:52

So in two weeks flat -- the two weeks before our paper came out --

11:54

they coded up a version of the Ngram Viewer for the general public.

11:57

And so you too can type in any word or phrase that you're interested in

12:00

and see its n-gram immediately --

12:02

also browse examples of all the various books

12:04

in which your n-gram appears.

12:06

JM: Now this was used over a million times on the first day,

12:08

and this is really the best of all the queries.

12:10

So people want to be their best, put their best foot forward.

12:13

But it turns out in the 18th century, people didn't really care about that at all.

12:16

They didn't want to be their best, they wanted to be their beft.

12:19

So what happened is, of course, this is just a mistake.

12:22

It's not that they strove for mediocrity,

12:24

it's just that the S used to be written differently, kind of like an F.

12:27

Now of course, Google didn't pick this up at the time,

12:30

so we reported this in the science article that we wrote.

12:33

But it turns out this is just a reminder

12:35

that, although this is a lot of fun,

12:37

when you interpret these graphs, you have to be very careful,

12:39

and you have to adopt the base standards in the sciences.

12:42

ELAELA: People have been usingを使用して this for all kinds種類 of fun楽しい purposes目的.

308

747000

3000

(エレズ) みんなこれをあらゆる楽しいことに使っています

12:45

(Laughter笑い)

309

750000

7000

(「ウガー^n！」のグラフ) (笑)

12:52

Actually実際に, we're not going to have to talk,

310

757000

2000

説明するまでもありませんね

12:54

we're just going to showショー you all the slidesスライド and remain残る silentサイレント.

311

759000

3000

スライドを出して黙っていましょうか

12:57

This person was interested in the history of frustration.

13:00

There's various types of frustration.

13:03

If you stub your toe, that's a one A "argh."

13:06

If the planet Earth is annihilated by the Vogons

13:08

to make room for an interstellar bypass,

13:10

that's an eight A "aaaaaaaargh."

13:12

This person studies all the "arghs,"

13:14

from one through eight A's.

13:16

And it turns out

13:18

that the less-frequent "arghs"

13:20

are, of course, the ones that correspond to things that are more frustrating --

13:23

except, oddly, in the early 80s.

13:26

We think that might have something to do with Reagan.

13:28

(Laughter)

13:30

JM: There are many usages of this data,

13:33

but the bottom line is that the historical record is being digitized.

13:36

Google has started to digitize 15 million books.

13:38

That's 12 percent of all the books that have ever been published.

13:40

It's a sizable chunk of human culture.

13:43

There's much more in culture: there's manuscripts, there's newspapers,

13:46

there's things that are not text, like art and paintings.

13:48

These all happen to be on our computers,

13:50

on computers across the world.

13:52

And when that happens, that will transform the way we have

13:55

to understand our past, our present and human culture.

13:57

Thank you very much.

13:59

(Applause)

Translated by Yasushi Aoki
Reviewed by Yuki Okada

THE ORIGINAL VIDEO ON TED.COM

What we learned from 5 million books | TED Talk | TED.com