ABOUT THE SPEAKER
Riccardo Sabatini - Scientist, entrepreneur
Riccardo Sabatini applies his expertise in numerical modeling and data to projects ranging from material science to computational genomics and food market predictions.

Why you should listen

Data scientist Riccardo Sabatini harnesses numerical methods for a surprising variety of fields, from material science research to the study of food commodities (as a past director of the EU research project FoodCAST). His most recent research centers on computational genomics and how to crack the code of life.

In addition to his data research, Sabatini is deeply involved in education for entrepreneurs. He is the founder and co-director of the Quantum ESPRESSO Foundation, an advisor in several data-driven startups, and funder of The HUB Trieste, a social impact accelerator.

TED2016

Riccardo Sabatini: How to read the genome and build a human being


1,834,677 views

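The secrets of life, disease and beauty are all written in the genome, the complete set of genetic information needed to build a human being. Scientist and entrepreneur Riccardo Sabatini shows that, from a single vial of blood, we can read the genome and predict height, age, eye color and even facial structure. In the near future, our deeper understanding of the genome will make it possible to tailor disease treatment to the individual. With the power to change the future, how will we use it?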


00:12
For the next 16 minutes, I'm going to take you on a journey
00:15
that is probably the biggest dream of humanity:
00:18
to understand the code of life.
00:21
So for me, everything started many, many years ago
00:23
when I met the first 3D printer.
00:26
The concept was fascinating.
00:28
A 3D printer needs three elements:
00:30
a bit of information, some raw material, some energy,
00:34
and it can produce any object that was not there before.
00:38
I was doing physics, I was coming back home
00:40
and I realized that I actually always knew a 3D printer.
00:44
And everyone does.
00:45
It was my mom.
00:46
(Laughter)
00:47
My mom takes three elements:
00:50
a bit of information, which is between my father and my mom in this case,
00:54
raw elements and energy in the same media, that is food,
00:58
and after several months, produces me.
01:00
And I was not existent before.
01:02
So apart from the shock of my mom discovering that she was a 3D printer,
01:06
I immediately got mesmerized by that piece,
01:11
the first one, the information.
01:12
What amount of information does it take
01:15
to build and assemble a human?
01:17
Is it much? Is it little?
01:18
How many thumb drives can you fill?
01:21
Well, I was studying physics at the beginning
01:23
and I took this approximation of a human as a gigantic Lego piece.
01:29
So, imagine that the building blocks are little atoms
01:33
and there is a hydrogen here, a carbon here, a nitrogen here.
01:37
So in the first approximation,
01:39
if I can list the number of atoms that compose a human being,
01:43
I can build it.
01:45
Now, you can run some numbers
01:47
and that happens to be quite an astonishing number.
01:50
So the number of atoms,
01:53
the file that I will save in my thumb drive to assemble a little baby,
01:58
will actually fill an entire Titanic of thumb drives --
02:02
multiplied 2,000 times.
02:05
This is the miracle of life.
02:09
Every time you see from now on a pregnant lady,
02:12
she's assembling the biggest amount of information
02:14
that you will ever encounter.
02:16
Forget big data, forget anything you heard of.
02:19
This is the biggest amount of information that exists.
02:22
(Applause)
02:26
But nature, fortunately, is much smarter than a young physicist,
02:30
and in four billion years, managed to pack this information
02:34
in a small crystal we call DNA.
02:37
We met it for the first time in 1950 when Rosalind Franklin,
02:41
an amazing scientist, a woman,
02:43
took a picture of it.
02:44
But it took us more than 40 years to finally poke inside a human cell,
02:50
take out this crystal,
02:51
unroll it, and read it for the first time.
02:55
The code comes out to be a fairly simple alphabet,
02:58
four letters: A, T, C and G.
03:02
And to build a human, you need three billion of them.
03:06
Three billion.
03:08
How many are three billion?
03:09
It doesn't really make any sense as a number, right?
03:12
So I was thinking how I could explain myself better
03:16
about how big and enormous this code is.
03:19
But there is -- I mean, I'm going to have some help,
03:22
and the best person to help me introduce the code
03:26
is actually the first man to sequence it, Dr. Craig Venter.
03:29
So welcome onstage, Dr. Craig Venter.
03:32
(Applause)
03:39
Not the man in the flesh,
03:43
but for the first time in history,
03:45
this is the genome of a specific human,
03:49
printed page-by-page, letter-by-letter:
03:53
262,000 pages of information,
03:57
450 kilograms, shipped from the United States to Canada
04:01
thanks to Bruno Bowden, Lulu.com, a start-up, did everything.
04:06
It was an amazing feat.
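
[Editor's note] The figures quoted so far (three billion letters, a four-letter alphabet, 262,000 pages, 450 kilograms) allow a quick back-of-the-envelope check. The sketch below is only an illustration: it assumes every letter is stored uncompressed at log2(4) = 2 bits, which is an assumption of this note and not something stated in the talk, and the variable names are made up for the example.

```python
import math

# Figures quoted in the talk.
GENOME_LETTERS = 3_000_000_000   # "three billion" letters
PRINTED_PAGES = 262_000          # pages in the printed genome
PRINTED_WEIGHT_KG = 450          # weight of the printed books

# Assumption: each letter is one of four symbols (A, T, C, G) stored
# without compression, so it takes log2(4) = 2 bits.
bits_per_letter = math.log2(len("ATCG"))

raw_megabytes = GENOME_LETTERS * bits_per_letter / 8 / 1_000_000
letters_per_page = GENOME_LETTERS / PRINTED_PAGES
grams_per_page = PRINTED_WEIGHT_KG * 1000 / PRINTED_PAGES

print(f"raw size: {raw_megabytes:.0f} MB")
print(f"letters per page: {letters_per_page:.0f}")
print(f"paper per page: {grams_per_page:.2f} g")
```

Under these assumptions the whole sequence comes to roughly 750 MB, and each printed page carries on the order of 11,000 letters.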
04:07
But this is the visual perception of what is the code of life.
04:12
And now, for the first time, I can do something fun.
04:14
I can actually poke inside it and read.
04:17
So let me take an interesting book ... like this one.
04:25
I have an annotation; it's a fairly big book.
04:27
So just to let you see what is the code of life.
04:32
Thousands and thousands and thousands
04:35
and millions of letters.
04:38
And they apparently make sense.
04:41
Let's get to a specific part.
04:43
Let me read it to you:
04:44
(Laughter)
04:46
"AAG, AAT, ATA."
04:50
To you it sounds like mute letters,
04:53
but this sequence gives the color of the eyes to Craig.
04:57
I'll show you another part of the book.
04:59
This is actually a little more complicated.
05:02
Chromosome 14, book 132:
05:05
(Laughter)
05:07
As you might expect.
05:09
(Laughter)
05:14
"ATT, CTT, GATT."
05:20
This human is lucky,
05:22
because if you miss just two letters in this position --
05:26
two letters of our three billion --
05:28
he will be condemned to a terrible disease:
05:30
cystic fibrosis.
05:31
We have no cure for it, we don't know how to solve it,
05:35
and it's just two letters of difference from what we are.
05:39
A wonderful book, a mighty book,
05:43
a mighty book that helped me understand
05:45
and show you something quite remarkable.
05:48
Every one of you -- what makes me, me and you, you --
05:52
is just about five million of these,
05:55
half a book.
05:58
For the rest,
05:59
we are all absolutely identical.
06:03
Five hundred pages is the miracle of life that you are.
06:07
The rest, we all share it.
06:09
So think about that again when we think that we are different.
06:12
This is the amount that we share.
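
[Editor's note] The "five million letters" and "five hundred pages" figures can be checked against the numbers given earlier in the talk (three billion letters printed on 262,000 pages). The sketch below is a rough consistency check only; it assumes the letters are spread evenly across the pages, which the talk does not state.

```python
# Figures quoted in the talk.
GENOME_LETTERS = 3_000_000_000   # total letters in the genome
DIFFERING_LETTERS = 5_000_000    # letters that make each person unique
PRINTED_PAGES = 262_000          # pages in the printed genome

# Assumption: letters are spread evenly over the printed pages.
letters_per_page = GENOME_LETTERS / PRINTED_PAGES

shared_fraction = 1 - DIFFERING_LETTERS / GENOME_LETTERS
pages_that_differ = DIFFERING_LETTERS / letters_per_page

print(f"shared between any two people: {shared_fraction:.2%}")
print(f"pages that differ: about {pages_that_differ:.0f}")
```

On these numbers about 99.8 percent of the code is shared, and the differing letters would fill a little over 400 pages, the same order of magnitude as the "five hundred pages" in the talk.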
06:15
So now that I have your attention,
06:18
the next question is:
06:20
How do I read it?
06:21
How do I make sense out of it?
06:23
Well, for however good you can be at assembling Swedish furniture,
06:27
this instruction manual is nothing you can crack in your life.
06:31
(Laughter)
06:32
And so, in 2014, two famous TEDsters,
06:36
Peter Diamandis and Craig Venter himself,
06:38
decided to assemble a new company.
06:40
Human Longevity was born,
06:41
with one mission:
06:43
trying everything we can try
06:45
and learning everything we can learn from these books,
06:48
with one target --
06:50
making real the dream of personalized medicine,
06:53
understanding what things should be done to have better health
06:57
and what are the secrets in these books.
07:00
An amazing team, 40 data scientists and many, many more people,
07:04
a pleasure to work with.
07:05
The concept is actually very simple.
07:08
We're going to use a technology called machine learning.
07:11
On one side, we have genomes -- thousands of them.
07:15
On the other side, we collected the biggest database of human beings:
07:20
phenotypes, 3D scan, NMR -- everything you can think of.
07:24
Inside there, on these two opposite sides,
07:27
there is the secret of translation.
07:29
And in the middle, we build a machine.
07:32
We build a machine and we train a machine --
07:35
well, not exactly one machine, many, many machines --
07:38
to try to understand and translate the genome in a phenotype.
07:43
What are those letters, and what do they do?
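
[Editor's note] The idea of a machine that sits between genomes and phenotypes can be illustrated with a toy model. The talk does not describe the actual methods used, so the sketch below is not the speaker's pipeline. It assumes a purely synthetic cohort in which a trait (a hypothetical height in cm) is a noisy sum of per-variant effects, fits a linear model by gradient descent, and scores it on people the model never saw during training. All names, sizes and effect values are invented for the illustration.

```python
import random

def simulate(n_people, rng, effects):
    """Synthetic cohort: allele counts (0, 1 or 2) and an additive trait."""
    genotypes = [[rng.choice((0, 1, 2)) for _ in effects]
                 for _ in range(n_people)]
    traits = [170.0 + sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 0.5)
              for row in genotypes]
    return genotypes, traits

def fit(genotypes, traits, steps=200, lr=0.5):
    """Least-squares fit by gradient descent on mean-centred data."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(traits) / n
    X = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [t - y_mean for t in traits]
    w = [0.0] * p
    for _ in range(steps):
        # Residuals are computed once per step, then every weight is updated.
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(residuals, X)) / n
            w[j] -= lr * grad
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, col_means))
    return w, intercept

def predict(w, intercept, row):
    return intercept + sum(wj * g for wj, g in zip(w, row))

def mean_abs_error(predictions, truths):
    return sum(abs(p - t) for p, t in zip(predictions, truths)) / len(truths)

rng = random.Random(0)                              # fixed seed for repeatability
effects = [rng.gauss(0, 1) for _ in range(10)]      # made-up effect sizes
train_X, train_y = simulate(200, rng, effects)
test_X, test_y = simulate(100, rng, effects)        # never used for fitting

w, b = fit(train_X, train_y)
model_mae = mean_abs_error([predict(w, b, row) for row in test_X], test_y)
baseline = sum(train_y) / len(train_y)              # always guess the average
baseline_mae = mean_abs_error([baseline] * len(test_y), test_y)

print(f"model error on unseen people: {model_mae:.2f} cm")
print(f"mean-only baseline error:     {baseline_mae:.2f} cm")
```

Comparing the model with a baseline that always guesses the average shows how much of the trait the "letters" explain in this toy setting. Real genomes have millions of variants and non-additive effects, so this shows only the shape of the problem.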
07:46
It's an approach that can be used for everything,
07:49
but using it in genomics is particularly complicated.
07:52
Little by little we grew and we wanted to build different challenges.
07:55
We started from the beginning, from common traits.
07:58
Common traits are comfortable because they are common,
08:01
everyone has them.
08:02
So we started to ask our questions:
08:04
Can we predict height?
08:06
Can we read the books and predict your height?
08:09
Well, we actually can,
08:10
with five centimeters of precision.
08:12
BMI is fairly connected to your lifestyle,
08:15
but we still can, we get in the ballpark, eight kilograms of precision.
08:19
Can we predict eye color?
08:20
Yeah, we can.
08:21
Eighty percent accuracy.
08:23
Can we predict skin color?
08:25
Yeah we can, 80 percent accuracy.
08:27
Can we predict age?
08:30
We can, because apparently, the code changes during your life.
08:33
It gets shorter, you lose pieces, it gets insertions.
08:37
We read the signals, and we make a model.
08:40
Now, an interesting challenge:
08:41
Can we predict a human face?
08:45
It's a little complicated,
08:46
because a human face is scattered among millions of these letters.
08:49
And a human face is not a very well-defined object.
08:52
So, we had to build an entire tier of it
08:54
to learn and teach a machine what a face is,
08:56
and embed and compress it.
08:59
And if you're comfortable with machine learning,
09:01
you understand what the challenge is here.
09:04
Now, after 15 years -- 15 years after we read the first sequence --
09:10
this October, we started to see some signals.
09:13
And it was a very emotional moment.
09:15
What you see here is a subject coming in our lab.
09:19
This is a face for us.
09:21
So we take the real face of a subject, we reduce the complexity,
09:25
because not everything is in your face --
09:27
lots of features and defects and asymmetries come from your life.
09:31
We symmetrize the face, and we run our algorithm.
09:35
The results that I show you right now,
09:37
this is the prediction we have from the blood.
09:41
(Applause)
09:43
Wait a second.
09:44
In these seconds, your eyes are watching, left and right, left and right,
09:49
and your brain wants those pictures to be identical.
09:53
So I ask you to do another exercise, to be honest.
09:55
Please search for the differences,
09:58
which are many.
09:59
The biggest amount of signal comes from gender,
10:02
then there is age, BMI, the ethnicity component of a human.
10:07
And scaling up over that signal is much more complicated.
10:11
But what you see here, even in the differences,
10:14
lets you understand that we are in the right ballpark,
10:17
that we are getting closer.
10:19
And it's already giving you some emotions.
10:21
This is another subject that comes in place,
10:24
and this is a prediction.
10:25
A little smaller face, we didn't get the complete cranial structure,
10:30
but still, it's in the ballpark.
10:33
This is a subject that comes in our lab,
10:35
and this is the prediction.
10:38
So these people have never been seen in the training of the machine.
10:42
These are the so-called "held-out" set.
10:45
But these are people that you will probably never believe.
10:49
We're publishing everything in a scientific publication,
10:52
you can read it.
10:53
But since we are onstage, Chris challenged me.
10:55
I probably exposed myself and tried to predict
10:59
someone that you might recognize.
11:02
So, in this vial of blood -- and believe me, you have no idea
11:06
what we had to do to have this blood now, here --
11:09
in this vial of blood is the amount of biological information
11:13
that we need to do a full genome sequence.
11:16
We just need this amount.
11:18
We ran this sequence, and I'm going to do it with you.
11:21
And we start to layer up all the understanding we have.
11:25
In the vial of blood, we predicted he's a male.
11:29
And the subject is a male.
11:30
We predict that he's a meter and 76 cm.
11:33
The subject is a meter and 77 cm.
11:35
So, we predicted that he's 76 [kg]; the subject is 82 [kg].
11:40
We predict his age, 38.
11:43
The subject is 35.
11:45
We predict his eye color.
11:48
Too dark.
11:50
We predict his skin color.
11:52
We are almost there.
11:53
That's his face.
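
[Editor's note] The numeric predictions just read out can be compared with the precision figures quoted earlier in the talk (five centimeters for height, eight kilograms in the BMI remark). The sketch below simply tabulates the absolute errors. It assumes the "76" and "82" figures are weights in kilograms, as the Chinese subtitles indicate, and it applies the eight-kilogram figure to weight, which is an interpretation by this note; no precision was quoted for age.

```python
# (predicted, actual) pairs as read out in the talk.
results = {
    "height_cm": (176, 177),
    "weight_kg": (76, 82),   # unit assumed to be kilograms
    "age_years": (38, 35),
}

# Precision figures quoted earlier in the talk; none was given for age.
stated_precision = {"height_cm": 5, "weight_kg": 8}

def absolute_errors(pairs):
    """Absolute difference between prediction and truth for each trait."""
    return {trait: abs(predicted - actual)
            for trait, (predicted, actual) in pairs.items()}

errors = absolute_errors(results)
for trait, error in errors.items():
    limit = stated_precision.get(trait)
    if limit is None:
        verdict = "no precision quoted"
    elif error <= limit:
        verdict = f"within the quoted +/-{limit}"
    else:
        verdict = f"outside the quoted +/-{limit}"
    print(f"{trait}: off by {error} ({verdict})")
```

Height is off by 1 cm and weight by 6 kg, both inside the quoted ranges; age is off by three years.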
11:57
Now, the reveal moment:
12:00
the subject is this person.
12:02
(Laughter)
12:04
And I did it intentionally.
12:06
I am a very particular and peculiar ethnicity.
12:10
Southern European, Italians -- they never fit in models.
12:12
And it's particular -- that ethnicity is a complex corner case for our model.
12:18
But there is another point.
12:19
So, one of the things that we use a lot to recognize people
12:23
will never be written in the genome.
12:24
It's our free will, it's how I look.
12:27
Not my haircut in this case, but my beard cut.
12:30
So I'm going to show you, I'm going to, in this case, transfer it --
12:34
and this is nothing more than Photoshop, no modeling --
12:36
the beard on the subject.
12:38
And immediately, we get much, much better in the feeling.
12:42
So, why do we do this?
12:47
We certainly don't do it for predicting height
12:53
or taking a beautiful picture out of your blood.
12:56
We do it because the same technology and the same approach,
13:00
the machine learning of this code,
13:02
is helping us to understand how we work,
13:06
how your body works,
13:07
how your body ages,
13:09
how disease generates in your body,
13:12
how your cancer grows and develops,
13:15
how drugs work
13:16
and if they work on your body.
13:19
This is a huge challenge.
13:21
This is a challenge that we share
13:23
with thousands of other researchers around the world.
13:26
It's called personalized medicine.
13:29
It's the ability to move from a statistical approach
13:32
where you're a dot in the ocean,
13:34
to a personalized approach,
13:36
where we read all these books
13:38
and we get an understanding of exactly how you are.
13:42
But it is a particularly complicated challenge,
13:45
because of all these books, as of today,
13:49
we just know probably two percent:
13:53
four books of more than 175.
13:58
And this is not the topic of my talk,
14:02
because we will learn more.
14:05
There are the best minds in the world on this topic.
14:09
The prediction will get better,
14:10
the model will get more precise.
14:13
And the more we learn,
14:15
the more we will be confronted with decisions
14:19
that we never had to face before
14:22
about life,
14:24
about death,
14:26
about parenting.
14:32
So, we are touching the very inner detail on how life works.
14:38
And it's a revolution that cannot be confined
14:41
in the domain of science or technology.
14:44
This must be a global conversation.
14:47
We must start to think of the future we're building as a humanity.
14:53
We need to interact with creatives, with artists, with philosophers,
14:57
with politicians.
14:58
Everyone is involved,
14:59
because it's the future of our species.
15:03
Without fear, but with the understanding
15:07
that the decisions that we make in the next year
15:11
will change the course of history forever.
15:15
Thank you.
15:16
(Applause)
Translated by Jingqi Gong
Reviewed by Rachel Li

