ABOUT THE SPEAKER
Yaniv Erlich - Computational geneticist
Yaniv Erlich is fascinated by the connection between DNA and data.

Why you should listen

As a professor and researcher at Columbia University and as CSO of MyHeritage.com, Yaniv Erlich has performed foundational work in genetic privacy and large-scale studies of crowdsourced genomic data. Dubbed a "genome hacker" by the journal Nature, Erlich and his team discovered a privacy loophole enabling reidentification of allegedly anonymous male research participants using just internet searches and their Y chromosome. Later, he discovered that 60 percent of all US individuals of European descent can be identified by forensic genetics using open genetic genealogy databases, which Science magazine called one of the top 10 breakthroughs of 2018.

Erlich is also responsible for the construction of the world's largest family tree, comprising 13 million people, as well as the development of the website DNA.land, which has compiled the genotypes of more than 150,000 donors. He has also worked to discover the genetic bases for several conditions in Israeli families. His team has demonstrated stable DNA data storage, reaching a density of 215 petabytes per gram of DNA. He's been awarded numerous prizes, has published more than 45 papers and authored seven patents.

TEDMED 2018

Yaniv Erlich: How we're building the world's largest family tree

1,507,766 views

Computational geneticist Yaniv Erlich helped build the world's largest family tree -- comprising 13 million people and going back more than 500 years. He shares fascinating patterns that emerged from the work -- about our love lives, our health, even decades-old criminal cases -- and shows how crowdsourced genealogy databases can shed light not only on the past but also on the future.

00:12
People use the internet
for various reasons.
00:17
It turns out that one of the most
popular categories of website
00:21
is something that people
typically consume in private.
00:25
It involves curiosity,
00:28
non-insignificant levels
of self-indulgence
00:31
and is centered around recording
the reproductive activities
00:35
of other people.
00:36
(Laughter)
00:37
Of course, I'm talking about genealogy --
00:39
(Laughter)
00:41
the study of family history.
00:43
When it comes to detailing family history,
00:45
in every family, we have this person
that is obsessed with genealogy.
00:49
Let's call him Uncle Bernie.
00:51
Uncle Bernie is exactly the last person
you want to sit next to
00:54
at Thanksgiving dinner,
00:56
because he will bore you to death
with peculiar details
00:59
about some ancient relatives.
01:02
But as you know,
01:03
there is a scientific side for everything,
01:06
and we found that Uncle Bernie's stories
01:09
have immense potential
for biomedical research.
01:13
We let Uncle Bernie
and his fellow genealogists
01:16
document their family trees through
a genealogy website called geni.com.
01:21
When users upload
their trees to the website,
01:23
it scans their relatives,
01:25
and if it finds matches to existing trees,
01:27
it merges the existing
and the new tree together.
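
The merging step described here can be sketched in code. This is only a minimal illustration, not geni.com's actual algorithm: the matching rule (same normalized name and birth year) and the names `profile_key` and `merge_trees` are assumptions made for the example. It uses a union-find structure so that any uploaded trees sharing a matching profile end up in one merged tree.

```python
from collections import defaultdict


def profile_key(profile):
    """Hypothetical matching rule: same normalized name and birth year."""
    return (profile["name"].strip().lower(), profile["birth_year"])


def merge_trees(trees):
    """Merge trees that share at least one matching profile.

    Each tree is a list of profile dicts with "name" and "birth_year".
    Returns a list of merged trees, each a sorted list of profile keys.
    """
    parent = list(range(len(trees)))  # union-find over tree indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # profile key -> index of the first tree containing it
    for idx, tree in enumerate(trees):
        for profile in tree:
            key = profile_key(profile)
            if key in first_seen:
                # A match to an existing tree: join the two trees.
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    merged = defaultdict(set)
    for idx, tree in enumerate(trees):
        root = find(idx)
        for profile in tree:
            merged[root].add(profile_key(profile))
    return [sorted(keys) for keys in merged.values()]
```

A real service would need a far more careful matching rule (spelling variants, missing dates, namesakes); the sketch only shows how repeated pairwise matches grow into large connected trees.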
01:31
The result is that large
family trees are created,
01:34
beyond the individual level
of each genealogist.
01:38
Now, by repeating this process
with millions of people
01:42
all over the world,
01:44
we can crowdsource the construction
of a family tree of all humankind.
01:51
Using this website,
01:52
we were able to connect 125 million people
01:57
into a single family tree.
02:00
I cannot draw the tree
on the screens over here
02:03
because they have fewer pixels
02:05
than the number of people in this tree.
02:08
But here is an example of a subset
of 6,000 individuals.
02:14
Each green node is a person.
02:17
The red nodes represent marriages,
02:19
and the connections represent parenthood.
02:22
In the middle of this tree,
you see the ancestors.
02:24
And as we go to the periphery,
you see the descendants.
02:27
This tree has seven
generations, approximately.
02:31
Now, this is what happens
when we increase the number of individuals
02:34
to 70,000 people --
02:36
still a tiny subset
of all the data that we have.
02:41
Despite that, you can already see
the formation of gigantic family trees
02:46
with many very distant relatives.
02:49
Thanks to the hard work
of our genealogists,
02:52
we can go back in time
hundreds of years ago.
02:56
For example, here is Alexander Hamilton,
02:59
who was born in 1755.
03:02
Alexander was the first
US Secretary of the Treasury,
03:06
but mostly known today
due to a popular Broadway musical.
03:11
We found that Alexander has deeper
connections in the showbiz industry.
03:16
In fact, he's a blood relative of ...
03:18
Kevin Bacon!
03:20
(Laughter)
03:22
Both of them are descendants
of a lady from Scotland
03:24
who lived in the 13th century.
03:27
So you can say that Alexander Hamilton
03:30
is 35 degrees of Kevin Bacon genealogy.
03:33
(Laughter)
03:34
And our tree has millions
of stories like that.
03:40
We invested significant efforts
to validate the quality of our data.
03:45
Using DNA, we found that .3 percent of
the mother-child connections in our data
03:50
are wrong,
03:51
which could match the adoption rate
in the US pre-Second World War.
03:56
For the father's side,
03:58
the news is not as good:
04:02
1.9 percent of the father-child
connections in our data are wrong.
04:07
And I see some people smirk over here.
04:10
It is what you think --
04:11
there are many milkmen out there.
04:13
(Laughter)
04:14
However, this 1.9 percent error rate
in patrilineal connections
04:18
is not unique to our data.
04:20
Previous studies found
a similar error rate
04:23
using clinical-grade pedigrees.
04:26
So the quality of our data is good,
04:28
and that should not be a surprise.
04:30
Our genealogists have
a profound, vested interest
04:34
in correctly documenting
their family history.
04:40
We can leverage this data to learn
quantitative information about humanity,
04:45
for example, questions about demography.
04:47
Here is a look at all our profiles
on the map of the world.
04:52
Each pixel is a person
that lived at some point.
04:56
And since we have so much data,
04:58
you can see the contours
of many countries,
05:01
especially in the Western world.
05:03
In this clip, we stratified
the map that I've shown you
05:06
based on the year of births of individuals
from 1400 to 1900,
05:12
and we compared it
to known migration events.
05:15
The clip is going to show you
that the deepest lineages in our data
05:18
go all the way back to the UK,
05:20
where they had better record keeping,
05:22
and then they spread along
the routes of Western colonialism.
05:25
Let's watch this.
05:27
(Music)
05:28
[Year of birth: ]
05:31
[1492 - Columbus sails the ocean blue]
05:35
[1620 - Mayflower lands in Massachusetts]
05:38
[1652 - Dutch settle in South Africa]
05:44
[1788 - Great Britain penal
transportation to Australia starts]
05:47
[1836 - First migrants use Oregon Trail]
05:50
[all activity]
05:55
I love this movie.
05:57
Now, since these migration events
are giving the context of families,
06:02
we can ask questions such as:
06:04
What is the typical distance
between the birth locations
06:08
of husbands and wives?
06:11
This distance plays
a pivotal role in demography,
06:14
because the patterns in which
people migrate to form families
06:18
determine how genes spread
in geographical areas.
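
The distance between spouses' birth locations can be computed as a great-circle distance. The sketch below is one plausible way to do it, not necessarily the method used in the study: the haversine formula, the mean Earth radius of 6,371 km, and the function names `haversine_km` and `median_marital_distance` are all assumptions made for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_marital_distance(couples):
    """Median birthplace distance over couples.

    `couples` is an iterable of ((lat, lon), (lat, lon)) pairs, one
    birthplace per spouse.
    """
    return median(haversine_km(*a, *b) for a, b in couples)
```

Grouping couples by year of birth before taking the median would give the kind of trend over time that the talk describes.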
06:22
We analyzed this distance using our data,
06:25
and we found that in the old days,
06:27
people had it easy.
06:28
They just married someone
in the village nearby.
06:31
But the Industrial Revolution
really complicated our love life.
06:35
And today, with affordable flights
and online social media,
06:40
people typically migrate more than
100 kilometers from their place of birth
06:45
to find their soul mate.
06:48
So now you might ask:
06:49
OK, but who does the hard work
of migrating from place to place
06:54
to form families?
06:55
Are these the males or the females?
06:59
We used our data to address this question,
07:01
and at least in the last 300 years,
07:04
we found that the ladies do the hard work
07:08
of migrating from place
to place to form families.
07:11
Now, these results
are statistically significant,
07:14
so you can take it as scientific fact
that males are lazy.
07:18
(Laughter)
07:21
We can move from questions
about demography
07:23
and ask questions about human health.
07:26
For example, we can ask
07:28
to what extent genetic variations
account for differences in life span
07:33
between individuals.
07:34
Previous studies analyzed the correlation
of longevity between twins
07:39
to address this question.
07:41
They estimated that the genetic
variations account for
07:44
about a quarter of the differences
in life span between individuals.
07:48
But twins can be correlated
due to so many reasons,
07:51
including various environmental effects
07:53
or a shared household.
07:56
Large family trees give us the opportunity
to analyze both close relatives,
08:00
such as twins,
08:01
all the way to distant relatives,
even fourth cousins.
08:04
This way we can build robust models
08:07
that can tease apart the contribution
of genetic variations
08:11
from environmental factors.
08:13
We conducted this analysis using our data,
08:16
and we found that genetic variations
explain only 15 percent
08:22
of the differences in life span
between individuals.
08:26
That is five years, on average.
08:30
So genes matter less to life span
than we thought before.
08:35
And I find it great news,
08:38
because it means that
our actions can matter more.
08:42
Smoking, for example, determines
10 years of our life expectancy --
08:46
twice as much as what genetics determines.
08:50
We can even have more surprising findings
08:52
as we move from family trees
08:54
and we let our genealogists
document and crowdsource DNA information.
08:58
And the results can be amazing.
09:01
It might be hard to imagine,
but Uncle Bernie and his friends
09:05
can create DNA forensic capabilities
09:07
that even exceed
what the FBI currently has.
09:12
When you place the DNA
on a large family tree,
09:15
you effectively create a beacon
09:17
that illuminates the hundreds
of distant relatives
09:20
that are all connected to the person
that originated the DNA.
09:24
By placing multiple beacons
on a large family tree,
09:27
you can now triangulate the DNA
of an unknown person,
09:31
the same way that the GPS system
uses multiple satellites
09:35
to find a location.
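
The beacon idea can be illustrated with a small sketch. It is a simplification under stated assumptions, not the procedure investigators actually follow: the tree is assumed to be a plain mapping from each person to a list of parents, each beacon is assumed to light up every descendant of its ancestors within a fixed number of generations, and the names `relatives_within` and `triangulate` are hypothetical. Intersecting the sets lit up by several beacons narrows down where the unknown person can sit in the tree.

```python
from collections import defaultdict


def ancestors_within(person, parents, generations):
    """Ancestors of `person` at most `generations` steps up the tree."""
    frontier, found = {person}, set()
    for _ in range(generations):
        frontier = {p for child in frontier for p in parents.get(child, ())}
        found |= frontier
    return found


def descendants_within(person, children, generations):
    """Descendants of `person` at most `generations` steps down the tree."""
    frontier, found = {person}, set()
    for _ in range(generations):
        frontier = {c for par in frontier for c in children.get(par, ())}
        found |= frontier
    return found


def relatives_within(person, parents, generations):
    """Descendants of the person's recent ancestors, excluding the person."""
    children = defaultdict(set)
    for child, pars in parents.items():
        for par in pars:
            children[par].add(child)
    relatives = set()
    for ancestor in ancestors_within(person, parents, generations):
        relatives |= descendants_within(ancestor, children, generations)
    relatives.discard(person)
    return relatives


def triangulate(beacons, parents, generations):
    """People illuminated by every beacon (intersection of relative sets)."""
    sets = [relatives_within(b, parents, generations) for b in beacons]
    return set.intersection(*sets) if sets else set()
```

With one beacon the candidate set can be large; each additional beacon on a different branch shrinks it, which is the sense of the GPS comparison in the talk.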
09:37
The prime example
of the power of this technique
09:40
is capturing the Golden State Killer,
09:44
one of the most notorious criminals
in the history of the US.
09:49
The FBI had been searching
for this person for over 40 years.
09:55
They had his DNA,
182
583588
1835
09:57
but he never showed up
in any police database.
10:01
About a year ago, the FBI
consulted a genetic genealogist,
10:06
and she suggested that they submit
his DNA to a genealogy service
10:10
that can locate distant relatives.
10:13
They did that,
10:14
and they found a third cousin
of the Golden State Killer.
10:18
They built a large family tree,
10:20
scanned the different
branches of that tree,
10:22
until they found a profile
that exactly matched
10:25
what they knew about
the Golden State Killer.
10:27
They obtained DNA from this person
and found a perfect match
193
615701
3592
10:31
to the DNA they had in hand.
10:33
They arrested him
and brought him to justice
195
621366
2350
10:35
after all these years.
10:38
Since then, genetic genealogists
have started working with
10:41
local US law enforcement agencies
10:44
to use this technique
in order to capture criminals.
10:47
And only in the past six months,
10:50
they were able to solve
over 20 cold cases with this technique.
10:56
Luckily, we have people like Uncle
Bernie and his fellow genealogists.
11:01
These are not amateurs
with a self-serving hobby.
11:04
These are citizen scientists
with a deep passion to tell us who we are.
11:11
And they know that the past
can hold a key to the future.
11:16
Thank you very much.
11:17
(Applause)
