ABOUT THE SPEAKER
Dina Zielinski - Bench scientist, bioinformatician
Dina Zielinski brings biological data to life, from decoding mutations in cancer to encoding data in DNA.

Why you should listen

Dina Zielinski is a bench scientist and bioinformatician with broad experience in genetics and genomics. Her current projects are in developmental biology and neurology, but she is motivated to keep learning and using her skills to answer scientific questions that challenge our deepest knowledge.

TEDxVienna

Dina Zielinski: How we can store digital data in DNA


From floppy disks to thumb drives, every method of storing data eventually becomes obsolete. What if we could find a way to store all the world's data forever? Bioinformatician Dina Zielinski shares the science behind a solution that's been around for a few billion years: DNA.


00:12
I could fit all movies ever made
inside of this tube.
00:17
If you can't see it,
that's kind of the point.
00:20
(Laughter)
00:21
Before we understand how this is possible,
00:24
it's important to understand
the value of this feat.
00:29
All of our thoughts
and actions these days,
00:31
through photos and videos --
00:33
even our fitness activities --
00:35
are stored as digital data.
00:38
Aside from running out of space
00:39
on our phones,
00:40
we rarely think about
our digital footprint.
00:43
But humanity has collectively
generated more data
00:47
in the last few years
00:48
than all of preceding human history.
00:51
Big data has become a big problem.
00:55
Digital storage is really expensive,
00:58
and none of these devices that we have
really stand the test of time.
01:03
There's this nonprofit website
called the Internet Archive.
01:07
In addition to free books and movies,
01:09
you can access web pages
as far back as 1996.
01:14
Now, this is very tempting,
01:15
but I decided to go back and look at
the TED website's very humble beginnings.
01:21
As you can see, it's changed
quite a bit in the last 20 years.
01:26
So this led me to the first-ever TED,
01:29
back in 1984,
01:31
and it just so happened
to be a Sony executive
01:34
explaining how a compact disc works.
01:37
(Laughter)
01:38
Now, it's really incredible
to be able to go back in time
01:42
and access this moment.
01:45
It's also really fascinating
that after 30 years, after that first TED,
01:50
we're still talking about digital storage.
01:54
Now, if we look back another 30 years,
01:57
IBM released the first-ever hard drive
02:00
back in 1956.
02:02
Here it is being loaded for shipping
in front of a small audience.
02:07
It held the equivalent of one MP3 song
02:11
and weighed over one ton.
02:14
At 10,000 dollars a megabyte,
02:16
I don't think anyone in this room
would be interested in buying this thing,
02:20
except maybe as a collector's item.
02:22
But it's the best we could do at the time.
02:26
We've come such a long way
in data storage.
02:29
Devices have evolved dramatically.
02:32
But all media eventually wear out
or become obsolete.
02:37
If someone handed you a floppy drive today
to back up your presentation,
02:41
you'd probably look at them
kind of strange, maybe laugh,
02:44
but you'd have no way
to use the damn thing.
02:47
These devices can no longer meet
our storage needs,
02:51
although some of them can be repurposed.
02:54
All technology eventually dies or is lost,
02:57
along with our data,
02:59
all of our memories.
03:02
There's this illusion that
the storage problem has been solved,
03:06
but really, we all just externalize it.
03:08
We don't worry about storing
our emails and our photos.
03:12
They're just in the cloud.
03:15
But behind the scenes,
storage is problematic.
03:18
After all, the cloud is just
a lot of hard drives.
03:23
Now, most digital data,
we could argue, is not really critical.
03:27
Surely, we could just delete it.
03:29
But how can we really know
what's important today?
03:34
We've learned so much about human history
03:36
from drawings and writings in caves,
03:39
from stone tablets.
03:41
We've deciphered languages
from the Rosetta Stone.
03:45
You know, we'll never really have
the whole story, though.
03:49
Our data is our story,
03:51
even more so today.
03:53
We won't have our record
recorded on stone tablets.
03:57
But we don't have to choose
what is important now.
04:00
There's a way to store it all.
04:03
It turns out that there's
a solution that's been around
04:06
for a few billion years,
04:08
and it's actually in this tube.
04:12
DNA is nature's oldest storage device.
04:15
After all, it contains
all the information necessary
04:19
to build and maintain a human being.
04:22
But what makes DNA so great?
04:25
Well, let's take our own genome
04:27
as an example.
04:28
If we were to print out
all three billion A's, T's, C's and G's
04:33
on a standard font, standard format,
04:37
and then we were
to stack all of those papers,
04:40
it would be about 130 meters high,
04:42
somewhere between the Statue of Liberty
and the Washington Monument.
04:46
Now, if we converted
all those A's, T's, C's and G's
04:48
to digital data, to zeroes and ones,
04:51
it would total a few gigs.
04:53
And that's in each cell of our body.
04:56
We have more than 30 trillion cells.
04:59
You get the idea:
05:01
DNA can store a ton of information
in a minuscule space.
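The genome-size arithmetic can be checked with a few lines of code. This is a rough sketch that assumes the simplest possible packing, 2 bits per base (four letters need log2(4) = 2 bits); the talk does not say how the letters are converted. Under that assumption one 3-billion-base copy comes to about 0.75 GB, and the two copies carried by most cells come to about 1.5 GB, so a gigabyte or two per cell.

```python
# Back-of-the-envelope sketch: how much digital data is one human genome?
# Assumption (not stated in the talk): each base is one of four letters,
# so it packs into log2(4) = 2 bits, with no compression.
import math

BASES_PER_COPY = 3_000_000_000   # "all three billion A's, T's, C's and G's"
BITS_PER_BASE = math.log2(4)     # four possible letters -> 2.0 bits


def genome_bytes(num_bases: int) -> float:
    """Size in bytes of a sequence packed at 2 bits per base."""
    return num_bases * BITS_PER_BASE / 8


one_copy = genome_bytes(BASES_PER_COPY)
# Most human cells carry two copies of the genome (one from each parent).
two_copies = genome_bytes(2 * BASES_PER_COPY)

print(f"one copy:   {one_copy / 1e9:.2f} GB")    # one copy:   0.75 GB
print(f"two copies: {two_copies / 1e9:.2f} GB")  # two copies: 1.50 GB
```

Real genome files are usually larger than this (they store one byte per letter, plus quality data), so the 2-bit figure is a lower bound on uncompressed size rather than what you would find on disk.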
05:07
DNA is also very durable,
05:09
and it doesn't even require
electricity to store it.
05:12
We know this because scientists
have recovered DNA from ancient humans
05:16
that lived hundreds
of thousands of years ago.
05:19
One of those is Ötzi the Iceman.
05:22
Turns out, he's Austrian.
05:24
(Laughter)
05:25
He was found well-preserved, high
05:27
in the mountains
between Italy and Austria,
05:30
and it turns out that he has living
genetic relatives here in Austria today.
05:34
So one of you could be a cousin of Ötzi.
05:36
(Laughter)
05:38
The point is that we have a better chance
of recovering information
05:41
from an ancient human
05:43
than we do from an old phone.
05:45
It's also much less likely
that we'll lose the ability to read DNA
05:50
than the ability to read any single man-made device.
05:53
Every single new storage format
requires a new way to read it.
05:57
We'll always be able to read DNA.
05:59
If we can no longer sequence,
we have bigger problems
06:02
than worrying about data storage.
06:05
Storing data on DNA is not new.
06:08
Nature's been doing it
for several billion years.
06:11
In fact, every living thing
is a DNA storage device.
06:16
But how do we store data on DNA?
06:19
This is Photo 51.
06:21
It's the first-ever photo of DNA,
06:24
taken about 60 years ago.
06:26
This is around the time that
that same hard drive was released by IBM.
06:31
So really, our understanding of digital
storage and of DNA have coevolved.
06:37
We first learned to sequence, or read, DNA,
06:40
and very soon after, how to write it,
06:42
or synthesize it.
06:44
This is much like how we learn
a new language.
06:48
And now we have the ability
to read, write and copy DNA.
06:53
We do it in the lab all the time.
06:56
So anything, really anything,
that can be stored as zeroes and ones
07:00
can be stored in DNA.
07:02
To store something digitally,
like this photo,
07:05
we convert it to bits, or binary digits.
07:09
Each pixel in a black-and-white photo
is simply a zero or a one.
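The pixel-to-bit step is easy to make concrete. This minimal sketch uses a made-up 4-by-8 black-and-white image with an arbitrary convention (1 = black, 0 = white). Reading the pixels row by row gives one string of bits, and every 8 bits pack into one byte.

```python
# Minimal sketch: a tiny black-and-white image as bits, then as bytes.
# The image and the 1 = black / 0 = white convention are made up for illustration.
IMAGE = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]


def image_to_bits(image: list[list[int]]) -> str:
    """Flatten the pixel rows, row by row, into one string of '0'/'1'."""
    return "".join(str(px) for row in image for px in row)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string (length a multiple of 8) into bytes, 8 bits per byte."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


bits = image_to_bits(IMAGE)
print(bits)                       # 32 characters, one per pixel
print(bits_to_bytes(bits).hex())  # 66ff7e18
```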
07:13
And we can write DNA much like an inkjet
printer can print letters on a page.
07:18
We just have to convert our data,
all of those zeroes and ones,
07:22
to A's, T's, C's and G's,
07:24
and then we send this
to a synthesis company.
07:26
So we write it, we can store it,
07:28
and when we want to recover our data,
we just sequence it.
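The write-then-read cycle can be sketched with the simplest possible rule, 2 bits per base. The mapping table here (00→A, 01→C, 10→G, 11→T) is an assumption for illustration; the talk does not give the mapping. Practical encodings also avoid sequences that are hard to synthesize or sequence, which this sketch ignores.

```python
# Minimal sketch of converting zeroes and ones to A's, C's, G's and T's and back.
# Assumption: a fixed 2-bits-per-base table chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a DNA string, 4 bases per byte (2 bits per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# "Write": this string is what would be sent off for synthesis.
strand = bytes_to_dna(b"Hi")
print(strand)  # CAGACGGC
# "Read": sequencing returns the bases, which decode back to the original.
assert dna_to_bytes(strand) == b"Hi"
```

A fixed table like this is the most compact option (2 bits per base), but it will happily produce long runs such as "AAAA" for zero bytes, which is one reason real schemes add constraints on top.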
07:32
Now, the fun part of all of this
is deciding what files to include.
07:36
We're serious scientists,
so we had to include a manuscript
07:39
for good posterity.
07:41
We also included a $50 Amazon gift card --
07:44
don't get too excited, it's already
been spent, someone decoded it --
07:47
as well as an operating system,
07:50
one of the first movies ever made
07:52
and a Pioneer plaque.
07:54
Some of you might have seen this.
07:55
It has a depiction of a typical --
apparently -- male and female,
07:59
and our approximate location
in the Solar System,
08:02
in case the Pioneer spacecraft
ever encounters extraterrestrials.
08:06
So once we'd decided what sort of files
we wanted to encode,
08:09
we package up the data,
08:11
convert those zeroes and ones
to A's, T's, C's and G's,
08:14
and then we just send this file off
to a synthesis company.
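What "package up the data" involves is not spelled out. One plausible reading, sketched here under stated assumptions: synthesized strands are short (typically no more than a few hundred bases), and the strands in a tube carry no order, so a file is split into many small pieces, each with its own address. The 8-byte chunk size and 2-byte index header are hypothetical choices for illustration, not the layout used in the work described in the talk.

```python
# Hypothetical packaging step: split a file into short, individually addressed
# pieces. CHUNK_SIZE and the 2-byte index header are illustrative assumptions.
import random

CHUNK_SIZE = 8  # hypothetical payload bytes per piece (tiny, for readability)


def package(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into pieces, each prefixed with a 2-byte big-endian index."""
    pieces = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        header = index.to_bytes(2, "big")  # room for up to 65,536 pieces
        pieces.append(header + data[start:start + chunk_size])
    return pieces


def unpackage(pieces: list[bytes]) -> bytes:
    """Reassemble pieces arriving in any order, possibly as duplicate copies.

    Raises KeyError if any piece other than the last is missing. A missing
    last piece would go unnoticed; a real scheme would also record the count.
    """
    by_index = {int.from_bytes(piece[:2], "big"): piece[2:] for piece in pieces}
    return b"".join(by_index[i] for i in range(len(by_index)))


message = b"manuscript, gift card, operating system, movie, plaque"
pieces = package(message)
copies = pieces * 3     # many copies of every strand end up in the tube
random.shuffle(copies)  # and the tube keeps no order
assert unpackage(copies) == message
```

Each piece would then be converted to bases on its own; the address travels inside the strand, so order and duplicate copies stop mattering.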
08:18
And this is what we got back.
08:20
Our files were in this tube.
08:22
All we had to do was sequence it.
08:24
This all sounds pretty straightforward,
08:27
but the difference between
a really cool, fun idea
08:30
and something we can actually use
08:32
is overcoming these practical challenges.
08:35
Now, while DNA is more robust
than any man-made device,
08:39
it's not perfect.
08:40
It does have some weaknesses.
08:43
We recover our message
by sequencing the DNA,
08:46
and every time data is retrieved,
08:48
we lose some of the DNA.
08:50
That's just part
of the sequencing process.
08:53
We don't want to run out of data,
08:55
but luckily, there's a way to copy the DNA
08:58
that's even cheaper and easier
than synthesizing it.
09:03
We actually tested a way to make
200 trillion copies of our files,
09:08
and we recovered
all the data without error.
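The talk doesn't name the copying method. The standard lab technique is PCR, where each cycle can at best double the number of molecules. Assuming ideal doubling and, hypothetically, a single starting molecule, a short calculation shows why 200 trillion copies is cheap to reach. The `efficiency` parameter is an illustrative knob (growth factor 1 + efficiency per cycle), since real reactions fall short of perfect doubling.

```python
# Back-of-the-envelope sketch of exponential copying (PCR-style).
# Assumptions: every cycle multiplies the molecule count by (1 + efficiency);
# efficiency = 1.0 is the idealized case of perfect doubling.
import math


def cycles_needed(target_copies: float, start_copies: float = 1.0,
                  efficiency: float = 1.0) -> int:
    """Fewest cycles n with start * (1 + efficiency) ** n >= target."""
    if target_copies <= start_copies:
        return 0
    growth = 1.0 + efficiency
    return math.ceil(math.log(target_copies / start_copies, growth))


def copies_after(cycles: int, start_copies: float = 1.0,
                 efficiency: float = 1.0) -> float:
    """Number of copies after a given number of cycles, same assumptions."""
    return start_copies * (1.0 + efficiency) ** cycles


n = cycles_needed(200e12)  # "200 trillion copies", from one molecule
print(n, f"{copies_after(n):.3e}")            # 48 2.815e+14
print(cycles_needed(200e12, efficiency=0.9))  # more cycles if doubling is imperfect
```

Exponential growth is the point: under these assumptions fewer than fifty doublings already pass 200 trillion, which is why copying is so much cheaper than synthesizing every strand again.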
09:11
So sequencing also introduces
errors into our DNA,
09:15
into the A's, T's, C's and G's.
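To make "errors in the A's, T's, C's and G's" concrete: when the same strand is read several times, a substitution error usually shows up in only some of the reads, so a per-position majority vote can often recover the original. This is a generic illustration, not the method described in the talk. It assumes reads are already grouped by strand and contain only substitutions (no inserted or missing bases), which real sequencing data does not guarantee.

```python
# Illustrative sketch: per-position majority vote across noisy reads of one strand.
# Assumptions: reads are pre-grouped by strand and all have the same length
# (substitution errors only, no insertions or deletions).
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch only handles equal-length reads")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


reads = [
    "CAGACGGC",
    "CAGTCGGC",  # substitution at position 3
    "CAGACGGC",
    "CTGACGGC",  # substitution at position 1
    "CAGACGAC",  # substitution at position 6
]
print(consensus(reads))  # CAGACGGC
```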
09:18
Nature has a way
to deal with this in our cells.
09:21
But our data is stored
in synthetic DNA in a tube,
09:27
so we had to find our own way
to overcome this problem.
09:30
We decided to use an algorithm
that was used to stream videos.
09:35
When you're streaming a video,
09:36
you're essentially trying to recover
the original video, the original file.
09:41
When we're trying to recover
our original files,
09:44
we're simply sequencing.
09:46
But really, both of these processes are
about recovering enough zeroes and ones
09:50
to put our data back together.
09:52
And so, because of our coding strategy,
09:54
we were able to package up all of our data
09:57
in a way that allowed us to make
millions and trillions of copies
10:01
and still always recover
all of our files back.
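The streaming idea can be illustrated with a toy fountain code, the family of codes used for streaming that the published version of this work (DNA Fountain) builds on. The file is cut into segments. Each "droplet" is the XOR of a few segments chosen by a pseudo-random generator seeded with the droplet's own seed, so the receiver needs only some sufficiently large set of droplets, not any particular ones. Everything here (segment size, degree distribution, zero padding, the simple peeling decoder) is an assumption for illustration and is not the authors' implementation.

```python
# Toy fountain code (Luby-transform style), for illustration only.
# Assumptions: fixed-size segments, a made-up degree distribution, and a seed
# stored with each droplet so the decoder can re-derive which segments it mixes.
from __future__ import annotations

import random

SEGMENT_SIZE = 4  # bytes per segment; tiny so the example stays readable


def split(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut data into equal-size segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def neighbors(seed: int, k: int) -> list[int]:
    """Re-derive, from a droplet's seed, which of the k segments it combines."""
    rng = random.Random(seed)
    degree = rng.choice([1, 1, 2, 2, 2, 3, 4])  # toy degree distribution
    return rng.sample(range(k), min(degree, k))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """A droplet is (seed, XOR of the segments picked by that seed)."""
    payload = bytes(len(segments[0]))
    for i in neighbors(seed, len(segments)):
        payload = xor(payload, segments[i])
    return seed, payload


def decode(droplets: list[tuple[int, bytes]], k: int) -> list[bytes] | None:
    """Peeling decoder: a droplet with exactly one unknown segment reveals it."""
    known: dict[int, bytes] = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for seed, payload in droplets:
            picked = neighbors(seed, k)
            unknown = [i for i in picked if i not in known]
            if len(unknown) == 1:
                for i in picked:
                    if i in known:
                        payload = xor(payload, known[i])  # strip known parts
                known[unknown[0]] = payload
                progress = True
    return [known[i] for i in range(k)] if len(known) == k else None


message = b"one of the first movies ever made"
segments = split(message)
droplets: list[tuple[int, bytes]] = []
decoded = None
seed = 0
while decoded is None:  # keep "streaming" droplets until the file decodes
    droplets.append(make_droplet(segments, seed))
    seed += 1
    decoded = decode(droplets, len(segments))
recovered = b"".join(decoded)[: len(message)]  # drop the zero padding
assert recovered == message
print(f"{len(segments)} segments recovered from {len(droplets)} droplets")
```

The loop mimics streaming: the sender keeps producing droplets and the receiver stops as soon as it can decode. A lost or discarded droplet just means waiting for a few more, which is why codes of this kind tolerate losing strands.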
10:04
This is the movie we encoded.
10:06
It's one of the first movies ever made,
10:09
and now the first to be copied
more than 200 trillion times on DNA.
10:14
Soon after our work was published,
10:16
we participated in an "Ask Me Anything"
on the website Reddit.
10:20
If you're a fellow nerd,
you're very familiar with this website.
10:23
Most questions were thoughtful.
10:25
Some were comical.
10:27
For example, one user wanted to know
when we would have a literal thumb drive.
10:32
Now, the thing is,
10:34
our DNA already stores everything
needed to make us who we are.
10:38
It's a lot safer to store data
10:42
in synthetic DNA in a tube.
10:46
Writing and reading data from DNA
is obviously a lot more time-consuming
10:52
than just saving all your files
on a hard drive --
10:55
for now.
10:57
So initially, we should focus
on long-term storage.
11:02
Most data are ephemeral.
11:04
It's really hard to grasp
what's important today,
11:07
or what will be important
for future generations.
11:10
But the point is,
we don't have to decide today.
11:14
There's this great program by UNESCO
called the "Memory of the World" program.
11:19
It's been created to preserve
historical materials
11:22
that are considered of value
to all of humanity.
11:26
Items are nominated
to be added to the collection,
11:29
including that film that we encoded.
11:32
While this is a wonderful way
to preserve human heritage,
11:35
it doesn't have to be a choice.
11:38
Instead of asking
the current generation -- us --
11:41
what might be important in the future,
11:43
we could store everything in DNA.
11:47
Storage is not just about how many bytes
11:50
but how well we can actually
store the data and recover it.
11:53
There's always been this tension
between how much data we can generate
11:57
and how much we can recover
11:59
and how much we can store.
12:01
Every advance in writing data
has required a new way to read it.
12:05
We can no longer read old media.
12:08
How many of you even have
a CD drive in your laptop,
12:12
never mind a floppy drive?
12:14
This will never be the case with DNA.
12:16
As long as we're around, DNA is around,
12:19
and we'll find a way to sequence it.
12:23
Archiving the world around us
is part of human nature.
12:27
This is the progress we've made
in digital storage in 60 years,
12:31
at a time when we were only
beginning to understand DNA.
12:35
Yet, we've made similar progress
in half that time with DNA sequencers,
12:40
and as long as we're around,
DNA will never be obsolete.
12:46
Thank you.
12:47
(Applause)
