ABOUT THE SPEAKER
Mona Chalabi - Data journalist
Mona Chalabi tries to take the numb out of numbers. She's left with lots of "ers."

Why you should listen

After working for a humanitarian organisation, Mona Chalabi saw how important data was, but also how easily it could be used by people with their own specific agendas. Since then, her work for organizations like Transparency International and The Guardian has had one goal: to make sure as many people as possible can find and question the data they need to make informed decisions about their lives.

Chalabi is currently the Data Editor of the Guardian US, where she writes articles, produces documentaries and turns data into illustrations and animations. In 2016, her data illustrations were commended by the Royal Statistical Society.

TEDNYC

Mona Chalabi: 3 ways to spot a bad statistic


Sometimes it's hard to know what statistics are worthy of trust. But we shouldn't count out stats altogether ... instead, we should learn to look behind them. In this delightful, hilarious talk, data journalist Mona Chalabi shares handy tips to help question, interpret and truly understand what the numbers are saying.


00:12
Now, I'm going to be talking
about statistics today.
00:15
If that makes you immediately feel
a little bit wary, that's OK,
00:18
that doesn't make you some
kind of crazy conspiracy theorist,
00:21
it makes you skeptical.
00:23
And when it comes to numbers,
especially now, you should be skeptical.
00:26
But you should also be able to tell
which numbers are reliable
00:29
and which ones aren't.
00:31
So today I want to try to give you
some tools to be able to do that.
00:34
But before I do,
00:35
I just want to clarify which numbers
I'm talking about here.
00:38
I'm not talking about claims like,
00:40
"9 out of 10 women recommend
this anti-aging cream."
00:42
I think a lot of us always
roll our eyes at numbers like that.
00:45
What's different now is people
are questioning statistics like,
00:48
"The US unemployment
rate is five percent."
00:50
What makes this claim different is
it doesn't come from a private company,
00:54
it comes from the government.
00:55
About 4 out of 10 Americans
distrust the economic data
00:58
that gets reported by government.
01:00
Among supporters of President Trump
it's even higher;
01:03
it's about 7 out of 10.
01:04
I don't need to tell anyone here
01:06
that there are a lot of dividing lines
in our society right now,
01:09
and a lot of them start to make sense,
01:11
once you understand people's relationships
with these government numbers.
01:15
On the one hand, there are those who say
these statistics are crucial,
01:18
that we need them to make sense
of society as a whole
01:21
in order to move beyond
emotional anecdotes
01:23
and measure progress in an objective way.
01:25
And then there are the others,
01:27
who say that these statistics are elitist,
01:29
maybe even rigged;
01:30
they don't make sense
and they don't really reflect
01:33
what's happening
in people's everyday lives.
01:35
It kind of feels like that second group
is winning the argument right now.
01:38
We're living in a world
of alternative facts,
01:41
where people don't find in statistics
this kind of common ground,
01:44
this starting point for debate.
01:45
This is a problem.
01:46
There are actually
moves in the US right now
01:49
to get rid of some government
statistics altogether.
01:51
Right now there's a bill in Congress
about measuring racial inequality.
01:55
The draft law says that government
money should not be used
01:58
to collect data on racial segregation.
02:00
This is a total disaster.
02:02
If we don't have this data,
02:03
how can we observe discrimination,
02:05
let alone fix it?
02:06
In other words:
02:08
How can a government create fair policies
02:10
if they can't measure
current levels of unfairness?
02:12
This isn't just about discrimination,
02:14
it's everything -- think about it.
02:16
How can we legislate on health care
02:18
if we don't have good data
on health or poverty?
02:20
How can we have public debate
about immigration
02:22
if we can't at least agree
02:24
on how many people are entering
and leaving the country?
02:26
Statistics come from the state;
that's where they got their name.
02:29
The point was to better
measure the population
02:31
in order to better serve it.
02:33
So we need these government numbers,
02:35
but we also have to move
beyond either blindly accepting
02:37
or blindly rejecting them.
02:39
We need to learn the skills
to be able to spot bad statistics.
02:42
I started to learn some of these
02:43
when I was working
in a statistical department
02:45
that's part of the United Nations.
02:47
Our job was to find out how many Iraqis
had been forced from their homes
02:50
as a result of the war,
02:52
and what they needed.
02:53
It was really important work,
but it was also incredibly difficult.
02:56
Every single day, we were making decisions
02:58
that affected the accuracy
of our numbers --
03:00
decisions like which parts
of the country we should go to,
03:03
who we should speak to,
03:04
which questions we should ask.
03:06
And I started to feel
really disillusioned with our work,
03:08
because we thought we were doing
a really good job,
03:11
but the one group of people
who could really tell us were the Iraqis,
03:14
and they rarely got the chance to find
our analysis, let alone question it.
03:18
So I started to feel really determined
03:20
that the one way to make
numbers more accurate
03:22
is to have as many people as possible
be able to question them.
03:25
So I became a data journalist.
03:27
My job is finding these data sets
and sharing them with the public.
03:30
Anyone can do this,
you don't have to be a geek or a nerd.
03:34
You can ignore those words;
they're used by people
03:36
trying to say they're smart
while pretending they're humble.
03:39
Absolutely anyone can do this.
03:41
I want to give you guys three questions
03:43
that will help you be able to spot
some bad statistics.
03:46
So, question number one
is: Can you see uncertainty?
03:49
One of the things that's really changed
people's relationship with numbers,
03:53
and even their trust in the media,
03:54
has been the use of political polls.
03:57
I personally have a lot of issues
with political polls
03:59
because I think the role of journalists
is actually to report the facts
04:02
and not attempt to predict them,
04:04
especially when those predictions
can actually damage democracy
04:07
by signaling to people:
don't bother to vote for that guy,
04:10
he doesn't have a chance.
04:11
Let's set that aside for now and talk
about the accuracy of this endeavor.
04:15
Based on national elections
in the UK, Italy, Israel
04:19
and of course, the most recent
US presidential election,
04:22
using polls to predict electoral outcomes
04:24
is about as accurate as using the moon
to predict hospital admissions.
04:28
No, seriously, I used actual data
from an academic study to draw this.
04:32
There are a lot of reasons why
polling has become so inaccurate.
04:36
Our societies have become really diverse,
04:38
which makes it difficult for pollsters
to get a really nice representative sample
04:42
of the population for their polls.
04:44
People are really reluctant to answer
their phones to pollsters,
04:47
and also, shockingly enough,
people might lie.
04:49
But you wouldn't necessarily
know that to look at the media.
04:52
For one thing, the probability
of a Hillary Clinton win
04:55
was communicated with decimal places.
04:57
We don't use decimal places
to describe the temperature.
05:00
How on earth can predicting the behavior
of 230 million voters in this country
05:04
be that precise?
05:06
And then there were those sleek charts.
05:08
See, a lot of data visualizations
will overstate certainty, and it works --
05:12
these charts can numb
our brains to criticism.
05:15
When you hear a statistic,
you might feel skeptical.
05:17
As soon as it's buried in a chart,
05:19
it feels like some kind
of objective science,
05:21
and it's not.
05:22
So I was trying to find ways
to better communicate this to people,
05:26
to show people the uncertainty
in our numbers.
05:28
What I did was I started taking
real data sets,
05:30
and turning them into
hand-drawn visualizations,
05:33
so that people can see
how imprecise the data is;
05:36
so people can see that a human did this,
05:38
a human found the data and visualized it.
05:40
For example, instead
of finding out the probability
05:42
of getting the flu in any given month,
05:45
you can see the rough
distribution of flu season.
05:47
This is --
05:49
(Laughter)
05:50
a bad shot to show in February.
05:51
But it's also more responsible
data visualization,
05:54
because if you were to show
the exact probabilities,
05:56
maybe that would encourage
people to get their flu jabs
05:59
at the wrong time.
06:01
The point of these shaky lines
06:02
is so that people remember
these imprecisions,
06:05
but also so they don't necessarily
walk away with a specific number,
06:09
but they can remember important facts.
06:10
Facts like injustice and inequality
leave a huge mark on our lives.
06:15
Facts like Black Americans and Native
Americans have shorter life expectancies
06:19
than those of other races,
06:20
and that isn't changing anytime soon.
06:22
Facts like prisoners in the US
can be kept in solitary confinement cells
06:26
that are smaller than the size
of an average parking space.
06:30
The point of these visualizations
is also to remind people
06:33
of some really important
statistical concepts,
06:36
concepts like averages.
06:37
So let's say you hear a claim like,
06:39
"The average swimming pool in the US
contains 6.23 fecal accidents."
06:44
That doesn't mean every single
swimming pool in the country
06:46
contains exactly 6.23 turds.
06:49
So in order to show that,
06:50
I went back to the original data,
which comes from the CDC,
06:53
who surveyed 47 swimming facilities.
06:55
And I just spent one evening
redistributing poop.
06:57
So you can kind of see
how misleading averages can be.
07:00
(Laughter)
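A minimal Python sketch of why an average on its own can mislead. The counts below are invented for illustration and are not the CDC figures; they are chosen only so that 47 hypothetical facilities average out to roughly 6.23.

```python
from statistics import mean, median

# Hypothetical counts for 47 facilities (invented, NOT the CDC data):
# 40 facilities with no incidents and 7 facilities with many.
incidents = [0] * 40 + [10, 20, 30, 40, 50, 60, 83]

average = mean(incidents)        # total incidents divided by the 47 facilities
typical = median(incidents)      # the middle facility once the counts are sorted
clean_share = incidents.count(0) / len(incidents)

print(f"mean = {average:.2f}")
print(f"median = {typical}")
print(f"share of facilities with zero incidents = {clean_share:.0%}")
```

Under these made-up numbers the mean and the median tell very different stories, which is the point of looking at the whole distribution rather than one summary figure.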
07:01
OK, so the second question
that you guys should be asking yourselves
07:05
to spot bad numbers is:
07:07
Can I see myself in the data?
07:09
This question is also
about averages in a way,
07:12
because part of the reason
why people are so frustrated
07:14
with these national statistics,
07:16
is they don't really tell the story
of who's winning and who's losing
07:19
from national policy.
07:20
It's easy to understand why people
are frustrated with global averages
07:24
when they don't match up
with their personal experiences.
07:26
I wanted to show people the way
data relates to their everyday lives.
07:30
I started this advice column
called "Dear Mona,"
07:32
where people would write to me
with questions and concerns
07:35
and I'd try to answer them with data.
07:37
People asked me anything.
07:38
Questions like, "Is it normal to sleep
in a separate bed to my wife?"
07:41
"Do people regret their tattoos?"
07:43
"What does it mean to die
of natural causes?"
07:45
All of these questions are great,
because they make you think
07:48
about ways to find
and communicate these numbers.
07:50
If someone asks you,
"How much pee is a lot of pee?"
07:53
which is a question that I got asked,
07:55
you really want to make sure
that the visualization makes sense
07:58
to as many people as possible.
08:00
These numbers aren't unavailable.
08:02
Sometimes they're just buried
in the appendix of an academic study.
08:05
And they're certainly not inscrutable;
08:07
if you really wanted to test
these numbers on urination volume,
08:10
you could grab a bottle
and try it for yourself.
08:12
(Laughter)
08:13
The point of this isn't necessarily
08:15
that every single data set
has to relate specifically to you.
08:18
I'm interested in how many women
were issued fines in France
08:21
for wearing the face veil, or the niqab,
08:23
even if I don't live in France
or wear the face veil.
08:26
The point of asking where you fit in
is to get as much context as possible.
08:29
So it's about zooming out
from one data point,
08:32
like the unemployment rate
is five percent,
08:34
and seeing how it changes over time,
08:35
or seeing how it changes
by educational status --
08:38
this is why your parents always
wanted you to go to college --
08:41
or seeing how it varies by gender.
08:43
Nowadays, the male unemployment rate is higher
08:45
than the female unemployment rate.
08:47
Up until the early '80s,
it was the other way around.
08:50
This is a story of one
of the biggest changes
08:52
that's happened in American society,
08:54
and it's all there in that chart,
once you look beyond the averages.
08:57
The axes are everything;
08:58
once you change the scale,
you can change the story.
09:01
OK, so the third and final question
that I want you guys to think about
09:04
when you're looking at statistics is:
09:06
How was the data collected?
09:09
So far, I've only talked about the way
data is communicated,
09:12
but the way it's collected
matters just as much.
09:14
I know this is tough,
09:16
because methodologies can be opaque
and actually kind of boring,
09:19
but there are some simple steps
you can take to check this.
09:22
I'll use one last example here.
09:24
One poll found that 41 percent of Muslims
in this country support jihad,
09:28
which is obviously pretty scary,
09:29
and it was reported everywhere in 2015.
09:32
When I want to check a number like that,
09:35
I'll start off by finding
the original questionnaire.
09:37
It turns out that journalists
who reported on that statistic
09:40
ignored a question
lower down on the survey
09:42
that asked respondents
how they defined "jihad."
09:45
And most of them defined it as,
09:47
"Muslims' personal, peaceful struggle
to be more religious."
09:51
Only 16 percent defined it as,
"violent holy war against unbelievers."
09:55
This is the really important point:
09:57
based on those numbers,
it's totally possible
09:59
that no one in the survey
who defined it as violent holy war
10:03
also said they support it.
10:04
Those two groups might not overlap at all.
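The claim that the two groups might not overlap at all can be checked with simple bounds. This is a rough sketch, assuming both percentages describe the same set of respondents; the function name `overlap_bounds` is made up for this illustration.

```python
def overlap_bounds(pct_a: int, pct_b: int) -> tuple[int, int]:
    """Return the (smallest, largest) possible percentage of people in both groups."""
    # An overlap is only forced when the two shares add up to more than 100%.
    smallest = max(0, pct_a + pct_b - 100)
    # The overlap can never be bigger than the smaller of the two groups.
    largest = min(pct_a, pct_b)
    return smallest, largest


support_pct = 41       # respondents who said they support "jihad"
violent_def_pct = 16   # respondents who defined it as violent holy war

low, high = overlap_bounds(support_pct, violent_def_pct)
print(f"overlap is somewhere between {low}% and {high}%")
```

Because 41 plus 16 is well under 100, the lower bound is zero: the published figures alone cannot show that anyone both defined the word as violent and said they supported it.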
10:07
It's also worth asking
how the survey was carried out.
10:09
This was something called an opt-in poll,
10:11
which means anyone could have found it
on the internet and completed it.
10:15
There's no way of knowing
if those people even identified as Muslim.
10:18
And finally, there were 600
respondents in that poll.
10:21
There are roughly three million
Muslims in this country,
10:23
according to Pew Research Center.
10:25
That means the poll spoke to roughly
one in every 5,000 Muslims
10:28
in this country.
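The "one in every 5,000" figure follows directly from the two numbers quoted in the talk. A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
population = 3_000_000   # rough Pew Research Center estimate quoted in the talk
respondents = 600        # number of respondents in the opt-in poll

# How many people in the population each respondent stands in for.
people_per_respondent = population // respondents

# The same relationship expressed as the fraction of the population surveyed.
sampling_fraction = respondents / population

print(f"one respondent per {people_per_respondent:,} people")
print(f"sampling fraction = {sampling_fraction:.4%}")
```

The ratio only describes how small the sample is. It says nothing about whether the respondents were representative, which is the separate problem raised by the opt-in design.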
10:29
This is one of the reasons
10:31
why government statistics
are often better than private statistics.
10:34
A poll might speak to a couple
hundred people, maybe a thousand,
10:37
or if you're L'Oreal, trying to sell
skin care products in 2005,
10:40
then you spoke to 48 women
to claim that they work.
10:43
(Laughter)
10:44
Private companies don't have a huge
interest in getting the numbers right,
10:47
they just need the right numbers.
10:49
Government statisticians aren't like that.
10:51
In theory, at least,
they're totally impartial,
10:54
not least because most of them do
their jobs regardless of who's in power.
10:57
They're civil servants.
10:58
And to do their jobs properly,
11:00
they don't just speak
to a couple hundred people.
11:03
Those unemployment numbers
I keep on referencing
11:05
come from the Bureau of Labor Statistics,
11:07
and to make their estimates,
11:08
they speak to over 140,000
businesses in this country.
11:12
I get it, it's frustrating.
11:14
If you want to test a statistic
that comes from a private company,
11:17
you can buy the face cream for you
and a bunch of friends, test it out,
11:20
if it doesn't work,
you can say the numbers were wrong.
11:23
But how do you question
government statistics?
11:25
You just keep checking everything.
11:27
Find out how they collected the numbers.
11:29
Find out if you're seeing everything
on the chart you need to see.
11:32
But don't give up on the numbers
altogether, because if you do,
11:35
we'll be making public policy
decisions in the dark,
11:37
using nothing but private
interests to guide us.
11:40
Thank you.
11:41
(Applause)
Translated by Leslie Gauthier
Reviewed by Camille Martínez
