ABOUT THE SPEAKER
Ben Wellington - Data scientist
Ben Wellington blends his love of statistics, the city, and comedy in his entertaining analysis of the story of New York City, told through data.

Why you should listen

Ben Wellington runs the I Quant NY blog, in which he crunches city-released data to find out what's really going on in the Big Apple. To date he has tackled topics such as measles outbreaks in New York City schools, analyzed how companies like Airbnb are really doing in NYC, and asked questions such as "does gentrification cause a reduction in laundromats?" (Answer: inconclusive.)

Ben is a visiting assistant professor in the City & Regional Planning program at the Pratt Institute in Brooklyn; his day job involves working as a quantitative analyst at the investment management firm, Two Sigma. A budding comedian and performer, he also teaches team building workshops through Cherub Improv, a non-profit that uses improv comedy for social good.

Ben Wellington | Speaker | TED.com
TEDxNewYork

Ben Wellington: How we found the worst place to park in New York City -- using big data


Filmed:
1,055,247 views

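City agencies hold vast statistics covering every aspect of urban life. But, as data analyst Ben Wellington points out in this entertaining talk, governments often don't know how to use them. He shows how pairing unexpected questions with clever data processing yields strange and useful insights, and shares advice on how to release large datasets so the public can use them.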


00:12
Six thousand miles of road,
00:15
600 miles of subway track,
00:17
400 miles of bike lanes
00:19
and a half a mile of tram track,
00:21
if you've ever been to Roosevelt Island.
00:23
These are the numbers that make up the infrastructure of New York City.
00:26
These are the statistics of our infrastructure.
00:29
They're the kind of numbers you can find released in reports by city agencies.
00:32
For example, the Department of Transportation will probably tell you
00:36
how many miles of road they maintain.
00:37
The MTA will boast how many miles of subway track there are.
00:40
Most city agencies give us statistics.
00:42
This is from a report this year
00:43
from the Taxi and Limousine Commission,
00:45
where we learn that there's about 13,500 taxis here in New York City.
00:49
Pretty interesting, right?
00:50
But did you ever think about where these numbers came from?
00:53
Because for these numbers to exist, someone at the city agency
00:56
had to stop and say, hmm, here's a number that somebody might want to know.
00:59
Here's a number that our citizens want to know.
01:02
So they go back to their raw data,
01:04
they count, they add, they calculate,
01:05
and then they put out reports,
01:07
and those reports will have numbers like this.
01:09
The problem is, how do they know all of our questions?
01:11
We have lots of questions.
01:13
In fact, in some ways there's literally an infinite number of questions
01:16
that we can ask about our city.
01:18
The agencies can never keep up.
01:19
So the paradigm isn't exactly working, and I think our policymakers realize that,
01:23
because in 2012, Mayor Bloomberg signed into law what he called
01:27
the most ambitious and comprehensive open data legislation in the country.
01:31
In a lot of ways, he's right.
01:33
In the last two years, the city has released 1,000 datasets
01:35
on our open data portal,
01:37
and it's pretty awesome.
01:39
So you go and look at data like this,
01:41
and instead of just counting the number of cabs,
01:43
we can start to ask different questions.
01:45
So I had a question.
01:46
When's rush hour in New York City?
01:48
It can be pretty bothersome. When is rush hour exactly?
01:51
And I thought to myself, these cabs aren't just numbers,
01:53
these are GPS recorders driving around in our city streets
01:56
recording each and every ride they take.
01:58
There's data there, and I looked at that data,
02:00
and I made a plot of the average speed of taxis in New York City throughout the day.
02:04
You can see that from about midnight to around 5:18 in the morning,
02:07
speed increases, and at that point, things turn around,
02:11
and they get slower and slower and slower until about 8:35 in the morning,
02:15
when they end up at around 11 and a half miles per hour.
02:18
The average taxi is going 11 and a half miles per hour on our city streets,
02:21
and it turns out it stays that way
02:23
for the entire day.
02:27
(Laughter)
02:28
So I said to myself, I guess there's no rush hour in New York City.
02:31
There's just a rush day.
02:33
Makes sense. And this is important for a couple of reasons.
02:36
If you're a transportation planner, this might be pretty interesting to know.
02:39
But if you want to get somewhere quickly,
02:41
you now know to set your alarm for 4:45 in the morning and you're all set.
02:45
New York, right?
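The speed-by-time-of-day plot described above can be approximated with a few lines of Python. This is only a sketch, not the speaker's actual method: the record layout (pickup time, trip miles, trip seconds), the function name `average_speed_by_hour`, and the sample trips are all assumptions invented for illustration. It groups trips by pickup hour and divides total miles by total hours driven.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {hour: mph}, computed as total miles / total hours per pickup hour.

    `trips` is an iterable of (pickup_iso, miles, seconds) tuples.
    This layout is hypothetical, not the real taxi data schema.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # hour -> [miles, hours driven]
    for pickup_iso, miles, seconds in trips:
        if seconds <= 0:
            continue  # skip malformed records
        hour = datetime.fromisoformat(pickup_iso).hour
        totals[hour][0] += miles
        totals[hour][1] += seconds / 3600
    return {hour: miles / hours for hour, (miles, hours) in sorted(totals.items())}


# Made-up sample records, for illustration only.
sample_trips = [
    ("2013-05-01T05:10:00", 3.0, 600),
    ("2013-05-01T08:30:00", 2.3, 720),
    ("2013-05-01T08:40:00", 1.15, 360),
]
speeds = average_speed_by_hour(sample_trips)
for hour, mph in speeds.items():
    print(f"{hour:02d}:00  {mph:.1f} mph")
```

Dividing total distance by total time weights long trips more heavily than averaging each trip's speed would; either choice is defensible, and the talk does not say which one was used.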
02:46
But there's a story behind this data.
02:47
This data wasn't just available, it turns out.
02:50
It actually came from something called a Freedom of Information Law Request,
02:53
or a FOIL Request.
02:54
This is a form you can find on the Taxi and Limousine Commission website.
02:58
In order to access this data, you need to go get this form,
03:01
fill it out, and they will notify you,
03:02
and a guy named Chris Whong did exactly that.
03:05
Chris went down, and they told him,
03:06
"Just bring a brand new hard drive down to our office,
03:09
leave it here for five hours, we'll copy the data and you take it back."
03:13
And that's where this data came from.
03:15
Now, Chris is the kind of guy who wants to make the data public,
03:18
and so it ended up online for all to use, and that's where this graph came from.
03:22
And the fact that it exists is amazing. These GPS recorders -- really cool.
03:25
But the fact that we have citizens walking around with hard drives
03:28
picking up data from city agencies to make it public --
03:31
it was already kind of public, you could get to it,
03:33
but it was "public," it wasn't public.
03:35
And we can do better than that as a city.
03:37
We don't need our citizens walking around with hard drives.
03:40
Now, not every dataset is behind a FOIL Request.
03:42
Here is a map I made with the most dangerous intersections in New York City
03:46
based on cyclist accidents.
03:48
So the red areas are more dangerous.
03:50
And what it shows is first the East side of Manhattan,
03:52
especially in the lower area of Manhattan, has more cyclist accidents.
03:56
That might make sense
03:57
because there are more cyclists coming off the bridges there.
04:00
But there's other hotspots worth studying.
04:02
There's Williamsburg. There's Roosevelt Avenue in Queens.
04:04
And this is exactly the kind of data we need for Vision Zero.
04:07
This is exactly what we're looking for.
04:09
But there's a story behind this data as well.
04:11
This data didn't just appear.
04:13
How many of you guys know this logo?
04:16
Yeah, I see some shakes.
04:17
Have you ever tried to copy and paste data out of a PDF
04:20
and make sense of it?
04:21
I see more shakes.
04:22
More of you tried copying and pasting than knew the logo. I like that.
04:26
So what happened is, the data that you just saw was actually on a PDF.
04:29
In fact, hundreds and hundreds and hundreds of pages of PDF
04:32
put out by our very own NYPD,
04:34
and in order to access it, you would either have to copy and paste
04:38
for hundreds and hundreds of hours,
04:39
or you could be John Krauss.
04:41
John Krauss was like,
04:42
I'm not going to copy and paste this data. I'm going to write a program.
04:45
It's called the NYPD Crash Data Band-Aid,
04:47
and it goes to the NYPD's website and it would download PDFs.
04:50
Every day it would search; if it found a PDF, it would download it
04:54
and then it would run some PDF-scraping program,
04:56
and out would come the text,
04:57
and it would go on the Internet, and then people could make maps like that.
05:01
And the fact that the data's here, the fact that we have access to it --
05:04
Every accident, by the way, is a row in this table.
05:07
You can imagine how many PDFs that is.
05:08
The fact that we have access to that is great,
05:11
but let's not release it in PDF form,
05:13
because then we're having our citizens write PDF scrapers.
05:15
It's not the best use of our citizens' time,
05:18
and we as a city can do better than that.
05:20
Now, the good news is that the de Blasio administration
05:22
actually recently released this data a few months ago,
05:25
and so now we can actually have access to it,
05:27
but there's a lot of data still entombed in PDF.
05:29
For example, our crime data is still only available in PDF.
05:33
And not just our crime data, our own city budget.
05:36
Our city budget is only readable right now in PDF form.
05:40
And it's not just us that can't analyze it --
05:42
our own legislators who vote for the budget
05:45
also only get it in PDF.
05:47
So our legislators cannot analyze the budget that they are voting for.
05:51
And I think as a city we can do a little better than that as well.
05:55
Now, there's a lot of data that's not hidden in PDFs.
05:57
This is an example of a map I made,
05:59
and this is the dirtiest waterways in New York City.
06:02
Now, how do I measure dirty?
06:03
Well, it's kind of a little weird,
06:05
but I looked at the level of fecal coliform,
06:07
which is a measurement of fecal matter in each of our waterways.
06:11
The larger the circle, the dirtier the water,
06:14
so the large circles are dirty water, the small circles are cleaner.
06:17
What you see is inland waterways.
06:19
This is all data that was sampled by the city over the last five years.
06:22
And inland waterways are, in general, dirtier.
06:25
That makes sense, right?
06:26
And the bigger circles are dirty. And I learned a few things from this.
06:30
Number one: Never swim in anything that ends in "creek" or "canal."
06:33
But number two: I also found the dirtiest waterway in New York City,
06:37
by this measure, one measure.
06:39
In Coney Island Creek, which is not the Coney Island you swim in, luckily.
06:43
It's on the other side.
06:44
But Coney Island Creek, 94 percent of samples taken over the last five years
06:48
have had fecal levels so high
06:50
that it would be against state law to swim in the water.
06:53
And this is not the kind of fact that you're going to see
06:56
boasted in a city report, right?
06:57
It's not going to be the front page on nyc.gov.
06:59
You're not going to see it there,
07:01
but the fact that we can get to that data is awesome.
07:04
But once again, it wasn't super easy,
07:05
because this data was not on the open data portal.
07:08
If you were to go to the open data portal,
07:10
you'd see just a snippet of it, a year or a few months.
07:12
It was actually on the Department of Environmental Protection's website.
07:16
And each one of these links is an Excel sheet, and each Excel sheet is different.
07:20
Every heading is different: you copy, paste, reorganize.
07:22
When you do you can make maps and that's great, but once again,
07:25
we can do better than that as a city, we can normalize things.
07:28
And we're getting there, because there's this website that Socrata makes
07:32
called the Open Data Portal NYC.
07:33
This is where 1,100 data sets that don't suffer
07:35
from the things I just told you live,
07:37
and that number is growing, and that's great.
07:39
You can download data in any format, be it CSV or PDF or Excel document.
07:43
Whatever you want, you can download the data that way.
07:45
The problem is, once you do,
07:47
you will find that each agency codes their addresses differently.
07:50
So one is street name, intersection street,
07:52
street, borough, address, building, building address.
07:55
So once again, you're spending time, even when we have this portal,
07:58
you're spending time normalizing our address fields.
08:01
And that's not the best use of our citizens' time.
08:03
We can do better than that as a city.
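The address clean-up described above usually means mapping each agency's column names onto one shared schema and tidying the values. The sketch below shows one way that could look. The alias table is hypothetical: it is loosely based on the field names listed in the talk, and real agency datasets will use other spellings. The helper names `normalize_header` and `normalize_row` are invented for this example.

```python
import re

# Hypothetical aliases: several agency spellings mapped onto one shared name.
COLUMN_ALIASES = {
    "street name": "street",
    "street": "street",
    "intersection street": "cross_street",
    "borough": "borough",
    "address": "address",
    "building address": "address",
    "building": "building",
}


def normalize_header(name):
    """Lower-case a column name, collapse spaces/underscores, apply aliases."""
    key = re.sub(r"[\s_]+", " ", name.strip().lower())
    return COLUMN_ALIASES.get(key, key.replace(" ", "_"))


def normalize_row(row):
    """Return a copy of `row` with shared column names and tidied values.

    Values are upper-cased with runs of whitespace collapsed. If two source
    columns map to the same shared name, the later one wins.
    """
    return {
        normalize_header(column): " ".join(str(value).split()).upper()
        for column, value in row.items()
    }


# Made-up record, for illustration only.
print(normalize_row({"Building Address": " 1  main st ", "BOROUGH": "queens"}))
```

A shared alias table like this is the kind of thing a published open data standard would make unnecessary, which is the point the talk goes on to make.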
08:05
We can standardize our addresses,
08:07
and if we do, we can get more maps like this.
08:09
This is a map of fire hydrants in New York City,
08:11
but not just any fire hydrants.
08:13
These are the top 250 grossing fire hydrants in terms of parking tickets.
08:17
(Laughter)
08:19
So I learned a few things from this map, and I really like this map.
08:23
Number one, just don't park on the Upper East Side.
08:25
Just don't. It doesn't matter where you park, you will get a hydrant ticket.
08:29
Number two, I found the two highest grossing hydrants in all of New York City,
08:33
and they're on the Lower East Side,
08:35
and they were bringing in over 55,000 dollars a year in parking tickets.
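Ranking hydrants by ticket revenue, as described above, comes down to summing fines per location and sorting. Here is a minimal sketch under stated assumptions: each ticket is a (location, fine) pair, and the location labels and fine amounts are made up for the example, not taken from the city's data. The function name `top_grossing` is also invented.

```python
from collections import Counter


def top_grossing(tickets, n=250):
    """Return the n locations with the highest total fines, highest first.

    `tickets` is an iterable of (location, fine_in_dollars) pairs; this
    layout is an assumption, not the real parking-ticket schema.
    """
    revenue = Counter()
    for location, fine in tickets:
        revenue[location] += fine
    return revenue.most_common(n)


# Made-up tickets, for illustration only.
sample_tickets = [
    ("hydrant A", 115), ("hydrant B", 115), ("hydrant A", 115),
    ("hydrant C", 115), ("hydrant A", 115), ("hydrant B", 115),
]
for location, total in top_grossing(sample_tickets, n=2):
    print(location, total)
```

In practice the hard part is the step before this one: tickets record an address, not a hydrant, so they first have to be matched to a physical location.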
08:40
And that seemed a little strange to me when I noticed it,
08:42
so I did a little digging and it turns out what you had is a hydrant
08:46
and then something called a curb extension,
08:48
which is like a seven-foot space to walk on,
08:50
and then a parking spot.
08:51
And so these cars came along, and the hydrant --
08:53
"It's all the way over there, I'm fine,"
08:55
and there was actually a parking spot painted there beautifully for them.
08:59
They would park there, and the NYPD disagreed with this designation
09:02
and would ticket them.
09:03
And it wasn't just me who found a parking ticket.
09:05
This is the Google Street View car driving by
09:07
finding the same parking ticket.
09:09
So I wrote about this on my blog, on I Quant NY, and the DOT responded,
09:13
and they said,
09:14
"While the DOT has not received any complaints about this location,
09:18
we will review the roadway markings and make any appropriate alterations."
09:22
And I thought to myself, typical government response,
09:25
all right, moved on with my life.
09:27
But then, a few weeks later, something incredible happened.
09:31
They repainted the spot,
09:34
and for a second I thought I saw the future of open data,
09:36
because think about what happened here.
09:38
For five years, this spot was being ticketed, and it was confusing,
09:44
and then a citizen found something, they told the city, and within a few weeks
09:48
the problem was fixed.
09:49
It's amazing. And a lot of people see open data as being a watchdog.
09:52
It's not, it's about being a partner.
09:54
We can empower our citizens to be better partners for government,
09:57
and it's not that hard.
09:59
All we need are a few changes.
10:01
If you're FOILing data,
10:02
if you're seeing your data being FOILed over and over again,
10:05
let's release it to the public, that's a sign that it should be made public.
10:08
And if you're a government agency releasing a PDF,
10:11
let's pass legislation that requires you to post it with the underlying data,
10:14
because that data is coming from somewhere.
10:16
I don't know where, but it's coming from somewhere,
10:19
and you can release it with the PDF.
10:20
And let's adopt and share some open data standards.
10:23
Let's start with our addresses here in New York City.
10:25
Let's just start normalizing our addresses.
10:27
Because New York is a leader in open data.
10:30
Despite all this, we are absolutely a leader in open data,
10:32
and if we start normalizing things, and set an open data standard,
10:35
others will follow. The state will follow, and maybe the federal government.
10:39
Other countries could follow,
10:40
and we're not that far off from a time where you could write one program
10:44
and map information from 100 countries.
10:46
It's not science fiction. We're actually quite close.
10:48
And by the way, who are we empowering with this?
10:51
Because it's not just John Krauss and it's not just Chris Whong.
10:54
There are hundreds of meetups going on in New York City right now,
10:57
active meetups.
10:58
There are thousands of people attending these meetups.
11:00
These people are going after work and on weekends,
11:03
and they're attending these meetups to look at open data
11:05
and make our city a better place.
11:07
Groups like BetaNYC, who just last week released something called citygram.nyc
11:11
that allows you to subscribe to 311 complaints
11:13
around your own home, or around your office.
11:15
You put in your address, you get local complaints.
11:18
And it's not just the tech community that are after these things.
11:21
It's urban planners like the students I teach at Pratt.
11:24
It's policy advocates, it's everyone,
11:25
it's citizens from a diverse set of backgrounds.
11:28
And with some small, incremental changes,
11:31
we can unlock the passion and the ability of our citizens
11:34
to harness open data and make our city even better,
11:37
whether it's one dataset, or one parking spot at a time.
11:41
Thank you.
11:43
(Applause)
Translated by Karen SONG
Reviewed by Muyun Zhou
