import pandas, math
from sklearn.cluster import MiniBatchKMeans
from folium import Map, CircleMarker, Vega, Popup
from vincent import Bar
This script by Nicolas Kruchten was used to generate a map for a CBC.ca article showing the spatio-temporal distribution of available Car2Go vehicles in Montreal over a 30-day period during the summer of 2016. more details »
pandas
and
scikit-learn
are used to manipulate the
data, folium
to make the
map, and vincent
to make the
popup charts.
import pandas, math
from sklearn.cluster import MiniBatchKMeans
from folium import Map, CircleMarker, Vega, Popup
from vincent import Bar
The dataset contains approximately 3.2 million observations, consisting of
(lat, lon, hod)
tuples, where hod
stands for ‘hour of day’.
df = pandas.read_csv("lat_lon_hod.csv.gz")
carh
stands for ‘car-hours’, and since the dataset was made by querying
the Car2Go API every 5 minutes, each observation means that on average,
a car was present for one twelfth of an hour during the given hour of day at
the given latitude and longitude.
samples_per_hour = 12
df["carh"] = 1.0/samples_per_hour
Using K-means clustering, the observations are grouped into 250 spatially-contiguous zones such that each zone contains a comparable number of observations.
df["zone"] = MiniBatchKMeans(250).fit_predict(df[["lat", "lon"]].values)
A map is initialized, centered on the centroid of all the observations.
map = Map(zoom_start=12, tiles="cartodbpositron",
location=list(df[["lat","lon"]].mean().values))
For each zone…
zones = df.groupby("zone").agg(dict(carh='sum', lon='mean', lat='mean'))
for i, zone in zones.iterrows():
…a dataset is created containing the average number of car-hours/day by hour of day. (1 car-hour/day at a given hour of day in a given zone represents either 1 car available in the zone for the entire hour, or 3 cars for 20 minutes each, or 1 car for 20 minutes and 1 for 40 minutes, and so on.)
num_days = 30
carh_by_hod = df.query("zone==@i").groupby("hod").carh.sum()/num_days
…a bar chart is created with this dataset, which shows the daily pattern of car availabilities. This is done by first generating a JSON description of a bar-chart in the Vega grammar, and then rendering it to a graphic.
vega = Bar(carh_by_hod, width=450,
height=150).axis_titles(x='Hour of Day', y='Cars Available')
chart = Vega(vega.to_json(), width=vega.width+50, height=vega.height+50)
… and a circle is added to the map:
map.add_child( CircleMarker(
• the circle is placed on the centroid of the zone’s observations
location = [zone["lat"], zone["lon"]],
• the circle area is proportional to the total number of car-hours observed in the zone
radius = int(6*math.sqrt(zone["carh"])),
• the circle is coloured according to the hour of day with peak car availability: blues representing night-time, yellows representing day-time and reds representing evenings
fill_opacity = 0.8, color=None,
fill_color = ('#274cc9','#274cc9','#274cc9','#274cc9',
'#274cc9','#3959bf','#647aa6','#909b8c','#bcbc73',
'#e8dd5a','#f1e455','#f1e455','#f1e455','#f1e455',
'#f0df56','#ecc45a','#e7a95f','#e28e63','#de7467',
'#c46576','#9d5f8a','#76599f','#4e52b4','#274cc9'
)[carh_by_hod.idxmax()],
• clicking on the circle pops up the daily-pattern chart
popup = Popup(max_width=chart.width[0]).add_child(chart)
) )
The map is then saved to disk as an HTML file with inlined data, Javascript and CSS information. This is very convenient for collaborating with colleagues without sending a large number of files or requiring them to run this script.
map.save('map.html')