is a software developer
in Montréal, Québec, Canada.
If you’ve ever been browsing the web and been annoyed by those One Weird Trick ads, or by ads for that product you looked at online last month and then bought offline, you’ve probably given a thought to blocking ads altogether. The response to this idea, from people who run websites for a living, ranges from “it’s unethical” to “it’s stealing!”. According to them, the reason you get to use a website without paying for it yourself is that in exchange you see ads and website owners get paid by the advertisers. That’s a polite summary of the great Ad-Blocking Debate, which has been going on since the early days of the commercial web. I’m not going to take sides here; rather I’ll propose a compromise enabled by a recent development in online advertising technology. I’m going to describe a “weird trick,” if you will: how to use the same system as those ads that follow you around to block ads, all the while ensuring that the websites you frequent have nothing to complain about.
Recently I made some maps of the 2013 Montreal municipal elections, showing voting results down to the ballot-box level, using data from the Montreal Open Data Portal. It turns out, however, that not all of the ballot boxes in that data set are associated with a small geographical area like the ones shown in my by-ballot-box map. Furthermore, those ballot boxes are numbered very differently from the ones that do match up with small block-sized areas: they have numbers like 901, 601 and 001A instead of small numbers from 1 to 100ish.
So what gives? These results appear to be from the early-voting polls, which, given that there are fewer of them, cover a larger area per ballot box. In this post I take a look at how leaving this data out of my maps skews the results I present.
The Montreal municipal elections were just over two months ago, but I dug into the election results dataset over the holidays anyway as an excuse to play with a type of data I don’t normally have much to do with: geographical data. Without further ado, here is the map I made, and this post explains a bit about the process.
In the Montreal mayoral election last November, nearly 85% of the vote went to one of the top three candidates. A pie chart is a simple way to show the breakdown of votes between candidates for the whole election, say, but what if you wanted to look at the vote breakdown for each of the 52 electoral districts? It’s kind of hard to look at 52 pie charts and discern any sort of pattern. It turns out that if you only want to look at the top three candidates, you can use a ternary plot to good effect, like I did in the image above. There’s an interactive version as well which helps make the link between the ternary plot and the map via mouse-overs.
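To make the ternary-plot idea concrete, here is a minimal sketch of how three vote counts can be mapped to a single point inside an equilateral triangle. The function name `ternary_xy` and the corner layout are assumptions chosen for illustration, not necessarily what the interactive version uses.

```python
import math

def ternary_xy(a, b, c):
    """Map three non-negative counts to (x, y) inside a unit equilateral triangle.

    Assumed corner layout (hypothetical, for illustration only):
      candidate A -> (0, 0), candidate B -> (1, 0), candidate C -> (0.5, sqrt(3)/2)
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one count must be positive")
    # Normalize counts to shares; share_a is implied by the other two.
    share_b = b / total
    share_c = c / total
    # The point is the share-weighted average of the three corners.
    x = share_b + 0.5 * share_c
    y = (math.sqrt(3) / 2) * share_c
    return x, y
```

A district where one candidate took every vote lands on that candidate's corner, and a district split evenly three ways lands at the centre of the triangle, so 52 districts become 52 points on one chart.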
I was inspired by some cool "dot map" visualization projects around the internet (North American Census Dotmap, Toronto Visible Minorities Dot Map) to create a similar visualization of the results for the recent Montreal municipal election. I leveraged data from the Montreal Open Data portal to create the map above. There are coloured dots for (almost) each vote for the mayoralty for the top three candidates, randomly located within the catchment area for the polling booth it came from. What I like about this map is that it shows the results in all their messiness rather than neatly colour-coding entire neighbourhoods like a choropleth map would. People live and vote in arbitrary-looking clusters, not in neat blocks!
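As a rough illustration of placing dots at random inside a catchment area, the sketch below uses rejection sampling: draw points in the polygon's bounding box and keep those that fall inside. The helper names and the representation of a polygon as a list of `(x, y)` vertices are assumptions; the real map may well have used a GIS library instead.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_dots(polygon, count, rng=None):
    """Return `count` random points inside `polygon` via rejection sampling.

    Assumes the polygon has non-zero area; otherwise this would never finish.
    """
    rng = rng or random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots
```

Calling `random_dots(area, votes)` once per candidate and per polling area, with one colour per candidate, would give the kind of scatter described above.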
There was a municipal election here in Montreal on November 3, and I had the opportunity to help build an election results dashboard to be projected on the big screen at the election-night party for the political party I support: Projet Montréal. The dashboard is still up with final results. I worked with Nicolas Marchildon, who had put together a similar system for the 2009 election.
In Part 1 of this series, I said that in real-time bidding, we should “bid truthfully”, i.e. that you should bid whatever it is worth to you to win. To compute this truthful value, given a target cost per action (CPA) for a campaign, I said you could just multiply that target by the computed probability of seeing an action after the impression, and that would give you your bid value.
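The truthful bid described above is a single multiplication. Here is a minimal sketch; the function and parameter names are hypothetical, and the numbers in the usage note are invented for illustration.

```python
def truthful_bid(target_cpa, p_action):
    """Truthful bid value: target cost per action times P(action | impression)."""
    if not 0.0 <= p_action <= 1.0:
        raise ValueError("p_action must be a probability between 0 and 1")
    return target_cpa * p_action
```

For example, with a target CPA of 10 and an estimated 0.2% chance of an action, `truthful_bid(10.0, 0.002)` values the impression at about 0.02 in the same currency units as the CPA.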
I added that by calculating an expected cost of winning an auction, you could compute the expected surplus for that auction and that to pace your spending efficiently, you would only bid truthfully when this expected surplus was above some threshold value, and not bid otherwise. This threshold value would be the output of a closed-loop pace control system (described in Part 0) whose job it is to keep the spend rate close to some target.
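The surplus-threshold rule from Part 1 could be sketched roughly as follows. This assumes the expected surplus is simply the truthful value minus the expected cost of winning, and that `surplus_threshold` is supplied by the pace controller; all names are hypothetical.

```python
def decide_bid(target_cpa, p_action, expected_cost, surplus_threshold):
    """Return the truthful bid if the expected surplus clears the threshold, else None.

    Assumption: expected surplus = truthful value - expected cost of winning.
    `surplus_threshold` stands in for the output of the pace control system.
    """
    value = target_cpa * p_action
    expected_surplus = value - expected_cost
    if expected_surplus > surplus_threshold:
        return value  # bid truthfully
    return None       # sit this auction out
```

Raising the threshold makes the bidder pickier and slows spending; lowering it does the opposite, which is what lets the controller steer the spend rate.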
In Part 3 of this series, I then showed that in fact, the second claim of Part 1 was not optimal and that instead of setting an expected surplus threshold, you should set an expected return-on-investment (ROI) threshold.
In this post, Part 4 of the series, I show that the meaning of “bidding truthfully” can be slipperier than expected, and that you can get the same results as an ROI-based pacing strategy with a perfect expected-cost model, without even needing to use an expected-cost model.
In Part 1 of this Peeking Into the Black Box series, I described how you could compute the expected economic surplus of truthfully bidding on an impression in an RTB context. I then explained that you could use this computation to decide which bidding opportunities were “better” than others and therefore decide when to bid and when not to bid, based on the output of a closed-loop pace control system such as the one described in Part 0.
In this post, I show that in order to maximize the economic surplus over a whole campaign, the quantity you should use on an auction-by-auction basis to decide when to bid is actually the expected return on investment (ROI) rather than the expected surplus. At Datacratic, we actually switched to an ROI-based strategy in late 2012.
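To see why the two criteria can disagree, here is a small sketch comparing them on two made-up opportunities. It assumes ROI is defined as surplus divided by expected cost; the names and numbers are hypothetical and chosen only to show the rankings diverging.

```python
def expected_roi(value, expected_cost):
    """Assumed definition: expected ROI = (value - expected cost) / expected cost."""
    if expected_cost <= 0:
        raise ValueError("expected_cost must be positive")
    return (value - expected_cost) / expected_cost

# Hypothetical opportunities: name -> (truthful value, expected cost)
opportunities = {
    "A": (1.00, 0.80),  # surplus about 0.20, ROI about 0.25
    "B": (0.30, 0.15),  # surplus about 0.15, ROI about 1.00
}

best_by_surplus = max(opportunities,
                      key=lambda k: opportunities[k][0] - opportunities[k][1])
best_by_roi = max(opportunities,
                  key=lambda k: expected_roi(*opportunities[k]))
```

Here the surplus rule prefers A while the ROI rule prefers B. With a fixed budget, the same money spent on A once could buy several impressions like B, which is the intuition behind ranking by ROI.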
If you work with data and you don't know what a pivot table is, I encourage you to learn about them, because they are very useful for quick'n'dirty data analysis. My web-based implementation is a decent learning tool but there are other, much better implementations, such as in Microsoft Excel (although since Office 2003 they've made some changes that were not for the better) and AquaDataStudio.
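For readers who haven't met one, a pivot table groups rows by one field, spreads another field across columns, and aggregates a value in each cell. The sketch below is a bare-bones illustration with invented data; it is not how the web-based implementation works.

```python
from collections import defaultdict

def pivot(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) combination found in `rows`."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert nested defaultdicts to plain dicts for easier display.
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical data, for illustration only.
sales = [
    {"region": "East", "product": "apples", "units": 3},
    {"region": "East", "product": "pears", "units": 2},
    {"region": "West", "product": "apples", "units": 5},
    {"region": "East", "product": "apples", "units": 4},
]
summary = pivot(sales, "region", "product", "units")
```

The result has one entry per region and, within it, one total per product, which is the same shape a spreadsheet pivot table would show.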
I posted this on Hacker News and got some nice comments!