Montreal R User Group: ggplot2 & rpivotTable

I recently gave a talk at the Montreal R User Group about my favourite data visualization library, ggplot2, as well as rpivotTable, the R interface to my own PivotTable.js.

As you can see in the video above, during the talk I just scrolled through an R file in RStudio. What you see below is the result of slightly modifying that file and rendering it with R Markdown to capture the output.

Agenda

  • About me
  • About you
  • ggplot2
  • rpivotTable

About me

  • I work at Datacratic: a machine learning startup based here in Montreal
  • We built the Machine Learning Database; check it out at http://mldb.ai
  • I wrote PivotTable.js, a JavaScript pivot table that you’ll see in a bit

About you

  • Do you use R every week? CLI or RStudio?
  • Have you ever used ggplot?
  • Kept using it, never used it again, sometimes?
  • Have you ever used the Pivot Table in Excel? Some other tool?

ggplot2

  • I’m a dataviz nerd: I try every new library/technique
  • I usually work in Python, but I always come back to R
  • why? ggplot2: it’s the Goldilocks dataviz tool!
  • higher-level than a drawing tool
  • lower-level than a charting tool

does nothing

library(ggplot2)
ggplot(mtcars) 

bar chart

ggplot(mtcars) +
  geom_bar( aes(x=factor(cyl)) )
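
When only x is mapped, geom_bar() counts the rows in each category and uses those counts as bar heights. A minimal sketch in base R, using the built-in mtcars data, to check the numbers behind the chart (the name cyl_counts is just for illustration):

```r
# Count the cars in each cylinder class; these are the bar heights
# geom_bar() draws when only x is mapped.
cyl_counts <- table(factor(mtcars$cyl))
print(cyl_counts)
```

The same counts are split by gear in the stacked and grouped variants that follow.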

stacked bar chart

ggplot(mtcars) +
  geom_bar( aes(x=factor(cyl), fill=factor(gear)) )

grouped bar chart

ggplot(mtcars) +
  geom_bar( aes(x=factor(cyl), fill=factor(gear)), position="dodge" )

coloured bar chart

ggplot(mtcars) +
  geom_bar( aes(x=factor(cyl), fill=factor(cyl)) )

one big stack, full width…

ggplot(mtcars) +
  geom_bar( aes(x=factor(1), fill=factor(cyl)), width=1 )

…in polar coordinates -> pie chart!

ggplot(mtcars) +
  geom_bar( aes(x=factor(1), fill=factor(cyl)), width=1) + 
  coord_polar(theta="y")

scatterplot

ggplot(mtcars) + 
  geom_point( aes(x=mpg, y=disp) )

scatterplot with LOESS

ggplot(mtcars) + 
  geom_point( aes(x=mpg, y=disp) ) + 
  geom_smooth( aes(x=mpg, y=disp) )
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

less repetition

ggplot(mtcars, aes(x=mpg, y=disp)) + 
  geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
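
The message under each smoothed plot is ggplot2 reporting that it chose LOESS automatically. A minimal sketch, assuming a ggplot2 version whose geom_smooth() accepts the method and formula arguments, of naming the smoother explicitly; this should also avoid the message, although its exact wording varies between versions. The object names p_loess and p_lm are illustrative.

```r
library(ggplot2)

# Name the smoother explicitly instead of relying on method = "auto".
p_loess <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Swapping in a straight-line fit only changes the method argument.
p_lm <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(p_loess)
print(p_lm)
```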

colours

ggplot(mtcars, aes(x=mpg, y=disp, color=factor(cyl))) + 
  geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

one big LOESS across colours

ggplot(mtcars, aes(x=mpg, y=disp)) + 
  geom_point( aes(color=factor(cyl)) ) + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

making the points bigger

ggplot(mtcars, aes(x=mpg, y=disp)) + 
  geom_point( aes(color=factor(cyl)), size=4) + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

faceting the plot across gears & transmissions

ggplot(mtcars) + 
  geom_point( aes(x=mpg, y=disp, color=factor(cyl)), size=4) +
  facet_grid(gear~am)
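
facet_grid(gear~am) lays out one panel per combination of gear (rows) and am (columns), including combinations with no data. A small base R sketch to see how many cars land in each panel (panel_counts is an illustrative name):

```r
# Cross-tabulate gears (rows) against transmission type (columns, 0/1),
# mirroring the panel layout of facet_grid(gear ~ am).
panel_counts <- table(gear = mtcars$gear, am = mtcars$am)
print(panel_counts)

# Panels with a zero count show up as empty facets in the plot.
print(which(panel_counts == 0, arr.ind = TRUE))
```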

same plot as a one-liner with qplot

qplot(data=mtcars, geom="point", x=mpg, y=disp, color=factor(cyl), size=I(4), facets=gear~am)

decorating our one-liner

qplot(data=mtcars, x=mpg, y=disp, color=factor(cyl), size=I(4), facets=gear~am) +
  scale_colour_manual(values = c("red","blue", "green")) +
  labs(title="Displacement vs MPG by Transmission Type & Cylinders\n\n") +
  theme(legend.position="bottom")
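
qplot() has been deprecated in recent ggplot2 releases (since 3.4.0, as far as I can tell), so here is a sketch of the same decorated plot written with ggplot() and explicit layers. It is assumed to be equivalent to the qplot() call above, not a verified pixel-for-pixel match; the name p is illustrative and the trailing newlines in the title are dropped.

```r
library(ggplot2)

# Same mappings as the qplot() call, spelled out layer by layer.
p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point(size = 4) +                 # size = I(4) in qplot() terms
  facet_grid(gear ~ am) +                # facets = gear ~ am
  scale_colour_manual(values = c("red", "blue", "green")) +
  labs(title = "Displacement vs MPG by Transmission Type & Cylinders") +
  theme(legend.position = "bottom")

print(p)
```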

See also: http://docs.ggplot2.org/

a real-world example: the 2012 Canadian Parliament

From /content/2015/03/mp_dotplot/

mps = read.csv("mps.csv")
head(mps)
##               Name          Party         Province Age Gender
## 1      Liu, Laurin            NDP           Quebec  22 Female
## 2   Mourani, Maria Bloc Quebecois           Quebec  43 Female
## 3 Sellah, Djaouida            NDP           Quebec  NA Female
## 4   St-Denis, Lise            NDP           Quebec  72 Female
## 5        Fry, Hedy        Liberal British Columbia  71 Female
## 6   Turmel, Nycole            NDP           Quebec  70 Female
mps$Province = factor(mps$Province, levels=c("British Columbia", "Alberta", "Saskatchewan",
                                             "Manitoba", "Territories", "Ontario", "Quebec", "New Brunswick", "Prince Edward Island", 
                                             "Nova Scotia", "Newfoundland and Labrador"))
levels(mps$Province) = c("BC", "AB", "SK", "MB", "YT/NT/NU", "ON", "QC", "NB", "PE", "NS", "NL")

mps$Party = factor(mps$Party, levels=c("Conservative", "NDP", "Liberal", "Bloc Quebecois", "Green"))
levels(mps$Party) = c("CON", "NDP", "LIB", "BLQ", "GRN")

ggplot(subset(mps, !is.na(Age)), aes(x=Age, 
                    fill=factor(Gender), color=factor(Gender))) +
  geom_dotplot(stackgroups = TRUE, binwidth=5, method="histodot") +
  facet_grid(Party~Province, space="free_y", scales="free_y") +
  scale_y_continuous(breaks = NULL) +
  theme(legend.position="bottom", legend.title=element_blank())  + 
  labs(y="", x="Age in bins of 5 years", color="", fill="", title = expression(atop(
    "Canadian Members of Parliament in 2012 by Province, Party, Age & Gender", 
    atop(italic("each dot represents one MP; 3 are missing due to unknown birthdates"), ""))))
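
The Province and Party columns above are prepared in two steps: factor() fixes the order of the levels, then levels()<- renames them position by position. A toy sketch of that pattern on a hypothetical vector (the values below are made up for illustration and are not taken from mps.csv):

```r
# Hypothetical toy vector standing in for mps$Party.
party <- c("NDP", "Liberal", "Conservative", "NDP", "Independent")

# Step 1: fix the level order; values not listed in levels become NA.
party <- factor(party, levels = c("Conservative", "NDP", "Liberal"))

# Step 2: rename the levels in place, matching by position.
levels(party) <- c("CON", "NDP", "LIB")

print(party)
print(table(party, useNA = "ifany"))
```

Because unlisted values silently become NA, it is worth checking for unexpected NAs after step 1 when working with real data.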

rpivotTable

My PivotTable.js library: /pivottable/examples/

rpivotTable: https://github.com/smartinsightsfromdata/rpivotTable

library(rpivotTable)
rpivotTable(mtcars)
mps = read.csv("mps.csv")
rpivotTable(mps)
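
By default the pivot table opens empty and you drag fields into place. A hedged sketch of pre-filling the layout from R: the rows and cols arguments are how I understand rpivotTable()'s signature, so treat them as an assumption and check ?rpivotTable for your installed version. The name pt is illustrative.

```r
library(rpivotTable)

# Start with cylinders down the side and gears across the top,
# rather than an empty table. rows/cols are assumed argument names.
pt <- rpivotTable(mtcars, rows = "cyl", cols = "gear")

# Printing the widget opens it in the RStudio viewer or a browser.
pt
```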


© Nicolas Kruchten 2010-2024