Friday, September 13, 2013

How to expand color palette with ggplot and RColorBrewer

Histograms are almost always part of a data analysis presentation. Made with the R ggplot2 package, one may look like this:

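library(ggplot2)
data(mtcars)
 
# geom_bar() counts rows per category; ggplot2 2.0+ rejects a discrete x in geom_histogram()
ggplot(mtcars) +
  geom_bar(aes(factor(cyl), fill=factor(cyl)))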


The elegance of ggplot functions lies in their simple, compact expression of a visualization formula, with many options hidden behind defaults. Hidden doesn't mean missing: most options are just a step away. For example, the colors can be changed with one of the scale functions, such as scale_fill_brewer:

ggplot(mtcars) +
  geom_bar(aes(factor(cyl), fill=factor(cyl))) +
  scale_fill_brewer()


In turn, the palette used by scale_fill_brewer can be changed too:

ggplot(mtcars) +
  geom_bar(aes(factor(cyl), fill=factor(cyl))) +
  scale_fill_brewer(palette="Set1")



The palettes live in the RColorBrewer package. To see all available choices, attach the package with library(RColorBrewer) and run display.brewer.all().

There are three types of palettes: sequential, diverging, and qualitative. Each contains from 8 to 12 colors (see the data frame brewer.pal.info and the help page ?RColorBrewer for details).
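
The palette sizes mentioned above can be checked directly from brewer.pal.info. This is a minimal sketch, assuming only that RColorBrewer is installed; the variable name info is just for illustration.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette and the
# columns maxcolors, category ("seq", "div", "qual") and colorblind.
info <- brewer.pal.info

# Maximum number of colours offered by a few qualitative palettes
info[c("Set1", "Set2", "Accent"), "maxcolors"]

# Smallest and largest maximum across all palettes
range(info$maxcolors)

# Number of palettes in each category
table(info$category)
```

The maxcolors column is the number to compare against the count of bars before picking a palette.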

A curious reader may notice that if a histogram contains 13 or more bars (bins, in the case of continuous data), no Brewer palette has enough colors, and a smaller palette runs out even sooner:

ggplot(mtcars) +
  geom_bar(aes(factor(hp), fill=factor(hp))) +
  scale_fill_brewer(palette="Set2")


Indeed, length(unique(mtcars$hp)) finds 22 unique values for horsepower in mtcars, while the specified palette Set2 has only 8 colors to choose from. The shortage of colors in the palette triggers a warning like this (and invalidates the plot):
1: In brewer.pal(n, pal) :
  n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors

Larger palettes can be produced by interpolating existing ones with the constructor function colorRampPalette (it comes from the grDevices package that ships with R, not from RColorBrewer). It returns a function that does the actual job: given a number n, that function builds a palette of n colors by interpolating the colors it was constructed from. To interpolate palette Set1 to 22 colors (the number of colors is stored in the variable colourCount for the examples that follow):

library(RColorBrewer)

colourCount <- length(unique(mtcars$hp))               # 22 distinct hp values
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))  # a function: n -> n colours
 
ggplot(mtcars) +
  geom_bar(aes(factor(hp)), fill=getPalette(colourCount)) +
  theme(legend.position="right")
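
What the palette function returns can be inspected without plotting anything. The sketch below repeats the construction under the illustrative names base_cols, get_palette and cols, so it runs on its own.

```r
library(RColorBrewer)

base_cols <- brewer.pal(9, "Set1")          # the 9 original Set1 colours
get_palette <- colorRampPalette(base_cols)  # returns a function, not colours

cols <- get_palette(22)  # character vector of 22 interpolated hex colours
length(cols)
head(cols, 3)

# Optional visual check of the expanded palette as a strip of swatches
image(seq_along(cols), 1, as.matrix(seq_along(cols)),
      col = cols, axes = FALSE, xlab = "", ylab = "")
```

Because the colours are interpolated between neighbouring Set1 entries, adjacent colours in the expanded palette are more alike than the originals.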


While we addressed the color deficit, something else happened: all the bars are back and distinctly colored, but we lost the color legend. I intentionally added theme(legend.position=...) to showcase this: despite the explicit position request, the legend is no longer part of the plot.

The difference: the fill parameter was moved outside of the aes function. This turns fill from a mapping into a fixed setting and removes it from the aesthetics that ggplot tracks. Hence, there is nothing to build a legend from.

To fix it, place fill back into aes and use scale_fill_manual to define a custom palette:

ggplot(mtcars) +
  geom_bar(aes(factor(hp), fill=factor(hp))) +
  scale_fill_manual(values = getPalette(colourCount))
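
scale_fill_manual also accepts a named vector, in which case colors are matched to factor levels by name rather than by position. The following is a sketch of that variant. The names hp_levels, get_palette and hp_colours are illustrative, and geom_bar is used because current ggplot2 releases require it for a discrete x.

```r
library(ggplot2)
library(RColorBrewer)

hp_levels <- levels(factor(mtcars$hp))
get_palette <- colorRampPalette(brewer.pal(9, "Set1"))

# Named vector: names are the factor levels, values are the colours
hp_colours <- setNames(get_palette(length(hp_levels)), hp_levels)

p <- ggplot(mtcars) +
  geom_bar(aes(factor(hp), fill = factor(hp))) +
  scale_fill_manual(values = hp_colours)
p
```

Naming the colors keeps a given level on the same color even when a subset of the data drops some levels.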


Another likely problem with a large number of bars is the placement of the legend. Adjust the legend position and layout using the theme and guides functions as follows:

ggplot(mtcars) +
  geom_bar(aes(factor(hp), fill=factor(hp))) +
  scale_fill_manual(values = getPalette(colourCount)) +
  theme(legend.position="bottom") +
  guides(fill=guide_legend(nrow=2))


Finally, the same example using an in-place palette constructor and a different choice of library palette:

# Accent has only 8 colours, so request 8 (asking for 12 triggers the "n too large" warning)
ggplot(mtcars) +
  geom_bar(aes(factor(hp), fill=factor(hp))) +
  scale_fill_manual(values = colorRampPalette(brewer.pal(8, "Accent"))(colourCount)) +
  theme(legend.position="bottom") +
  guides(fill=guide_legend(nrow=2))


There are quite a few more scale functions to choose from, depending on the aesthetic (colour, fill), the color type (gradient, hue, etc.), and the data values (discrete or continuous).
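
The approach in this post can be wrapped into a small helper that interpolates only when the palette is too small. This is a sketch: expand_brewer is a hypothetical name, not a function from RColorBrewer or ggplot2.

```r
library(RColorBrewer)

# Hypothetical helper (not part of RColorBrewer): return n colours from a
# Brewer palette, interpolating only when n exceeds the palette's maximum.
expand_brewer <- function(name, n) {
  max_n <- brewer.pal.info[name, "maxcolors"]
  if (n <= max_n) {
    # brewer.pal() needs at least 3 colours, so request 3 and trim if n < 3
    return(brewer.pal(max(n, 3), name)[seq_len(n)])
  }
  colorRampPalette(brewer.pal(max_n, name))(n)
}

length(expand_brewer("Accent", 5))   # within the palette: original colours
length(expand_brewer("Accent", 22))  # beyond the palette: interpolated colours
```

The result can be passed straight to scale_fill_manual(values = ...), for example expand_brewer("Set1", colourCount).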

8 comments:

Marcus Beck said...

Excellent, this is exactly what I was looking for! Hopefully this will be implemented in future versions of ggplot.

Asim I said...

Thanks so much for this blog post. This saved me when I had ~20 qualitative series in a stacked area chart.

It'd be helpful if you explicitly stated what packages you had to install/include to get this to work. For reference it is:

library(colorRamps)
library(RColorBrewer)

Gregory Kanevsky said...

Asim, glad it helped - this keeps saving me time too. I am sure I didn't use package colorRamps though - just RColorBrewer - and I have mentioned it (between lines though).

Kate said...

How do I display the custom color palette?

Tomo said...

myPal <- getPalette(colourCount)  # your custom palette

# display
grid::grid.raster(myPal, interpolate = FALSE)

Evan Kontopantelis said...

Excellent, thank you

Piyush Goyal said...

This is excellent. Thanks

Erick said...

Thank you sir! Way better than my classes on R colors!