Saturday, April 16, 2016

Creating and Tweaking Bubble Chart with ggplot2

This article will take us step-by-step over incremental changes to produce a bubble chart using ggplot2 that looks like this:

We'll encounter the plot above once again at the very end after explaining each step with code changes and observing intermediate plots. Without getting into details what it means (curios reader can find out here) the dataset behind is defined as:

It contains 2 data points and 4 attributes: three numerical Aster_experience, R_experience, and  coverage, and one categorical product. Remember that the data won't change a bit while the plot progression unfolds.

The starting plot is simple scatterplot using coordinates x and y as Aster_experience, R_experience (line 3), point size as coverage, and point color as product (line 4) (this type of scatterplot has a special name - bubble chart):

Immediate fix would be making the smaller point big enough to see it with the help of scale_size function and its range argument (line 3) (strange enough but sibling function scale_size_area doesn't have such argument) that specifies the minimum and maximum size of the plotting symbol after transformation1 :

Next refinement aims at the magic quadrant concept which fits this data well. In this case it's "R Experience" vs. "Aster Experience" and whether there is more or less of each. Achieving this effect involves fake axes using geom_hline and geom_vline (line 3), and customizing actual axes using scale (line 5-6) and theme functions (line 8-12):

Typical for bubble charts its points get both colored and labeled, which also makes color bar legend obsolete. We use geom_text to label points (line 5) and scale_color_manual to assign new colors and remove color bar legend (line 11):

The next step happened to tackle the most advanced problem while working on the plot. The guide legend for size above looks rather awkward. Ideally, it matches the two points we have in both color and size. It turned out (and rightly so) that the function scale_size is responsible for its appearance (line 8). In particular, number of legend positions overrides argument breaks, and controling appearance including colors of the legend performed with guide_legend and override.aes:

We finish cleaning the plot using package ggthemes and its theme_tufte function (line 10):

As promised, we finished exactly where we started.

1 Scale size (area or radius).

No comments: