Small multiples vs. animated GIFs for showing changes in fertility rates over time

A couple of weeks ago, Stephen Holzman shared an animated GIF on /r/DataIsBeautiful that caught my eye. The GIF shows the evolution of fertility rates in the U.S. and Japan between 1947 and 2010. It starts right in the middle of the post-WWII Baby Boom and follows the gradual decline of Japan’s fertility rates, which has led to something of a population crisis for Japan.

usa-vs-japan-fertility

Although Stephen’s GIF is fun to watch (especially because the animation gives the appearance of waves rising and falling), I couldn’t help but be frustrated by the limitations of GIFs in data visualization. If I wanted to compare the fertility rates of 1980 and 2010, for example, I’d have to keep a mental snapshot of what the 1980 frame looked like until the 2010 frame came around. Animated GIFs are really only useful for showing broad trends, and much of the richness of the data is lost when we’re forced to process each frame so quickly.

This drawback is the exact reason that small multiples were introduced to data visualization: If we’re comparing the same data in the same format between several different [times|treatments|countries|etc.], then we can visualize the data on the same scale and axes to make them easily comparable.

I’ve long been a proponent of small multiples over GIFs, so I took Stephen’s data (which is actually from the Human Fertility Database) and reworked it into small multiples. You can click on the image for a super-high-res version.

usa-vs-japan-fertility-rates-small-multiple

Each year gets its own plot, running from left to right, with both countries’ fertility rates plotted. The total fertility rate for each year is annotated onto its corresponding plot and color-coded according to the country. I plotted the x-axis tick labels to show the reader the age range of the plots, but only on the top and bottom rows to avoid too much repetition. Similarly, the y-axis tick labels only appear on the plots on the left.

Now it’s straightforward to compare across and within decades: 1947, 1955, 1963, etc. can easily be compared by looking down the columns. By the same token, 1947, 1948, 1949, etc. can easily be compared by looking across the rows. I also could’ve lined up the years by decade, so that the beginning of each decade (1960, 1970, etc.) appears on the left, but I didn’t see a strong reason to break the symmetry of the grid (or subset the data to 1950-2009) in this case.
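The row and column arithmetic behind that layout can be sketched in a few lines. This is only an illustration: the helper name grid_position and its parameters are hypothetical, and it assumes the panels are filled left to right, top to bottom, starting at 1947 in an 8-column grid.

```python
def grid_position(year, first_year=1947, n_cols=8):
    """Return the zero-based (row, column) of a year's panel.

    Assumes panels are filled left to right, then top to bottom.
    """
    return divmod(year - first_year, n_cols)


# Years n_cols apart share a column; consecutive years share a row
# until the row wraps around.
for year in (1947, 1948, 1955, 1963, 2003, 2010):
    row, col = grid_position(year)
    print(year, "-> row", row, "column", col)
```

The decade-aligned alternative would correspond to calling grid_position(year, first_year=1950, n_cols=10), which puts 1960, 1970, and so on in column 0 at the cost of dropping the years outside 1950-2009.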

Of course, the drawback of small multiples is that you no longer see the data in the same detail as you did with the larger plots. Out of necessity, each plot in a small multiples chart must be small, simple, and have few axis ticks, which can make small multiples a poor choice if you’re making a comparison where there has been little change.

What do you think? Did small multiples make it easier to draw insights from the Human Fertility Database?

Code for the small multiples visualization

I can’t share the data that I used to create this visualization — you’ll have to get it from the Human Fertility Database — but here’s the Python code I used to generate the small multiples visualization.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

plt.style.use('https://gist.githubusercontent.com/rhiever/d0a7332fe0beebfdc3d5/raw/'
              '223d70799b48131d5ce2723cd5784f39d7a3a653/tableau10.mplstyle')

# "japan_fertility" and "usa_fertility" are pandas DataFrames with
# the fertility data from the Human Fertility Database

plt.figure(figsize=(12, 16))

for plot_num, ((index_japan, japan), (index_usa, usa)) in enumerate(
                                                    zip(japan_fertility.groupby('Year'),
                                                    usa_fertility.groupby('Year'))):
    ax = plt.subplot(8, 8, plot_num + 1)
    plt.fill_between(usa.Age.values, usa.ASFR.values, color='#1f77b4', alpha=0.7)
    plt.fill_between(japan.Age.values, japan.ASFR.values, color='#d62728', alpha=0.7)
    plt.xlim(9, 51)
    plt.ylim(0, 0.3)
    
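    # Label the x-axis ticks only on the bottom row (1947 + 7 * 8 = 2003 onward)
    if index_japan >= 2003:
        plt.xticks(range(10, 51, 20), fontsize=10)
    else:
        # One empty label per tick; recent matplotlib versions raise an error
        # when the number of labels differs from the number of ticks
        plt.xticks(range(10, 51, 20), [''] * 3)

    # Label the y-axis ticks only on the leftmost column
    if plot_num % 8 == 0:
        plt.yticks(np.arange(0.1, 0.31, 0.1), fontsize=10)
    else:
        plt.yticks(np.arange(0.1, 0.31, 0.1), [''] * 3)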

    plt.text(40, 0.26, usa.ASFR.sum().round(2), fontsize=10,
             ha='center', color='#1f77b4')
    plt.text(40, 0.225, japan.ASFR.sum().round(2), fontsize=10,
             ha='center', color='#d62728')
    plt.title(index_japan, fontsize=10)
    
plt.tight_layout()
plt.savefig('usa-vs-japan-fertility-rates-small-multiple.pdf', dpi=300)

Note that I had to add the plot axis labels, the plot title, and a couple of annotations manually.
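The numbers annotated on each plot come from summing the ASFR column for that year, which gives the total fertility rate when there is one row per single year of age. Here is a minimal sketch of that calculation on its own. The DataFrame named toy and every value in it are made up for illustration and are not Human Fertility Database figures; only the column names (Year, Age, ASFR) follow the script above.

```python
import pandas as pd

# Made-up numbers for illustration only; NOT Human Fertility Database values.
toy = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2001, 2001, 2001],
    'Age':  [20, 21, 22, 20, 21, 22],
    'ASFR': [0.05, 0.10, 0.02, 0.04, 0.11, 0.03],
})

# Assuming one row per single year of age, the total fertility rate
# for a year is the sum of its age-specific fertility rates.
tfr_by_year = toy.groupby('Year')['ASFR'].sum().round(2)
print(tfr_by_year)
```

Computing all of the totals up front like this could also make it easier to sanity-check them before they are drawn onto 64 separate plots.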
