Whenever I need inspiration for effective visualizations, I browse The Economist, the Visual Capitalist, or The Washington Post. During one of these forays, I ran across an interesting infographic — similar to the one shown above — that plotted the age of each member of the US Congress against their generational cohort.
My first impression was that this was a horizontal bar chart, but closer inspection revealed that each bar was composed of multiple markers, making it a scatter plot. Each marker represented one member of Congress.
In this Quick Success Data Science project, we’ll recreate this attractive chart using Python, pandas, and seaborn. Along the way, we’ll unlock a cornucopia of marker types you may not know exist.
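To make the "bar built from markers" idea concrete, here is a minimal sketch of one way to turn a list of ages into stacked scatter-plot coordinates. The ages are made up for illustration, and the names `ages`, `count_by_age`, and `points` are hypothetical; this is not necessarily how the original infographic was built.

```python
from collections import defaultdict

# Hypothetical ages; each entry stands for one person.
ages = [45, 52, 45, 60, 52, 45]

# Running count of how many markers have already been placed at each age.
count_by_age = defaultdict(int)

# One (stack_level, age) pair per person.
points = []
for age in ages:
    count_by_age[age] += 1
    points.append((count_by_age[age], age))

print(points)
```

Plotting the stack level on the x-axis and the age on the y-axis would place one marker per person, so each age grows into a horizontal "bar" whose length equals the number of people of that age.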
Because the United States has Age of Candidacy laws, the birthdays of members of Congress are part of the public record. You can find them in multiple places, including the Biographical Directory of the United States Congress and Wikipedia.
For convenience, I’ve already compiled a CSV file of the names of the current members of Congress, along with their birthdays, chamber (House or Senate), and party, and stored it in this Gist.
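As a rough sketch of how birthdays in a file like this could be turned into ages, the snippet below reads a tiny inline stand-in for the CSV. The column names (`Name`, `Birthday`, `Chamber`, `Party`), the placeholder rows, and the fixed `REFERENCE_DATE` are all assumptions made for illustration; the real file may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV; the rows are placeholders, not real members.
SAMPLE_CSV = io.StringIO(
    "Name,Birthday,Chamber,Party\n"
    "Member A,1950-03-15,Senate,Democratic\n"
    "Member B,1975-11-02,House,Republican\n"
    "Member C,1990-07-01,House,Democratic\n"
)

df = pd.read_csv(SAMPLE_CSV, parse_dates=["Birthday"])

# Fixed reference date (an assumption) so the result is reproducible.
REFERENCE_DATE = pd.Timestamp("2023-07-01")

# True where the birthday has already occurred in the reference year.
had_birthday = (
    (df["Birthday"].dt.month < REFERENCE_DATE.month)
    | (
        (df["Birthday"].dt.month == REFERENCE_DATE.month)
        & (df["Birthday"].dt.day <= REFERENCE_DATE.day)
    )
)

# Age in whole years: year difference, minus one if the birthday is still to come.
df["Age"] = (
    REFERENCE_DATE.year
    - df["Birthday"].dt.year
    - (~had_birthday).astype(int)
)

print(df[["Name", "Age"]])
```

Comparing month and day explicitly avoids the small errors that come from dividing a day count by 365.25.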
The following code was written in JupyterLab and is described cell by cell.
from collections import defaultdict # For counting members by age.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches # For drawing boxes on the plot.
import pandas as pd
import seaborn as sns
Assigning Constants for the Generational Data
We’ll annotate the plot so that generational cohorts, such as Baby Boomers and Gen X, are highlighted. The following code calculates the current age spans for each cohort and includes lists for generation names and highlight colors. Because we want to treat these values as constants, we’ll follow the Python convention of writing their names in all capital letters, with underscores between words.
# Prepare generational data for plotting as boxes on chart:
CURRENT_YEAR = 2023…