Data Rich Reports

Session 4: Visualizing Data

Bar Charts

Retrieve Code

  • Go here to download the code
  • Download the whole folder
  • Be sure to unzip if you are a Windows user

The Grammar of Graphics

  • Data viz has a language with its own grammar
  • Basic components include:
    • Data we are trying to visualize
    • Aesthetics (dimensions)
    • Geom (e.g. bar, line, scatter plot)
    • Color scales
    • Themes
    • Annotations
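
As a rough sketch of how these components line up with ggplot2 code, the example below builds one small chart and marks each grammar component in a comment. The `toy` data frame and its values are invented for illustration and are not part of the workshop data.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  group = c("A", "B", "C"),
  score = c(0.3, 0.8, 0.5)
)

grammar_demo <- ggplot(toy, aes(x = group, y = score, fill = group)) + # data + aesthetics
  geom_col() +                                            # geom
  scale_fill_viridis_d(end = 0.8) +                       # color scale
  theme_minimal() +                                       # theme
  labs(title = "Each line adds one grammar component") +  # labels
  annotate("text", x = "B", y = 0.85, label = "highest")  # annotation

grammar_demo
```

Each `+` adds one component, so a chart can be built up (or stripped back) one piece at a time.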


Let’s load our packages and import our data…


library(readr)
library(ggplot2)

dem_summary <- read_csv("dem_summary.csv")


And then let’s start with the first two elements, the data and the aesthetic…


ggplot(dem_summary, aes(x = region, y = polyarchy))

This gives us the axes without any visualization.


Now let’s add a geom. In this case we want a bar chart so we add geom_col().


ggplot(dem_summary, aes(x = region, y = polyarchy)) + 
  geom_col()

That gets the idea across but looks a little depressing, so…


…let’s change the color of the bars by specifying fill = "steelblue".


ggplot(dem_summary, aes(x = region, y = polyarchy)) + 
  geom_col(fill = "steelblue")

Note how the default color of the bars is simply overwritten:
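
The fill here is a constant because it sits outside aes(). A minimal sketch of the difference between setting a color and mapping one to the data, using a hypothetical `toy` data frame with invented values:

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  region = c("North", "South", "East"),
  score  = c(0.4, 0.7, 0.6)
)

# Setting: fill is outside aes(), so every bar gets the same constant color
p_set <- ggplot(toy, aes(x = region, y = score)) +
  geom_col(fill = "steelblue")

# Mapping: fill is inside aes(), so the color varies with a column in the data
p_map <- ggplot(toy, aes(x = region, y = score, fill = region)) +
  geom_col()

p_set
p_map
```

The mapped version also produces a legend, since the color now carries information.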


Now let’s add some labels with the labs() function:


ggplot(dem_summary, aes(x = region, y = polyarchy)) + 
  geom_col(fill = "steelblue") +
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    )

And that gives us…

Next, we reorder the bars with fct_reorder() from the forcats package.


library(forcats)

ggplot(dem_summary, aes(x = fct_reorder(region, -polyarchy), y = polyarchy)) +
  geom_col(fill = "steelblue") + 
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    )


Note that we could also use the base R reorder() function here.

This way, we get a nice, visually appealing ordering of the bars according to levels of democracy…
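
A small sketch comparing the two reordering functions on a hypothetical data frame (the values are made up). With one row per region they should give the same level order, although `fct_reorder()` summarizes with the median by default and `reorder()` with the mean, so the two can differ when a region has several rows.

```r
library(forcats)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  region    = c("Africa", "Asia", "The West"),
  polyarchy = c(0.4, 0.5, 0.9)
)

# Negating the score sorts from highest to lowest
by_forcats <- fct_reorder(toy$region, -toy$polyarchy)
by_base    <- reorder(toy$region, -toy$polyarchy)

levels(by_forcats)
levels(by_base)
```

`fct_reorder(region, polyarchy, .desc = TRUE)` is another way to ask for descending order without negating the variable.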


Now let’s change the theme to theme_minimal().


ggplot(dem_summary, aes(x = reorder(region, -polyarchy), y = polyarchy)) +
  geom_col(fill = "steelblue") + 
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    ) + 
  theme_minimal()

This gives us a clean, elegant look.


Note that you can also save your plot as an object to modify later.


dem_bar_chart <- ggplot(dem_summary, aes(x = reorder(region, -polyarchy), y = polyarchy)) +
  geom_col(fill = "steelblue")

Which gives us…

dem_bar_chart


Now let’s add back our labels…


dem_bar_chart <- dem_bar_chart +
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    )

So now we have…

dem_bar_chart


And now we’ll add back our theme…


dem_bar_chart <- dem_bar_chart + theme_minimal()

Voila!

dem_bar_chart

Change the theme. There are many themes to choose from.

dem_bar_chart + theme_bw()

Your Turn!

  1. glimpse() the data
  2. Find a new variable to visualize
  3. Make a bar chart with it
  4. Change the color of the bars
  5. Order the bars
  6. Add labels
  7. Add a theme
  8. Try saving your plot as an object
  9. Then change the labels and/or theme

Line Charts

Line Chart Setup

library(vdemdata)
library(tidyverse)

dem_waves_ctrs <- vdem |>
  select(
    country = country_name,     
    year, 
    polyarchy = v2x_polyarchy
  ) |>
  filter( 
    country %in% c("United States of America", # select countries in this list
                   "Japan", 
                   "Portugal")
    )

Line Chart


Here is the code…


# in this ggplot() call, we add a third dimension for line color
ggplot(dem_waves_ctrs, aes(x = year, y = polyarchy, color = country)) +
  geom_line(linewidth = 1) + # our geom is a line with a width of 1
  labs(
    x = "Year", 
    y = "Polyarchy Score", 
    title = 'Democracy in countries representing three different "waves"', 
    caption = "Source: V-Dem Institute", 
    color = "Country" # capitalize the legend title
  )


  • Use geom_line() to specify a line chart
  • Add a third dimension to the aes() call for line color
  • Modify the legend title with color = "Country" in labs()

Problem

Color Blindness


  • Color Vision Deficiency (CVD), or color blindness, affects about 8 percent of men and 1 in 200 women
  • There are different types of CVD, but the most common is red-green color blindness
  • Therefore, don’t include red and green in the same chart!
  • Look for color blind safe palettes

Solution: Use a colorblind safe color scheme like viridis


Use scale_color_viridis_d() in this case to specify the viridis color scheme…

# in this ggplot() call, we add a third dimension for line color
ggplot(dem_waves_ctrs, aes(x = year, y = polyarchy, color = country)) +
  geom_line(linewidth = 1) + # our geom is a line with a width of 1
  labs(
    x = "Year", 
    y = "Polyarchy Score", 
    title = 'Democracy in countries representing three different "waves"', 
    caption = "Source: V-Dem Institute", 
    color = "Country" # capitalize the legend title
  ) +
  scale_color_viridis_d(option = "mako", end = .8) # use viridis color palette

Better!

Palettes


  • There are a number of viridis palettes
  • See this reference to view different palettes and options
  • You can also use scale_color_viridis_c() to specify a continuous color scale
  • Also check out the paletteer package for easy access to many more palettes
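
As a sketch of the points above, the code below first prints the hex codes behind a three-color "mako" palette and then applies the continuous version of the scale. The `toy` data frame is hypothetical, and the palette helper used here is `scales::viridis_pal()`, which ships with the scales package that ggplot2 depends on.

```r
library(ggplot2)

# Hex codes for a 3-color discrete "mako" palette
mako_3 <- scales::viridis_pal(option = "mako", end = 0.8)(3)
mako_3

# Hypothetical toy data with a numeric variable to map to color
toy <- data.frame(
  x     = 1:10,
  y     = (1:10)^2,
  value = seq(0, 1, length.out = 10)
)

# A numeric variable needs the continuous scale, scale_color_viridis_c()
p_cont <- ggplot(toy, aes(x = x, y = y, color = value)) +
  geom_point(size = 3) +
  scale_color_viridis_c(option = "mako", end = 0.8)

p_cont
```

Use the `_d` scale for categorical variables such as country and the `_c` scale for numeric ones.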

Your Turn!


  • See table three of this article
  • Select three countries to visualize
  • Adjust setup code to filter data on those countries
  • Visualize with geom_line()
  • Use scale_color_viridis_d() to specify a viridis color scheme

Bonus Material

Superfun Data Visualization


library(gganimate)
library(gapminder)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')

Extended vdemdata Example

# Load packages
library(vdemdata) # to download V-Dem data
library(dplyr)

# Download the data
democracy <- vdem |> # download the V-Dem dataset
  filter(year == 2015)  |> # filter year, keep 2015
  select(                  # select (and rename) these variables
    country = country_name,     # the name before the = sign is the new name  
    vdem_ctry_id = country_id,  # the name after the = sign is the old name
    year, 
    polyarchy = v2x_polyarchy, 
    gdp_pc = e_gdppc, 
    region = e_regionpol_6C
    ) |>
  mutate(
    region = case_match(region, # replace the numeric codes in region with region names
                     1 ~ "Eastern Europe", 
                     2 ~ "Latin America",  
                     3 ~ "Middle East",   
                     4 ~ "Africa", 
                     5 ~ "The West", 
                     6 ~ "Asia")
  )

# View the data
glimpse(democracy)

Use filter() to select years…

# Download the data
democracy <- vdem |> 
  filter(year == 2015)  |> # keep 2015
  select(                 
    country = country_name,       
    vdem_ctry_id = country_id,  
    year, 
    polyarchy = v2x_polyarchy, 
    gdp_pc = e_gdppc, 
    region = e_regionpol_6C
    ) |>
  mutate(
    region = case_match(region,
                     1 ~ "Eastern Europe", 
                     2 ~ "Latin America",  
                     3 ~ "Middle East",   
                     4 ~ "Africa", 
                     5 ~ "The West", 
                     6 ~ "Asia")
  )

Use select() to choose variables…

# Download the data
democracy <- vdem |> 
  filter(year == 2015)  |> 
  select(                  # select (and rename) these variables
    country = country_name,     # the name before the = sign is the new name  
    vdem_ctry_id = country_id,  # the name after the = sign is the old name
    year, 
    polyarchy = v2x_polyarchy, 
    gdp_pc = e_gdppc, 
    region = e_regionpol_6C
    ) |>
  mutate(
    region = case_match(region, 
                     1 ~ "Eastern Europe", 
                     2 ~ "Latin America",  
                     3 ~ "Middle East",   
                     4 ~ "Africa", 
                     5 ~ "The West", 
                     6 ~ "Asia")
  )

Use mutate() with case_match() to recode region…

# Download the data
democracy <- vdem |>
  filter(year == 2015)  |> 
  select(                  
    country = country_name,     
    vdem_ctry_id = country_id,  
    year, 
    polyarchy = v2x_polyarchy, 
    gdp_pc = e_gdppc, 
    region = e_regionpol_6C
    ) |>
  mutate(
    region = case_match(region, # replace the numeric codes in region with region names
                     1 ~ "Eastern Europe", 
                     2 ~ "Latin America",  
                     3 ~ "Middle East",   
                     4 ~ "Africa", 
                     5 ~ "The West", 
                     6 ~ "Asia")
                    # the number on the left of the ~ is the V-Dem region code
                    # we are changing that number to the region name on the
                    # right of the ~
  )
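
To see what case_match() does on its own, here is a minimal sketch on a plain numeric vector. The vector `region_codes` is made up for illustration, and 99 is a deliberately invalid code that matches none of the formulas. Note that case_match() requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical codes; 99 matches none of the formulas below
region_codes <- c(1, 5, 6, 99)

region_names <- case_match(
  region_codes,
  1 ~ "Eastern Europe",
  5 ~ "The West",
  6 ~ "Asia"
)

region_names
```

Values with no matching formula come back as NA unless a `.default` is supplied, which is a quick way to spot codes you forgot to recode.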

Visualize It!

library(ggplot2)

ggplot(democracy, aes(x = gdp_pc, y = polyarchy)) + 
  geom_point(aes(color = region)) + 
  geom_smooth(method = "lm", linewidth = 1) + 
  scale_x_log10(labels = scales::label_number(prefix = "$", suffix = "k")) +
  labs(
    x = "GDP per Capita", 
    y = "Polyarchy Score",
    title = "Wealth and democracy in 2015", 
    caption = "Source: V-Dem Institute", 
    color = "Region"
    ) +
  scale_color_viridis_d(option = "inferno", end = .8)
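
The axis labels in this chart come from scales::label_number(), which returns a function that formats numbers. A small sketch of calling that function directly; the input values are arbitrary, and the "k" suffix assumes gdp_pc is recorded in thousands of dollars, as the labels in the plot imply.

```r
# label_number() returns a labelling function rather than a set of labels
gdp_labeller <- scales::label_number(prefix = "$", suffix = "k")

# Apply it to a few arbitrary axis break values
gdp_labels <- gdp_labeller(c(1, 10, 100))
gdp_labels
```

Passing the labeller to the `labels` argument of scale_x_log10() applies the same formatting to every axis break.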

Try it Yourself

  • Go to the V-Dem Codebook
  • Select a democracy indicator from Part 2.1 (high level indicators) to visualize
  • Note the indicator code (e.g. “v2x_polyarchy” for the polyarchy score)
  • Change the code and download the data so you can visualize it
  • Now make a scatter plot of your indicator versus GDP

Coding Assignment 1


  • Let’s get started on the first assignment
  • Instructions for Coding Assignment 1 are here
  • Due by 11:59pm on Sunday, February 25