I like to explore data, and there is lots of data in the book. One thing I like to do is try to reproduce graphs from publications. So here is a graph from Factfulness:
It shows how the world has improved in terms of immunization. Most of us don't know that almost 90% of children worldwide are immunized. Here is a graph for just the TB vaccine, using data on BCG immunization uptake in children. It's similar to the graph above and it was made in R.
The code uses the WHO R package, which gives us access to immunization data from the Global Health Observatory data repository.
Here is the code that makes the graph, including some of the steps along the way.
START
library(tidyverse)
# install.packages("WHO")
library(WHO)
# check out the codes of the WHO data...
# requires internet access
codes <- get_codes()
colnames(codes)
glimpse(codes) # useful function from tibble package
codes[1:10, 2]
# get codes for immunizations
# search with grepl(), which matches a regular expression against each entry
codes[grepl("[Ii]mmuniz", codes$display), ]
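# A hedged aside: the same kind of search can be written with
# ignore.case = TRUE instead of the [Ii] character class.
# toy_codes is a made-up data frame for illustration only;
# it is NOT real output from get_codes().
toy_codes <- data.frame(
  label = c("X1", "X2", "X3"),
  display = c("BCG immunization coverage",
              "Life expectancy at birth",
              "Immunization coverage, DTP3"),
  stringsAsFactors = FALSE
)
toy_matches <- toy_codes[grepl("immuniz", toy_codes$display, ignore.case = TRUE), ]
toy_matches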
# ----downloadBCG data
bcg_data <- get_data("WHS4_543") # BCG data...
# requires internet access and takes a few minutes...
## ----lookatBCG data
glimpse(bcg_data)
summary(bcg_data)
## ----globalGraph
# extract Global data with the filter() function from dplyr package
# then plot...
bcg_data %>%
  filter(region == "(WHO) Global") %>%
  ggplot(aes(x = year, y = value)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(x = NULL, y = "BCG Immunization Rates")
## To make the graph look more like Factfulness...
# 1. Add a title
# 2. Set limits to 0 and 100%
# 3. Add source
# 4. Add a point at the start
# 5. Add an arrow at the end
# 6. Add values and years to start and end...
# 7. Change the style to grey under the line
## add titles and limits to y axis...
bcg_data %>%
  filter(region == "(WHO) Global") %>%
  ggplot(aes(x = year, y = value)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(x = NULL, y = "BCG Immunization Rates",
       title = "BCG Immunization Rates") +
  ylim(0, 100) -> bcg_plot
bcg_plot
# add a source
source <- paste("Source: World Health Organization \n accessed:", Sys.Date())
bcg_plot + annotate("text", x = 2008, y = 10, label=source, size=3)
## Adding points and annotations requires more work...
# pull out global data
glob_bcg <- bcg_data %>%
  filter(region == "(WHO) Global")
# identify minimum and maximum point and labels
min_point <- filter(glob_bcg, year == min(glob_bcg$year))
min_point_label <- paste0(min_point$value, "%\n", min_point$year)
max_point <- filter(glob_bcg, year == max(glob_bcg$year))
max_point_label <- paste0(max_point$value, "%\n", max_point$year)
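# A hedged aside: if the downloaded values carry decimals, the labels above
# can get long. A minimal sketch of a helper that rounds before pasting.
# make_label() and toy_point are hypothetical names used for illustration;
# they are not part of the WHO package, and the value below is made up.
make_label <- function(point, digits = 0) {
  paste0(round(point$value, digits), "%\n", point$year)
}
toy_point <- data.frame(year = 1980, value = 15.64)
make_label(toy_point)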
# Better plan: use the WHO's own title for the graph
graph_title <- glob_bcg$gho[1]
# put it together to make a graph
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(x = NULL, y = "Immunization Rate",
       title = graph_title) +
  ylim(0, 100) +
  geom_point(data = min_point, aes(year, value)) +
  geom_text(data = min_point, label = min_point_label, hjust = -0.8) +
  # shape 62 is the ASCII character ">", used here as an arrowhead
  geom_point(data = max_point, aes(year, value),
             shape = 62, size = 5) +
  geom_text(data = max_point, label = max_point_label, vjust = 1.2)
bcg_plot
## add grey underneath the line
# this uses geom_ribbon()
bcg_plot + geom_ribbon(aes(ymin = 0, ymax = value),
                       fill = "#DCDCDC") # this colour is a nice grey.
# works, but the ribbon is drawn last, so it covers the line and the earlier annotations.
## ----change order to make it all work together for final plot
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
  # ribbon goes first so that everything else is drawn on top of it
  geom_ribbon(aes(ymin = 0, ymax = value), fill = "#DCDCDC") +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw() +
  labs(x = NULL, y = "Immunization Rate",
       title = graph_title) +
  ylim(0, 100) +
  geom_point(data = min_point, aes(year, value), size = 5) +
  geom_text(data = min_point, label = min_point_label, hjust = -0.8) +
  geom_point(data = max_point, aes(year, value),
             shape = 62, size = 8) +
  geom_text(data = max_point, label = max_point_label, vjust = 1.2) +
  annotate("text", x = 2008, y = 10, label = source, size = 3)
bcg_plot
END
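The layer-order point in the final step (ribbon first, then the line and annotations) can be tried out without downloading anything. Here is a minimal sketch using a small made-up data frame, toy_bcg. Its values are invented for illustration and are not WHO figures, and it assumes ggplot2 3.4.0 or later for the linewidth argument.

```r
library(ggplot2)

# Made-up values for illustration only; these are NOT WHO figures.
toy_bcg <- data.frame(
  year  = c(1980, 1990, 2000, 2010),
  value = c(20, 60, 75, 90)
)

# ggplot2 draws layers in the order they are added, so the ribbon
# goes first and the line is drawn on top of it.
toy_plot <- ggplot(toy_bcg, aes(x = year, y = value)) +
  geom_ribbon(aes(ymin = 0, ymax = value), fill = "#DCDCDC") +
  geom_line(linewidth = 1) +
  ylim(0, 100) +
  theme_bw()

toy_plot

# Inspect the layer order: the ribbon should be listed before the line.
sapply(toy_plot$layers, function(layer) class(layer$geom)[1])
```

Swapping the geom_ribbon() and geom_line() lines and re-running is a quick way to see the line disappear behind the grey.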