Friday 10 August 2018

Exploring some immunization data... one graph from Factfulness

I've been reading Factfulness, an important book about statistics, data visualization and, more generally, understanding the world. Written by Hans Rosling with Ola Rosling and Anna Rosling Rönnlund of Gapminder, it uses data to correct the mistaken world view that many of us hold. It builds on the very popular videos by Hans Rosling. Here is one to have a look at...


I like to explore data, and there is lots of it in the book. One of the things I like to do is try to reproduce graphs from publications. So here is a graph from Factfulness:

It shows how the world has improved in terms of immunization. Most of us don't know that almost 90% of children worldwide are immunized. Here is a graph for just the TB vaccine, showing BCG immunization uptake for children. It's similar to the graph above, and it's made in R.



The code uses the WHO R package, which lets us access immunization data from the Global Health Observatory data repository.

Here is the code that makes the graph and some of the steps along the way.

START 

library(tidyverse)
# install.packages("WHO")
library(WHO)
# check out the codes of the WHO data...
# requires internet access
codes <- get_codes()
colnames(codes)



glimpse(codes)  # useful function from tibble package

codes[1:10, 2]

# get codes for immunizations
# search using grepl() function for finding regular expressions
codes[grepl("[Ii]mmuniz", codes$display), ]

# ----downloadBCG data
bcg_data <- get_data("WHS4_543")  # BCG data...
# requires internet access and takes a few minutes...

## ----lookatBCG data
glimpse(bcg_data)

summary(bcg_data)

## ----globalGraph
# extract Global data with the filter() function from dplyr package
# then plot...
bcg_data %>% 
    filter(region == "(WHO) Global") %>% 
    ggplot(aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "BCG Immunization Rates")





## To make the graph look more like Factfulness...
# 1. Add a title
# 2. Set limits to 0 and 100%
# 3. Add source
# 4. Add a point at the start
# 5. Add an arrow at the end
# 6. Add values and years to start and end...
# 7. Change the style to grey under the line

## add titles and limits to y axis...
bcg_data %>% 
    filter(region == "(WHO) Global") %>% 
    ggplot(aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "BCG Immunization Rates", 
        title = "BCG Immunization Rates") + 
    ylim(0,100) -> bcg_plot
bcg_plot





# add a source
source <- paste("Source: World Health Organization \n accessed:", Sys.Date())
bcg_plot + annotate("text",  x = 2008, y = 10, label=source, size=3)




## Add points and annotation requires more work....
# pull out global data
bcg_data %>% 
    filter(region == "(WHO) Global") -> glob_bcg
# identify minimum and maximum point and labels
min_point <- filter(glob_bcg, year == min(glob_bcg$year))
min_point_label <- paste0(min_point$value, "%\n", min_point$year)
max_point <- filter(glob_bcg, year == max(glob_bcg$year))
max_point_label <- paste0(max_point$value, "%\n", max_point$year)

# Use WHO title - better plan
graph_title <- glob_bcg$gho[1]

# put it together to make a graph
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "Immunization Rate",
        title = graph_title) + 
    ylim(0,100) +
    geom_point(data = min_point, aes(year, value)) +
    geom_text(data = min_point, label=min_point_label, hjust=-0.8) +
    geom_point(data = max_point, aes(year, value), 
        shape = 62, size = 5) +  # shape 62 is the ">" character, used as an arrowhead
    geom_text(data = max_point, label=max_point_label, vjust=1.2)

bcg_plot




## add grey underneath the line
# this uses geom_ribbon()
bcg_plot + geom_ribbon(aes(ymin=0, ymax=value), 
    fill = "#DCDCDC") # this colour is a nice grey. 



# This works, but the ribbon is drawn on top of the line, points and labels added earlier.

## ----change order to make it all work together for final plot
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
    geom_ribbon(aes(ymin=0, ymax=value), fill = "#DCDCDC") +
    geom_line(size = 1) +
    theme_bw() +
    labs(x = NULL, y = "Immunization Rate",
        title = graph_title) + 
    ylim(0,100) +
    geom_point(data = min_point, aes(year, value), size = 5) +
    geom_text(data = min_point, label=min_point_label, hjust=-0.8) +
    geom_point(data = max_point, aes(year, value), 
        shape = 62, size = 8) +
    geom_text(data = max_point, label=max_point_label, vjust=1.2) +
    annotate("text",  x = 2008, y = 10, label=source, size=3)

bcg_plot



END
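
The search for immunization codes in the script above relies on grepl() with the character class [Ii], so that both "Immuniz" and "immuniz" match. As a small offline sketch, the same pattern can be tried on a toy vector. The labels in toy_display are invented for illustration and are not taken from the WHO code list.

```r
# Toy labels, made up for this example (NOT real WHO code descriptions)
toy_display <- c(
  "BCG immunization coverage (toy label)",
  "Life expectancy at birth (toy label)",
  "Immunization summary (toy label)",
  "Neonatal mortality rate (toy label)"
)

# Same regular expression as in the script: matches "Immuniz" or "immuniz"
hits <- grepl("[Ii]mmuniz", toy_display)

# Keep only the matching labels
toy_display[hits]

# An alternative spelling of the same search, ignoring case entirely
hits_ic <- grepl("immuniz", toy_display, ignore.case = TRUE)
```

On this toy vector both versions pick out the first and third labels. On the real code list the two could differ, since ignore.case = TRUE would also match text such as "IMMUNIZ".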


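
The start and end labels in the script are built by picking the rows with the earliest and latest year, then pasting the value and year together. Here is a minimal sketch of that step that runs without internet access. It assumes a toy data frame, toy_bcg, with made-up numbers (not WHO figures), a hypothetical helper called make_label(), and base R subsetting in place of filter().

```r
# Toy stand-in for glob_bcg: these values are invented, NOT WHO data
toy_bcg <- data.frame(
  year  = c(1980, 1990, 2000, 2010),
  value = c(20, 60, 75, 85)
)

# Hypothetical helper: build a two-line "value%<newline>year" label for one row
make_label <- function(point) {
  paste0(point$value, "%\n", point$year)
}

# Rows for the earliest and latest year (base R equivalent of the filter() calls)
first_point <- toy_bcg[toy_bcg$year == min(toy_bcg$year), ]
last_point  <- toy_bcg[toy_bcg$year == max(toy_bcg$year), ]

first_label <- make_label(first_point)
last_label  <- make_label(last_point)

# cat() shows the line break that geom_text() would render
cat(first_label, "\n")
cat(last_label, "\n")
```

The "\n" inside the label is what puts the year on its own line underneath the percentage when the label is drawn with geom_text().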
Some Resources:
