Monday, 20 August 2018

Exploring more immunization data...

Last week, inspired by Factfulness, I made a graph showing the BCG immunization coverage for children at 1 year. The Factfulness graph didn't mention any specific immunization programme and World Health Organisation data monitors immunization coverage for other vaccines. For that reason, I thought it would be good to download and graph more of the available global immunization data.

This allowed me to generate this graph which shows immunization coverage across the world for nine different vaccines. The various dates that monitoring starts shows that new immunization programs are being rolled out on a regular basis - good to see.




Below is the code for downloading the data and making the graph...
If you would rather just make the graph with some cleaner data, the data from Aug 20, 2018 is available on github and can be downloaded using the read_csv() code shown about half way down the script.

## START
##  download the data  
library(tidyverse)
# install.packages("WHO")
library(WHO)

# check out the codes of the WHO data...
codes <- get_codes()
# get codes for immunizations
immun_codes <- codes[grepl("[Ii]mmuniz", codes$display), ]
immun_codes$label

# go through each of the 18 to find global data...
# Number 1 has global data

# download number 1
# requires internet access
immun_data <- as.tibble(get_data(immun_codes$label[1]))

# filter for global data
immun_data <- filter(immun_data, region == "(WHO) Global")

# repeat download for next 2 to 18 WHO codes  
# had to do this as a loop as couldn't get it to work using lapply...
# start off with second value as first is above..
# requires internet access and patience...
for(i in 2:length(immun_codes$label)){
    # download the data
    data <- as.tibble(get_data(immun_codes$label[i])) 
    
    #tell you that it has downloaded...
    print(paste("Dataset",immun_codes$label[i], "downloaded."))
    
    # filter the data for Global values
    data <- filter(data, region == "(WHO) Global")
    
    # if there is some data bind_rows()
    if(nrow(data)>1){
        # bind_rows() function from dplyr
        immun_data <- bind_rows(immun_data, data)
    }else{   
        # if not just tell us....
        print("No global data in this set")
    }
}





# reduce columns using select() function  
immun_data <- select(immun_data, gho, region, year, value )

# to avoid having to download every time... save a local copy
file_name <- paste0("global_immun_data", Sys.Date())
write_csv(immun_data, file_name)

## ----read_back if you have saved to continue from here
# immun_data <- read_csv(file_name)

# read in data from github using read_csv() function

# immun_data <- read_csv("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/global_immun_data2018-08-20")


# Let's make our plot...
plot <- ggplot(immun_data, aes(x = year, y = value, 
    colour = gho)) +
  geom_line(size = 1)+ 
  theme(legend.position="none")

plot

# separate the plots with facet wrap
plotf <- plot + facet_wrap(~gho)
plotf


## The individual graph titles are difficult to read
# Shorten them by removing text using gsub() = global substitution
immun_data$gho_s <- gsub("immunization coverage among 1-year-olds",
                            "", immun_data$gho)
immun_data$gho_s <- gsub("immunization coverage by the nationally recommended age",
                        "", immun_data$gho_s)

# make the plot again
plot <- ggplot(immun_data, aes(x = year, y = value, 
    colour = gho_s)) +
  geom_line(size = 1) + 
  theme(legend.position="none")

# separate plots with facet_wrap
plotf <- plot + facet_wrap(~gho_s)
plotf <- plotf + theme_bw() + theme(legend.position="none")
plotf

# improve plot with y limits, titles & source
source <- paste("Source: World Health Organisation, accessed:", Sys.Date())
plotf <- plotf + ylim(0,100)
plotf <- plotf + labs(x = NULL, y = "Immunization Rate",
      title = "Global immunization rates", 
    subtitle = source)
plotf



## END

Some Resources:



Friday, 10 August 2018

Exploring some immunization data... one graph from Factfulness..

I've been reading Factfulness - an important book about statistics, data visualization and generally understanding the world. Written by Hans Rosling with Ola Rosling and Anna Rosling Rönnlund of Gapminder, it uses data to try to improve our world view as many of us have an incorrect world view. It is based on the very popular videos by Hans Rosling. Here is one to have a look at...


I like to explore data and there is lots of data in the book. One of the things I like to do is to try to reproduce the graph from publications. So here is a graph from Factfulness:

It shows how the world has improved in terms of immunization. Most of us don't know that almost 90% of children worldwide are immunized.  Here is a graph for just the TB vaccine - data for BCG immunization uptake for children. It's similar to the graph above and it's made in R.



The code uses the WHO R package which allows us to access immunization data from the Global Health Observatory data repository.

Here is the code that makes the graph and some of the steps along the way.

START 

library(tidyverse)
# install.packages("WHO")
library(WHO)
# check out the codes of the WHO data...
# requires internet access
codes <- get_codes()
colnames(codes)



glimpse(codes)  # useful function from tibble package

codes[1:10, 2]

# get codes for immunizations
# search using grepl() function for finding regular expressions
codes[grepl("[Ii]mmuniz", codes$display), ]

# ----downloadBCG data
bcg_data <- get_data("WHS4_543")  # BCG data...
# requires internet access and takes a few minutes...

## ----lookatBCG data
glimpse(bcg_data)

summary(bcg_data)

## ----globalGraph
# extract Global data with the filter() function from dplyr package
# then plot...
bcg_data %>% 
    filter(region == "(WHO) Global") %>% 
    ggplot(aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "BCG Immunization Rates")





## To make the graph look more like Factfulness...
# 1. Add a title
# 2. Set limits to 0 and 100%
# 3. Add source
# 4. Add an point at the start
# 5. Add an arrow at the end
# 6. Add values and years to start and end...
# 7. To change the style to grey under the arrow

## add titles and limits to y axis...
bcg_data %>% 
    filter(region == "(WHO) Global") %>% 
    ggplot(aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "BCG Immunization Rates", 
        title = "BCG Immunization Rates") + 
    ylim(0,100) -> bcg_plot
bcg_plot





# add a source
source <- paste("Source: World Health Organisation \n accessed:",     Sys.Date())
bcg_plot + annotate("text",  x = 2008, y = 10, label=source, size=3)




## Add points and annotation requires more work....
# pull out global data
bcg_data %>% 
    filter(region == "(WHO) Global") -> glob_bcg
# identify minimum and maximum point and labels
min_point <- filter(glob_bcg, year == min(glob_bcg$year))
min_point_label <- paste0(min_point$value, "%\n", min_point$year)
max_point <- filter(glob_bcg, year == max(glob_bcg$year))
max_point_label <- paste0(max_point$value, "%\n", max_point$year)

# Use WHO title - better plan
graph_title <- glob_bcg$gho[1]

# put it together to make a graph
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
    geom_line(size = 1) +
    labs(x = NULL, y = "Immunization Rate",
        title = graph_title) + 
    ylim(0,100) +
    geom_point(data = min_point, aes(year, value)) +
    geom_text(data = min_point, label=min_point_label, hjust=-0.8) +
    geom_point(data = max_point, aes(year, value), 
        shape = 62, size = 5) +
    geom_text(data = max_point, label=max_point_label, vjust=1.2)

bcg_plot




## add grey underneath the line
# this uses geom_ribbon()
bcg_plot + geom_ribbon(aes(ymin=0, ymax=value), 
    fill = "#DCDCDC") # this colour is a nice grey. 



# works but overlays earlier annotation. 

## ----change order to make it all work together for final plot
bcg_plot <- ggplot(glob_bcg, aes(x = year, y = value)) +
    geom_ribbon(aes(ymin=0, ymax=value), fill = "#DCDCDC") +
    geom_line(size = 1) +
    theme_bw() +
    labs(x = NULL, y = "Immunization Rate",
        title = graph_title) + 
    ylim(0,100) +
    geom_point(data = min_point, aes(year, value), size = 5) +
    geom_text(data = min_point, label=min_point_label, hjust=-0.8) +
    geom_point(data = max_point, aes(year, value), 
        shape = 62, size = 8) +
    geom_text(data = max_point, label=max_point_label, vjust=1.2) +
    annotate("text",  x = 2008, y = 10, label=source, size=3)

bcg_plot



END


Some Resources: