Tuesday, 16 April 2019

An icon plot inspired by bacon and a new book....

Last Saturday, "The Art of Statistics - Learning from Data" by Professor Sir David Spiegelhalter arrived. I ordered it after watching this video entitled "Why statistics should make you suspicious". It is a very interesting book which discusses statistics, data visualisation, data science and the challenges of teaching statistics.

I like to see if I can reproduce figures from papers and books, so I spent most of yesterday trying to reproduce Figure 1.4 with R. I have been partially successful. I am pretty sure I could have made it more quickly with PowerPoint, but then I wouldn't have learned anything about R. Here is the image from the book (I hope it is OK to reproduce this... I will get in touch to ask permission):


It's an icon plot or an icon array. They can be used to communicate risk.
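To make the risk comparison concrete, here is a minimal sketch of the arithmetic behind the two panels, using only the counts from the figure caption (6 and 7 cases per 100 people). The variable names are my own, and because the counts are rounded to whole people, the relative figure is only approximate.

```r
# Counts taken from the figure caption; variable names are illustrative.
baseline_cases <- 6    # cases per 100 people who do not eat bacon
bacon_cases    <- 7    # cases per 100 people who eat bacon every day
group_size     <- 100

# absolute increase: extra cases as a proportion of the whole group
absolute_increase <- (bacon_cases - baseline_cases) / group_size

# relative increase: extra cases as a proportion of the baseline cases
relative_increase <- (bacon_cases - baseline_cases) / baseline_cases

cat(sprintf("Absolute increase: %.0f in %d\n",
            absolute_increase * group_size, group_size))
cat(sprintf("Relative increase: about %.0f%%\n", relative_increase * 100))
```

An icon array shows the absolute numbers directly: one extra red icon among 100.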

Here is my attempt to reproduce the icon array using the ggimage and ggwaffle packages:





It's not perfect but it's the best I can do at the moment. I'm not happy with the separation of the rows of icons. The final images are very large and are slow to render in RStudio. However, I've learned a lot.

A much quicker and easier method uses the personograph package. The choice of icons is very restricted: only male icons :-( Also, the random scattering of black icons in the original image is not possible. I quite like that scattering, which is supposed to show the random nature of disease. The images are very quick to render and the code is easy to understand.
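For anyone who wants the random scattering, here is a minimal base R sketch of one way to build it: one row of data per icon, with the cases shuffled across a 5 x 20 grid. The column names (x, y, status) are my own choice for illustration and are not necessarily those of the CSV file used in the code further down.

```r
# Sketch: lay out 100 icons on a 5 x 20 grid and scatter the cases at random.
# Column names (x, y, status) are illustrative assumptions.
set.seed(42)  # fix the seed so the layout is reproducible

n_rows <- 5
n_cols <- 20

# one row of data per icon, holding its grid position
icons <- expand.grid(x = seq_len(n_cols), y = seq_len(n_rows))

# 6 baseline cases, 1 extra case and 93 unaffected people
status <- c(rep("cancer", 6), rep("extra case", 1), rep("no cancer", 93))

# sample() with no size argument returns a random permutation
icons$status <- sample(status)

# check the counts and look at the first few rows
table(icons$status)
head(icons)
```

A data frame like this could then be passed to a plotting function, with status mapped to colour.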

Here are the images made with personograph:
Figure 1.4
Bacon sandwich example using a pair of icon arrays. Of 100 people who do not eat
bacon, 6 (black icons) develop bowel cancer in the normal run of events (top panel).
Of 100 people who eat bacon every day of their lives, there is 1 additional (red) case.

I've also worked with the waffle package as shown in the code below.


Here is all the R code:

## START
# installing the packages: remove the leading hash (#) from a line to run it

# install.packages("personograph") # easiest way to start

# https://github.com/liamgilbey/ggwaffle
# devtools::install_github("liamgilbey/ggwaffle")

# install.packages(c("waffle", "readr", "ggpubr"))

# https://github.com/GuangchuangYu/ggimage
# setRepositories(ind=1:2)
# install.packages("ggimage")


# first choice of package - nice and easy but limited customisation
library(personograph)
# https://github.com/joelkuiper/personograph
# https://cran.r-project.org/web/packages/personograph/index.html

# the data is supplied in a list and is plotted in order. 
data <- list(first=0.06, second=0.94)
personograph(data,  colors=list(first="black", second="#efefef"),
    fig.title = "100 people who do not eat bacon",
    draw.legend = FALSE, dimensions=c(5,20))


data_2 <- list(first=0.06, second=0.01, third=0.93)
personograph(data_2, colors=list(first="black", second="red", third="#efefef"),
    fig.title = "100 people who eat bacon every day",
    draw.legend = FALSE,  dimensions=c(5,20))



# icons all male... no random distribution...

# second package: waffle
library(waffle)
dont_eat_bacon <- c('Cancer' = 6, 'No cancer' = 94)
waffle(dont_eat_bacon, rows = 5, colors = c("#000000", "#efefef"),
    legend_pos = "bottom", title = "100 people who do not eat bacon")


eat_bacon <- c('Cancer' = 6, 'Extra case' = 1, 'No cancer' = 93)
waffle(eat_bacon, rows = 5, colors = c("#000000", "#f90000","#efefef"),
    legend_pos = "bottom", title = "100 people who eat bacon every day")




# nice clear colours but difficult to change symbols
# without installing fonts into system...
# I would rather not have to install fonts...

# Try a third package - ggwaffle

library(readr)
library(ggwaffle)
library(ggpubr)
library(ggimage)
theme_set(theme_pubr())

# in this case, each icon is one row of data with its own x and y position,
# so every icon is plotted like a point on a graph.
# download the data
link <- ("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ggwaffledata_mf.csv")
bacon_waf <- read_csv(link)

# have a look at data
View(bacon_waf)

# basic plot with geom_waffle() from ggwaffle package
ggplot(bacon_waf, aes(x, y, fill = bacon)) + 
    geom_waffle()

# add icons with geom_icon() from ggimage package
p1 <- ggplot(bacon_waf, aes(x, y, colour = no_bacon)) + 
    geom_icon(aes(image=icon), size = 0.1) +
    scale_color_manual(values=c("black", "grey")) +
    theme_waffle()  +
    theme(legend.position = "none") +
    labs(x = "", y = "",
        title = "100 people who do not eat bacon")

# show the plot
# this is VERY SLOW to draw...
# because it contains 100 icons each of which gets
# downloaded from the internet. 
# I feel sure there is a better way but I don't know it at the moment...
p1


p2 <- ggplot(bacon_waf, aes(x, y, colour = bacon)) + 
    geom_icon(aes(image=icon), size = 0.1) +
    scale_color_manual(values=c("black","red", "grey")) +
    theme_waffle()  +
    theme(legend.position = "none") +
    labs(x = "", y = "",
        title = "100 people who eat bacon every day")

# this is the text at the bottom of the page
text <- paste("Figure 1.4\n",
"Bacon sandwich example using a pair of icon arrays, with randomly",
"scattered icons showing the incremental risk of eating bacon every",
"day. Of 100 people who do not eat bacon, 6 (solid icons) develop",
"bowel cancer in the normal run of events. Of 100 people who eat",
"bacon every day of their lives, there is 1 additional (red) case.",
sep = " ")

# format the text as ggplot object
# with ggparagraph() from the ggpubr package
text_p <- ggparagraph(text = text, size = 12, color = "black")

# arrange the images and text with ggarrange() from the ggpubr package
together <- ggarrange(p1, p2, text_p, 
    ncol = 1, nrow = 3,
    heights = c(1, 1, 0.3))

together
# AGAIN VERY SLOW...!!

# ggsave("together_3.pdf", together)
# works, but this is a very large PDF file and
# takes a lot of time to render

# at the end it seems good to clear memory...

gc()

## END
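
The ggimage plots above are slow because each of the 100 icons is fetched separately. One possible lighter alternative, sketched here with base graphics only, draws a filled circle for each person instead of an icon. This is not the method used for the figures above; the function names make_icon_grid() and draw_icon_grid() are hypothetical helpers of my own, and the circles are a stand-in for the person icons.

```r
# Sketch: an icon array with base graphics, so nothing is downloaded.
# make_icon_grid() and draw_icon_grid() are hypothetical helper names.

# build one row per person, optionally shuffling the groups across the grid
make_icon_grid <- function(counts, n_rows = 5, n_cols = 20, shuffle = TRUE) {
  stopifnot(sum(counts) == n_rows * n_cols)
  grid <- expand.grid(x = seq_len(n_cols), y = seq_len(n_rows))
  status <- rep(names(counts), times = counts)
  if (shuffle) status <- sample(status)
  grid$status <- status
  grid
}

# draw each person as a filled circle, coloured by group
draw_icon_grid <- function(grid, colours, title = "") {
  plot(grid$x, grid$y, pch = 16, cex = 2.5,
       col = colours[grid$status],
       axes = FALSE, xlab = "", ylab = "", asp = 1, main = title)
  invisible(grid)
}

set.seed(1)  # reproducible scattering
eat_bacon_grid <- make_icon_grid(c("Cancer" = 6, "Extra case" = 1, "No cancer" = 93))
colours <- c("Cancer" = "black", "Extra case" = "red", "No cancer" = "grey80")

# only draw when run interactively, e.g. in RStudio
if (interactive()) {
  draw_icon_grid(eat_bacon_grid, colours,
                 title = "100 people who eat bacon every day")
}
```

Setting shuffle = FALSE gives the ordered layout that personograph and waffle produce.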

Some resources:
  • Reproducing immunization data from Factfulness - another good book from Hans Rosling.  
  • Spiegelhalter, DJ (2008) "Understanding Uncertainty" Ann Fam Med 6:196-197 doi: 10.1370/afm.848
  • Galesic, M & Garcia-Retamero, R (2009) "Using Icon Arrays to Communicate Medical Risks: Overcoming Low Numeracy" Health Psychology 28(2):210-216
  • "Infographic-style charts using the R waffle package" by N Saunders