I like to see if I can reproduce figures from papers and books, so I spent most of yesterday trying to reproduce Figure 1.4 with R. I have been partially successful. I am pretty sure I could have made it more quickly with PowerPoint, but then I wouldn't have learned anything about R. Here is the image from the book (I hope it is OK to reproduce this... I will ask for permission):
It's an icon plot, also called an icon array. Icon arrays can be used to communicate risk.
Here is my attempt to reproduce the icon array using the ggimage and ggwaffle packages:
It's not perfect but it's the best I can do at the moment. I'm not happy with the separation of the rows of icons. The final images are very large and slow to render in RStudio. However, I've learned a lot.
A much quicker and easier method uses the personograph package. The choice of icons is very restricted: only male icons :-( Also, the black icons cannot be scattered at random as they are in the original image. That scatter is meant to show the random nature of disease, which I quite like. On the plus side, the images are very quick to render and the code is easy to understand.
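The random scatter of the original figure can be prepared as data before any plotting package is involved. The following is a minimal sketch in base R, not the method used to build the CSV file loaded later in this post: it lays out 100 people on a 5 x 20 grid and shuffles the case labels with `sample()`. The column names `x`, `y` and `status` and the seed value are assumptions chosen for illustration.

```r
# Sketch: a 5 x 20 grid of 100 people with cases scattered at random.
# The column names (x, y, status) are illustrative assumptions.
set.seed(42)  # fix the seed so the same layout is drawn each time

n_rows <- 5
n_cols <- 20

# 6 baseline cases, 1 extra case and 93 unaffected people,
# matching the numbers in the figure caption
status <- c(rep("Cancer", 6), rep("Extra case", 1), rep("No cancer", 93))

# one row per icon: every (x, y) cell of the grid
grid <- expand.grid(x = seq_len(n_cols), y = seq_len(n_rows))

# shuffle the labels so the cases land in random cells
grid$status <- sample(status)

# check the counts and look at the first few rows
table(grid$status)
head(grid)
```

A data frame like this could then be mapped to a plot with something like `aes(x, y, colour = status)`, in the same way the ggwaffle code in this post uses its own data.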
Here are the images made with personograph:
I've also worked with the waffle package as shown in the code below.
Here is all the R code:
## START
# to install the packages, first remove the leading hash (#), then run:
# install.packages("personograph") # easiest way to start
# https://github.com/liamgilbey/ggwaffle
# devtools::install_github("liamgilbey/ggwaffle")
# install.packages(c("waffle", "readr", "ggpubr"))
# https://github.com/GuangchuangYu/ggimage
# setRepositories(ind=1:2)
# install.packages("ggimage")
# first choice of package - nice and easy but limited customisation
library(personograph)
# https://github.com/joelkuiper/personograph
# https://cran.r-project.org/web/packages/personograph/index.html
# the data is supplied in a list and is plotted in order.
data <- list(first=0.06, second=0.94)
personograph(data, colors=list(first="black", second="#efefef"),
             fig.title = "100 people who do not eat bacon",
             draw.legend = FALSE, dimensions=c(5,20))
data_2 <- list(first=0.06, second=0.01, third=0.93)
personograph(data_2, colors=list(first="black", second="red", third="#efefef"),
             fig.title = "100 people who eat bacon every day",
             draw.legend = FALSE, dimensions=c(5,20))
# icons all male... no random distribution...
# second package: waffle
library(waffle)
dont_eat_bacon <- c('Cancer' = 6, 'No cancer' = 94)
waffle(dont_eat_bacon, rows = 5, colors = c("#000000", "#efefef"),
       legend_pos = "bottom", title = "100 people who do not eat bacon")
eat_bacon <- c('Cancer' = 6, 'Extra case' = 1, 'No cancer' = 93)
waffle(eat_bacon, rows = 5, colors = c("#000000", "#f90000","#efefef"),
       legend_pos = "bottom", title = "100 people who eat bacon every day")
# nice clear colours but difficult to change symbols
# without installing fonts into system...
# I would rather not have to install fonts...
# Try a third package - ggwaffle
library(readr)
library(ggwaffle)
library(ggpubr)
library(ggimage)
theme_set(theme_pubr())
# in this case, each row of the data encodes one icon on the graph.
# download the data
link <- ("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ggwaffledata_mf.csv")
bacon_waf <- read_csv(link)
View(bacon_waf)
# basic plot with geom_waffle() from ggwaffle package
ggplot(bacon_waf, aes(x, y, fill = bacon)) +
  geom_waffle()
# add icons with geom_icon() from ggimage package
p1 <- ggplot(bacon_waf, aes(x, y, colour = no_bacon)) +
  geom_icon(aes(image=icon), size = 0.1) +
  scale_color_manual(values=c("black", "grey")) +
  theme_waffle() +
  theme(legend.position = "none") +
  labs(x = "", y = "",
       title = "100 people who do not eat bacon")
# show the plot
# this is VERY SLOW to draw...
# because it contains 100 icons each of which gets
# downloaded from the internet.
# I feel sure there is a better way but I don't know it at the moment...
p1
p2 <- ggplot(bacon_waf, aes(x, y, colour = bacon)) +
  geom_icon(aes(image=icon), size = 0.1) +
  scale_color_manual(values=c("black","red", "grey")) +
  theme_waffle() +
  theme(legend.position = "none") +
  labs(x = "", y = "",
       title = "100 people who eat bacon every day")
# this is the text at the bottom of the page
text <- paste("Figure 1.4\n",
              "Bacon sandwich example using a pair of icon arrays, with randomly",
              "scattered icons showing the incremental risk of eating bacon every",
              "day. Of 100 people who do not eat bacon, 6 (solid icons) develop",
              "bowel cancer in the normal run of events. Of 100 people who eat",
              "bacon every day of their lives, there is 1 additional (red) case.",
              sep = " ")
# format the text as ggplot object
# with ggparagraph() from the ggpubr package
text_p <- ggparagraph(text = text, size = 12, color = "black")
# arrange the images and text with ggarrange() from the ggpubr package
together <- ggarrange(p1, p2, text_p,
                      ncol = 1, nrow = 3,
                      heights = c(1, 1, 0.3))
together
# AGAIN VERY SLOW...!!
# ggsave("together_3.pdf", together)
# works, but these are very large PDF files and
# take a long time to render
# at the end it seems good to clear memory...
gc()