On Wednesday, July 20t0 2021, I saw this extended twitter feed by Dr CĂ©line Gounder that reviewed the data and concepts underpinning whether we will need more COVID vaccines. It's a super long thread that is available here through Threadreader too. The thread felt really important and showed lots of data. One of the figures that interested me most is this one from a Lancet paper:
It is a lovely violin plot that shows a decrease in virus neutralisation by variant of the current corona virus (SARS‑CoV‑2). My understanding is that blood samples were taken from various people who had received various vaccines.
I was surprised when I looked at the paper because there was no figures in it. However, the figures are all in the Supplementary Material which you have to download as a PDF to read. The Supplementary Material also had a heart warming phrase: "All
data (anonymised) and full R code to produce all figures and statistical analysis presented in
this manuscript are freely-available online on Github: https://github.com/davidlvb/CrickUCLH-Legacy-VOCs-2021-05"
I was very happy to see the code and data available on Github and I forked the data to see if I could reproduce this violin plot. The short answer was yes. There were some minor challenges. I'm missing the text is at the top of the figure. However, I was able to reproduce the main body of the figure within about 20 minutes of engaging with the process. I was pleased.
Here is the graph I made which is a good reproduction of the one in the Twitter feed and Figure 1B in the supplementary data. Below is some simplified code extracted from the original code that is required to make the Figure 1B.
START
# making just Figure 1B - a violin plot
library(sp)
library(tidyverse)
library(khroma)
# download the data from my fork on Github
github_link <- "https://github.com/brennanpincardiff/Crick-UCLH-Legacy-VOCs-2021-05/blob/main/Crick_Legacy_2021-24-05_B-1-617-2_PUBLIC.Rda?raw=true"
load(url(github_link))
### from Lines 21 to 261 of original script
### Subset data for further analysis
studyData <- dtHashed %>%
filter(COVID_vaccStatus %in% c(1,2)) # Ignore individuals who are in seroconversion window following dose 1 and dose 2 of vaccine
### Set constants for various functions / plots
strainOrder <- c("Wuhan1", "D614G", "Kent", "SAfrica", "India2")
referenceIC50 <- 2^9.802672
dose2cohort <- studyData %>% filter(COVID_vaccStatus == 2, sampleOrderInVaccStatus == 1)
########################################################################
# Panel 1. Vaccine responses per strain following 2nd dose Pfizer #
########################################################################
relevantData <- dose2cohort %>%
pivot_longer(cols = ends_with("ic50"), names_to = "strain", values_to = "ic50")
relevantData$strain <- str_replace_all(relevantData$strain, pattern = "_ic50", replacement = "")
relevantData$strain <- fct_relevel(relevantData$strain, strainOrder)
outplot <- ggplot(relevantData, aes(x=strain, y=ic50, color = strain, label = sample_barcode)) +
scale_colour_muted() +
geom_hline(yintercept = referenceIC50,linetype = 3 ) +
geom_violin(trim=TRUE) +
scale_y_continuous(trans='log2',
breaks=c(5, 10, 64, 256, 1024, 5120),
labels=c("[0]", "[<40]", "64", "256", "1024", "[>2560]"),
minor_breaks=c(32, 128, 512, 2048)) +
ylab(bquote('Virus Neutralisation, '~IC[50]~ '')) +
geom_jitter(shape=20, position=position_jitter(0.2), alpha=0.3) +
stat_summary(fun=median, geom = "point", color="black", shape=5, size=1, stroke=1) +
theme_bw(base_family = "Helvetica Neue Thin") +
theme(legend.position="none")+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=12)) +
theme(axis.text.y = element_text(size=12)) +
theme(
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.title.y=element_text(size=15),
axis.title.x=element_blank()
)
outplot
## END of this SCRIPT
Useful resources
- Making a box and whisker plot with some published proteomic data...
- Original paper: "Neutralising antibody activity against SARS-CoV-2 VOCs B.1.617.2 and B.1.351 by BNT162b2 vaccination" by Wall et al https://doi.org/10.1016/S0140-6736(21)01290-3
- David LV Bauer on Github: https://github.com/davidlvb
- David LV Bauer lab home page
- My fork of the data from the Lancet paper on Github which includes this script