Monday, 1 July 2019

Exploring data from Nature journal about fibroblast subsets in arthritis...

Recently, I attended a conference entitled Data-driven Systems Medicine at Cardiff University. An interesting meeting organised by Dr Barbara Szomolay & Dr Tom Connor who work in the Systems Immunity Institute.

Prof. Mark Coles, Professor at Kennedy Institute of Rheumatology at Oxford University talked about a recent paper published in Nature. The title of the paper was "Distinct fibroblast subsets drive inflammation and damage in arthritis" by Croft et al . A Biorxiv of the manuscript is also available. Nature requires authors to make a "Data availability statement" which encourages authors to share their data.

This gave me the opportunity to explore some of the data in R. It was relatively straight forward workflow: import, tidy and visualise. To make a figure exactly like the paper takes a bit more effort. 

To quote from the paper:
"Rheumatoid arthritis (RA) is a prototypic IMID6 in which synovial fibroblasts contribute to both joint damage and inflammation. We found that expression of FAPĪ±, a cell-membrane dipeptidyl peptidase10, was significantly higher in both synovial tissue and cultured synovial fibroblasts isolated from patients who fulfilled classification criteria for RA compared to patients in whom joint inflammation resolved (Fig. 1a–c), suggesting that FAPĪ± expression may associate with a pathogenic fibroblast phenotype."

Here is a version of Figure 1b which shows FAPalpha in patient samples with different types of disease. It a quantification of staining of cells in synovial tissue.

The code to make this is below as is the code for a version of Figure 1c and 1j. 


### START of script to explore data in Figure 1b, 1c and 1j
# the packages that we need
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(ggpubr)

# It's possible to explore the data  
# This is an Excel file for Figure 1 - each part a different worksheet
link <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1263-7/MediaObjects/41586_2019_1263_MOESM8_ESM.xlsx"

# the download.file() function downloads and saves the file with the name given
download.file(url=link, destfile="file.xlsx", mode="wb")

# check worksheet names 
sheet_names <- excel_sheets("file.xlsx")
# import the data for Figure 1b
data <- read_excel("file.xlsx", sheet_names[1])

# data in wide format...
# put into long tidy format... with gather() from dplyr
data %>% gather(na.rm = TRUE) %>%
# make patient samples factors - important for graphing and stats
 mutate(key = factor(key, c("Control", "Resolving", "RA"))) -> data_3

# First plot - a type of dot plot
ggplot(data_3, aes(key, value)) +
    geom_point()

# geom_boxplot shows the statistics - mean and quartiles
# geom_jitter() shows points off centre across the x-axis...
plot <- ggplot(data_3, aes(key, value)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1)
plot
# we want to put a p-value on the graph
# using the stat_compare_means() function from ggpubr package 
my_comparisons <- list( c("Control", "RA"))
plot + ylim(0,0.2) + 
    stat_compare_means(data = data_3, comparisons = my_comparisons)
# We want to replace the number with four stars as per the paper...
plot <- plot + ylim(0,0.2) + 
    stat_compare_means(data = data_3, comparisons = my_comparisons,
        symnum.args = list(cutpoints = c(0,0.0005, 1), 
            symbols = c("****","ns")))
# and add labels
plot + labs(x = "",
    y = "FAPalpha expression (pixels per unit area)", 
    title = "Figure 1b",
    subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none")
# not the exact same but very similar to the paper

# Data 1c is very similar in format.
# Here is a full script to show it...
data_1c <- read_excel("file.xlsx", sheet_names[2])
data_1c %>% gather(na.rm = TRUE) %>%
    mutate(key = factor(key, c("Control", "Resolving", "RA"))) %>%
    ggplot(aes(key, value, shape = key, colour=key)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1) +
    ylim(0,2) + 
    labs(x = "",
        y = "Fap mRNA expression", 
        title = "Figure 1c",
        subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none") +
    stat_compare_means(comparisons = my_comparisons,
        symnum.args = list(cutpoints = c(0,0.0005, 1), 
            symbols = c("****","ns")))


# Data 1j is another dot plot...
sheet_names[5]
data_1j <- read_excel("file.xlsx", sheet_names[5])
data_1j %>%
    gather(na.rm = TRUE) %>%
    mutate(key = factor(key, c("Control", "STIA"))) %>%
    ggplot(aes(key, value, shape = key, colour=key)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1) +
    ylim(0,0.25) + 
    labs(x = "",
        y = "Bioluminescence", 
        title = "Figure 1j",
        subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none") +
    stat_compare_means()


## END