Showing posts with label published data. Show all posts
Showing posts with label published data. Show all posts

Friday, 23 July 2021

Seems we'll all need more COVID vaccinations...

On Wednesday, July 20t0 2021, I saw this extended twitter feed by Dr Céline Gounder that reviewed the data and concepts underpinning whether we will need more COVID vaccines. It's a super long thread that is available here through Threadreader too. The thread felt really important and showed lots of data. One of the figures that interested me most is this one from a Lancet paper:

It is a lovely violin plot that shows a decrease in virus neutralisation by variant of the current corona virus (SARS‑CoV‑2). My understanding is that blood samples were taken from various people who had received various vaccines. 

I was surprised when I looked at the paper because there was no figures in it. However, the figures are all in the Supplementary Material which you have to download as a PDF to read. The Supplementary Material also had a heart warming phrase: "All data (anonymised) and full R code to produce all figures and statistical analysis presented in this manuscript are freely-available online on Github: https://github.com/davidlvb/CrickUCLH-Legacy-VOCs-2021-05"

I was very happy to see the code and data available on Github and I forked the data to see if I could reproduce this violin plot. The short answer was yes. There were some minor challenges. I'm missing the text is at the top of the figure. However, I was able to reproduce the main body of the figure within about 20 minutes of engaging with the process. I was pleased.

Here is the graph I made which is a good reproduction of the one in the Twitter feed and Figure 1B in the supplementary data. Below is some simplified code extracted from the original code that is required to make the Figure 1B.

START
# making just Figure 1B - a violin plot

library(sp)
library(tidyverse)
library(khroma)

# download the data from my fork on Github
github_link <- "https://github.com/brennanpincardiff/Crick-UCLH-Legacy-VOCs-2021-05/blob/main/Crick_Legacy_2021-24-05_B-1-617-2_PUBLIC.Rda?raw=true"
load(url(github_link))

### from Lines 21 to 261 of original script

### Subset data for further analysis
studyData <- dtHashed %>% 
    filter(COVID_vaccStatus %in% c(1,2)) # Ignore individuals who are in seroconversion window following dose 1 and dose 2 of vaccine

### Set constants for various functions / plots
strainOrder <- c("Wuhan1", "D614G", "Kent", "SAfrica", "India2")
referenceIC50 <- 2^9.802672

dose2cohort <- studyData %>% filter(COVID_vaccStatus == 2,  sampleOrderInVaccStatus == 1)

########################################################################
#   Panel 1. Vaccine responses per strain following 2nd dose Pfizer    #
########################################################################

relevantData <- dose2cohort %>%
    pivot_longer(cols = ends_with("ic50"), names_to = "strain", values_to = "ic50")
relevantData$strain <- str_replace_all(relevantData$strain, pattern = "_ic50", replacement = "")
relevantData$strain <- fct_relevel(relevantData$strain, strainOrder)


outplot <- ggplot(relevantData, aes(x=strain, y=ic50, color = strain, label = sample_barcode)) + 
    scale_colour_muted() +
    geom_hline(yintercept = referenceIC50,linetype = 3 ) +
    geom_violin(trim=TRUE) + 
    scale_y_continuous(trans='log2', 
                       breaks=c(5, 10, 64, 256, 1024, 5120), 
                       labels=c("[0]", "[<40]", "64", "256", "1024", "[>2560]"),
                       minor_breaks=c(32, 128, 512, 2048)) +
    ylab(bquote('Virus Neutralisation, '~IC[50]~ '')) +
    geom_jitter(shape=20, position=position_jitter(0.2), alpha=0.3) + 
    stat_summary(fun=median, geom = "point", color="black",  shape=5, size=1, stroke=1) + 
    theme_bw(base_family = "Helvetica Neue Thin") +
    theme(legend.position="none")+
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size=12)) + 
    theme(axis.text.y = element_text(size=12))  + 
    theme(
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.title.y=element_text(size=15),
        axis.title.x=element_blank()
    )

outplot
## END of this SCRIPT


Useful resources

Tuesday, 27 August 2019

Exploring cytokine data from fibroblasts with ggpubr...

EDITED 24 July 2020. Some changes to functions means that this script needed to be changed. It should work again now.

===
Last month, I wrote a blog post exploring some data from an interesting publication by Croft et al about the role of synovial fibroblasts in rheumatoid arthritis. The paper was "Distinct fibroblast subsets drive inflammation and damage in arthritis". A Biorxiv of the manuscript is available.

Today, while preparing teaching materials for R for Biochemists 201, an advanced R course I'm preparing for the Biochemical Society, I have been exploring the data in Figure 4a. The data analysis workflow requires data import, some tidying in Excel and R, summarisation and visualisation. The data in the Excel file was challenging. I have edited it to make it a little easier for me and put it on Github

For the visualisation, I have been using the ggpubr package by Alboukadel Kassambara. ggpubr is a very interesting package that makes adding results from statistical tests to ggplots more straight forward. Alboukadel Kassambara has also created a VERY useful site for learning R.

To go back to the Nature paper by Croft et al and quote from the paper:
"FAPα+THY1− and FAPα+THY1+ cells isolated from day 9 STIA synovia were stimulated and analysed using luminex."

Below is my version of Figure 4a which shows the different cytokine profiles produced. It's not perfect but it's helped my learning a lot today.


Here is the script that makes my version of Figure 4a and some graphs along the way. The data exploration revealed a few issues about the data which I have mentioned at the bottom and along the way. I will email the corresponding and first authors with some questions. This has been a very interesting process. I would like to thank the authors for making the data available and Alboukadel Kassambara for the interesting ggpubr package. 

START
# making figure 4a ...
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
library(ggpubr)

# DATA IMPORT
# read in the data - I have simplified it a bit...
link <- ("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/figure_4a_data_altered.xlsx")
download.file(url=link, destfile="file.xlsx", mode="wb")

data_t <- read_excel("file.xlsx")

# DATA TIDY
# transforming (rotate x and y) the data makes it easier for me...
data_tt <- t(data_t[2:13])
# turn it into a tibble
data_ttt <- as_tibble(data_tt)
# add colunm names - the names for each cytokine
colnames(data_ttt) <- toupper(data_t$Cytokine)

# use gather() function from dplyr package to change to long format 
data_g <- gather(data_ttt[1:12], key = "cytokine")

# need to add thy status - down the whole column
thy <- c("minus", "minus",  "minus",  "minus",  
         "minus", "minus", "plus", "plus", "plus", "plus",
         "plus", "plus")
thy_144 <- c(thy, thy, thy, thy, thy, thy,
             thy, thy, thy, thy, thy, thy)

data_g <- add_column(data_g, thy = thy_144)
# then make it a factor
data_g$thy <-as.factor(data_g$thy)

# order of cytokines from paper
cytokine_order <- c("CXCL14", "CXCL5", "CXCL2", "CXCL12", 
    "CCL2", "CCL7", "CCL5", "CCL12",
    "IL6", "MMP3", "MMP9", "MMP13")
# make these a factor with an order too...
# note: error in original data on the Nature file - "Cxc12" NOT "Cxcl12"
data_g$cytokine <- factor(data_g$cytokine, levels = cytokine_order)

# DATA SUMMARISE
# calculate means and create a new file with the means
data_g %>%
    group_by(cytokine, thy) %>%
    summarise(mean =mean(value)) -> cyto_means

# FIRST VISUALISATION of the means
ggplot(cyto_means, aes(x = thy, 
    y = mean)) + 
    geom_col() + 
    facet_wrap(~cytokine, scales ="free_y")


# MORE COMPLEX VISUALISATION
# let's try adding the points and then the stat tests...
# we go back to the original data 
# without the summarise step
# use stat_summary() to calculate mean and plot the bars...
# and geom_jitter()
ggplot(data_g, aes(x=thy, y=value, fill = thy)) +
    stat_summary(fun="mean", geom="bar") +   # calculates means
    scale_fill_manual(values=c("blue", "red")) + 
    facet_wrap(~cytokine, scales ="free_y") +
    geom_jitter() 


# then the stats tests with stat_compare_means()
my_comparisons <- list( c("minus", "plus"))
ggplot(data_g, aes(x=thy, y=value, fill = thy)) +
    stat_summary(fun="mean", geom="bar") +
    scale_fill_manual(values=c("blue", "red")) + 
    facet_wrap(~cytokine, scales ="free_y") +
    geom_jitter() +
    stat_compare_means(comparisons = my_comparisons)
# default Wilcoxon test - non parametric
# so values are different


# change method to t.test and paired - true
ggplot(data_g, aes(x=thy, y=value, fill = thy)) +
    stat_summary(fun="mean", geom="bar") +
    scale_fill_manual(values=c("blue", "red")) + 
    facet_wrap(~cytokine, scales ="free_y") +
    geom_jitter() +
    stat_compare_means(comparisons = my_comparisons, method = "t.test",
        paired = TRUE, size = 2)
# these don't look the same as the values in the paper
# the paper says paired T-test but perhaps I don't have the pairings correct...
# I don't think there is a way to do this without more information
# from the authors... 


# can add some stars for significance but as the values aren't the same
# it won't be quite the same as the paper... 
# create an object, called plot, with the graph 
ggplot(data_g, aes(x=thy, y=value, fill = thy)) +
    stat_summary(fun="mean", geom="bar") +
    scale_fill_manual(values=c("blue", "red")) + 
    facet_wrap(~cytokine, scales ="free_y") +
    geom_jitter() +
    stat_compare_means(comparisons = my_comparisons, method = "t.test",
        paired = TRUE, size = 2,
        symnum.args = list(cutpoints = c(0,0.0001,0.005,0.001, 1), 
            symbols = c("****","***", "***", "ns"))) +
    theme(strip.background = element_rect(fill = FALSE)) -> plot

# show the object
plot


# add titles and reformat them a bit
plot + 
    labs(x = "Thy Status",
        y = "Cytokine levels", 
        title = "Figure 4a",
        subtitle = "Croft et al, Nature, 2019") +
    theme(panel.background = element_rect(fill = "white"))



# change the formatting and move legend
plot + 
    labs(x = "Thy Status",
        y = "Cytokine levels", 
        title = "Figure 4a",
        subtitle = "Croft et al, Nature, 2019") +
    theme_classic() +
    theme(strip.background = element_rect(fill = "white", colour = "white")) +
    theme(legend.position="bottom")


# Not perfect but it will do for today...
END

So unlike the published version, I don't have the error bars in the plots. I can't find an easy way to do this with ggpubr. Perhaps I would need to create each graph separately. I also don't have the units described which is a weakness. They vary for the different cytokines. Again making individual graphs is probably necessary.

Playing with the data revealed some issues about the data:
  1. Most of the T test values were as reported but some were not. For example, the published p value for CXCL12 was P = 0.0002 whereas I calculated it as 0.001. 
  2. A visual comparison of the IL6 data suggests that the published data is not the same as the data used to create the figure. The published data had three values between 200ng/ml and 400ng/ml. None of the values in the data file were this low. 
  3. There was a spelling error in the data file with CXCL12 spelled as "Cxc12"



Monday, 1 July 2019

Exploring data from Nature journal about fibroblast subsets in arthritis...

Recently, I attended a conference entitled Data-driven Systems Medicine at Cardiff University. An interesting meeting organised by Dr Barbara Szomolay & Dr Tom Connor who work in the Systems Immunity Institute.

Prof. Mark Coles, Professor at Kennedy Institute of Rheumatology at Oxford University talked about a recent paper published in Nature. The title of the paper was "Distinct fibroblast subsets drive inflammation and damage in arthritis" by Croft et al . A Biorxiv of the manuscript is also available. Nature requires authors to make a "Data availability statement" which encourages authors to share their data.

This gave me the opportunity to explore some of the data in R. It was relatively straight forward workflow: import, tidy and visualise. To make a figure exactly like the paper takes a bit more effort. 

To quote from the paper:
"Rheumatoid arthritis (RA) is a prototypic IMID6 in which synovial fibroblasts contribute to both joint damage and inflammation. We found that expression of FAPα, a cell-membrane dipeptidyl peptidase10, was significantly higher in both synovial tissue and cultured synovial fibroblasts isolated from patients who fulfilled classification criteria for RA compared to patients in whom joint inflammation resolved (Fig. 1a–c), suggesting that FAPα expression may associate with a pathogenic fibroblast phenotype."

Here is a version of Figure 1b which shows FAPalpha in patient samples with different types of disease. It a quantification of staining of cells in synovial tissue.

The code to make this is below as is the code for a version of Figure 1c and 1j. 


### START of script to explore data in Figure 1b, 1c and 1j
# the packages that we need
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(ggpubr)

# It's possible to explore the data  
# This is an Excel file for Figure 1 - each part a different worksheet
link <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1263-7/MediaObjects/41586_2019_1263_MOESM8_ESM.xlsx"

# the download.file() function downloads and saves the file with the name given
download.file(url=link, destfile="file.xlsx", mode="wb")

# check worksheet names 
sheet_names <- excel_sheets("file.xlsx")
# import the data for Figure 1b
data <- read_excel("file.xlsx", sheet_names[1])

# data in wide format...
# put into long tidy format... with gather() from dplyr
data %>% gather(na.rm = TRUE) %>%
# make patient samples factors - important for graphing and stats
 mutate(key = factor(key, c("Control", "Resolving", "RA"))) -> data_3

# First plot - a type of dot plot
ggplot(data_3, aes(key, value)) +
    geom_point()

# geom_boxplot shows the statistics - mean and quartiles
# geom_jitter() shows points off centre across the x-axis...
plot <- ggplot(data_3, aes(key, value)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1)
plot
# we want to put a p-value on the graph
# using the stat_compare_means() function from ggpubr package 
my_comparisons <- list( c("Control", "RA"))
plot + ylim(0,0.2) + 
    stat_compare_means(data = data_3, comparisons = my_comparisons)
# We want to replace the number with four stars as per the paper...
plot <- plot + ylim(0,0.2) + 
    stat_compare_means(data = data_3, comparisons = my_comparisons,
        symnum.args = list(cutpoints = c(0,0.0005, 1), 
            symbols = c("****","ns")))
# and add labels
plot + labs(x = "",
    y = "FAPalpha expression (pixels per unit area)", 
    title = "Figure 1b",
    subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none")
# not the exact same but very similar to the paper

# Data 1c is very similar in format.
# Here is a full script to show it...
data_1c <- read_excel("file.xlsx", sheet_names[2])
data_1c %>% gather(na.rm = TRUE) %>%
    mutate(key = factor(key, c("Control", "Resolving", "RA"))) %>%
    ggplot(aes(key, value, shape = key, colour=key)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1) +
    ylim(0,2) + 
    labs(x = "",
        y = "Fap mRNA expression", 
        title = "Figure 1c",
        subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none") +
    stat_compare_means(comparisons = my_comparisons,
        symnum.args = list(cutpoints = c(0,0.0005, 1), 
            symbols = c("****","ns")))


# Data 1j is another dot plot...
sheet_names[5]
data_1j <- read_excel("file.xlsx", sheet_names[5])
data_1j %>%
    gather(na.rm = TRUE) %>%
    mutate(key = factor(key, c("Control", "STIA"))) %>%
    ggplot(aes(key, value, shape = key, colour=key)) +
    geom_boxplot(outlier.size=0) +
    geom_jitter(width = 0.1) +
    ylim(0,0.25) + 
    labs(x = "",
        y = "Bioluminescence", 
        title = "Figure 1j",
        subtitle = "Croft et al, Nature, 2019") +
    theme_classic() + theme(legend.position="none") +
    stat_compare_means()


## END











Tuesday, 16 April 2019

An icon plot inspired by bacon and a new book....

Last Saturday, "The Art of Statistics - Learning from Data" by Professor Sir David Spiegelhalter arrived. I ordered it after watching this video entitled "Why statistics should make you suspicious". It is a very interesting book which discusses statistics, data visualisation, data science and the challenges of teaching statistics.

I like to see if I can reproduce figures from papers and books, so I spent most of yesterday trying to reproduce Figure 1.4 with R. I have been partially successful. I am pretty sure I could have made it more quickly with Powerpoint, but then I wouldn't have learned anything about R. Here is the image from the book (I hope it is OK to reproduce this... I will contact to ask permission):


It's an icon plot or an icon array. They can be used to communicate risk.

Here is my attempt to reproduce the icon array using the ggimage and  ggwaffle packages:





It's not perfect but it's the best I can do at the moment. I'm not happy with the separation of the rows of icons. The final images are very large and cause R-Studio some problems in terms of speed of rendering. However, I've learned a lot.

A much quicker and easier method uses the personograph package. The types of icon are very restricted - only male icons :-( Also the random distribution of black icons of the original image is not possible. This is supposed to show the random nature of disease which I quite like. The images are very quick to render and the code is easy to understand.

Here are the images made with personograph:
Figure 1.4
Bacon sandwich example using a pair of icon arrays. Of 100 people who do not eat 
bacon, 6 (black icons) develop bowel cancer in the normal run of events (top panel).
Of 100 people who eat bacon every day of their lives, there is 1 additional (red) case.
I've also worked with the waffle package as shown in the code below.


Here is all the R code:

## START
# installing the packages, first remove the hash tag to run:

# install.packages("personograph") # easiest way to start

# https://github.com/liamgilbey/ggwaffle
# devtools::install_github("liamgilbey/ggwaffle")

# install.packages("waffle", "readr", "ggpubr")

# https://github.com/GuangchuangYu/ggimage
# setRepositories(ind=1:2)
# install.packages("ggimage")


# first choice of package - nice and easy but limited customisation
library(personograph)
# https://github.com/joelkuiper/personograph
# https://cran.r-project.org/web/packages/personograph/index.html

# the data is supplied in a list and is plotted in order. 
data <- list(first=0.06, second=0.94)
personograph(data,  colors=list(first="black", second="#efefef"),
    fig.title = "100 people who do not eat bacon",
    draw.legend = FALSE, dimensions=c(5,20))


data_2 <- list(first=0.06, second=0.01, third=0.93)
personograph(data_2, colors=list(first="black", second="red", third="#efefef"),
    fig.title = "100 people who eat bacon every day",
    draw.legend = FALSE,  dimensions=c(5,20))



# icons all male... no random distribution...

# second package: waffle
library(waffle)
dont_eat_bacon <- c('Cancer' = 6, 'No cancer' = 94)
waffle(dont_eat_bacon, rows = 5, colors = c("#000000", "#efefef"),
    legend_pos = "bottom", title = "100 people who do not eat bacon")


eat_bacon <- c('Cancer' = 6, 'Extra case' = 1, 'No cancer' = 93)
waffle(eat_bacon, rows = 5, colors = c("#000000", "#f90000","#efefef"),
    legend_pos = "bottom", title = "100 people who eat bacon every day")




# nice clear colours but difficult to change symbols
# without installing fonts into system...
# I would rather not have to install fonts...

# Try a third package - ggwaffle

library(readr)
library(ggwaffle)
library(ggpubr)
library(ggimage)
theme_set(theme_pubr())

# in this case, we basically encode every point as a graph. 
# download the data
link <- ("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ggwaffledata_mf.csv")
bacon_waf <- read_csv(link)

# have a look at data
View(bacon_waf)

# basic plot with geom_waffle() from ggwaffle package
ggplot(bacon_waf, aes(x, y, fill = bacon)) + 
    geom_waffle()

# add icons with geom_icon() from ggimage package
p1 <- ggplot(bacon_waf, aes(x, y, colour = no_bacon)) + 
    geom_icon(aes(image=icon), size = 0.1) +
    scale_color_manual(values=c("black", "grey")) +
    theme_waffle()  +
    theme(legend.position = "none") +
    labs(x = "", y = "",
        title = "100 people who do not eat bacon")

# show the plot
# this is VERY SLOW to draw...
# because it contains 100 icons each of which gets
# downloaded from the internet. 
# I feel sure there is a better way but I don't know it at the moment...
p1


p2 <- ggplot(bacon_waf, aes(x, y, colour = bacon)) + 
    geom_icon(aes(image=icon), size = 0.1) +
    scale_color_manual(values=c("black","red", "grey")) +
    theme_waffle()  +
    theme(legend.position = "none") +
    labs(x = "", y = "",
        title = "100 people who eat bacon every day")

# this is the text at the bottom of the page
text <- paste("Figure 1.4\n",
"Bacon sandwich example using a pair of icon arrays, with randomly",
"scattered icons showing the incremental risk of eating bacon every",
"day. Of 100 people who do not eat bacon, 6 (solid icons) develop",
"bowel cancer in the normal run of events. Of 100 people who eat",
"bacon every day of their lives, there is 1 additional (red) case.",
sep = " ")

# format the text as ggplot object
# with ggparagraph() from the ggpubr package
text_p <- ggparagraph(text = text, size = 12, color = "black")

# arrange the images and text with ggarrange() from the ggpubr package
together <- ggarrange(p1, p2, text_p, 
    ncol = 1, nrow = 3,
    heights = c(1, 1, 0.3))

together
# AGAIN VERY SLOW...!!

# ggsave("together_3.pdf", together)
# works but these are very large PDF file and
# take a lot of time to render

# at the end it seems good to clear memory...

gc()

## END

Some resources:
  • Reproducing immunization data from Factfulness - another good book from Hans Rosling.  
  • Spiegelhalter, DJ (2008) "Understanding Uncertainty" Ann Fam Med 6:196-197 doi: 10.1370/afm.848
  • Galesic, M & Garcia-Retamero, R (2009) "Using Icon Arrays to Communicate Medical Risks: Overcoming Low NumeracyHealth Psychology  2009, Vol. 28, No. 2, 210 –216  
  • "Infographic-style charts using the R waffle package" by N Saunders 


  • Monday, 14 January 2019

    Making a box and whisker plot with some published proteomic data...

    Updated: 1st July 2019 - the source file has changed so some of the script had to be changed.
    I'm preparing some teaching materials for another Biochemical Society R training event with the draft title of R for Biochemists 201. Some more advanced material based on feedback for participants of R for Biochemists 101.
    In preparation, I've been looking at published proteomics data. I've come across a nice paper by a group in the Barts Cancer Institute in London. The paper is entitled "Proteomic and genomic integration identifies kinase and differentiation determinants of kinase inhibitor sensitivity in leukemia cells". It was published in the journal Leukaemia.
    I visited their lab once many years ago and I have heard the senior author, Pedro Cutillas, talk.  It is very interesting work, I think.
    They have made their data available so I've spend some time writing a script that makes one part of Figure 1a - a nice box and whisker plot.

    Here is the plot:



    and here is the script:
    START
    ## data import
    library(readxl)
    library(ggplot2)
    link <- "https://static-content.springer.com/esm/art%3A10.1038%2Fs41375-018-0032-1/MediaObjects/41375_2018_32_MOESM2_ESM.xlsx"
    download.file(link, "temp_data")
    data <- read_excel("temp_data", skip=2) 
    # skip = 2 stops the first two rows being part of the file
    # the next row is used as titles of the columns
    # skip needs to be determined by looking at the data

    # remove bottom two rows as only 36 patients
    data <- data[1:36,]


    # using the geom_boxplot() function, we can draw our graph
    ggplot(data = data,
        aes(FAB, log10(`MEKi (trametinib)...13`), colour = FAB)) +
        geom_boxplot(na.rm = TRUE)

    # we can make it look a bit more like the plot in the paper using geom_jitter()
    plot <- ggplot(data = data,
        aes(FAB, log10(`MEKi (trametinib)...13`), colour = FAB)) +
        geom_boxplot(na.rm = TRUE) +
        geom_jitter(width=0.15, na.rm = TRUE) +
        theme_bw() +
        labs( y = "Log10(EC50)nM",
            title = "Sensitivity of AML patients samples to MEK inhibitor", 
            subtitle = "Casado et al (2018) Leukaemia 32:1818–1822
    doi:10.1038/s41375-018-0032-1") +
        theme(legend.position="none")
    plot

    # to print a high resolution of this the tiff() function can be used. 
    tiff("plot.tiff", height = 12, width = 17, units = 'cm', compression = "lzw", res = 300)
    plot
    dev.off()
    END


    Some resources: