A site to help Biochemists learn R.

Thursday, 18 June 2015

Drawing a proteomic data volcano plot....

I really like the data produced by this study from Liverpool (Eagle et al. (2015) Mol Cell Proteomics, 14, 933-945). It is a proteomic study of two types of leukaemic cell. I have already used it to compare their protein list with some of our data. Today, I have used it to draw a volcano plot, which shows the change in protein expression (log2 fold change) against the significance of that change (p-value). These graphs are popular in genomic and proteomic studies.
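To make the two axes concrete, here is a minimal sketch with made-up numbers. The protein names, abundances and p-values are all hypothetical; the downloaded data already contain the fold changes and p-values, so this only illustrates what those columns mean.

```r
# Made-up abundances for four hypothetical proteins in two cell types.
proteins <- data.frame(
  protein = c("P1", "P2", "P3", "P4"),
  cell_a  = c(100, 250, 80, 40),
  cell_b  = c(400, 250, 20, 40),
  p_value = c(0.001, 0.8, 0.01, 0.5)
)

# x axis: log2 of the ratio, so a 4-fold increase is +2 and a 4-fold decrease is -2.
proteins$log2_fc <- log2(proteins$cell_b / proteins$cell_a)

# y axis: -log10 of the p-value, so smaller p-values sit higher on the plot.
proteins$neg_log10_p <- -log10(proteins$p_value)

proteins
```

Proteins with a large change and a small p-value end up in the top left and top right corners, which is what gives the plot its volcano shape.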

Here is the graph, drawn with ggplot:

To use the data you will need to download it. This can be done from here. I then opened it in Excel and saved it as a CSV file. This script imports the file and produces the graph.


library(ggplot2)  # needed for ggplot()

setwd("/Users/... ")  # point this to where the file is...

data <- read.csv("mcp.M114.044479.csv", header=TRUE)

## Identify the proteins that have a p-value < 0.05
data$threshold <- as.factor(data$P.Value < 0.05)

##Construct the plot object
g <- ggplot(data=data, 
            aes(x=Log2.Fold.Change, y =-log10(P.Value), 
            colour=threshold)) +
  geom_point(alpha=0.4, size=1.75) +
  xlim(c(-6, 6)) +
  xlab("log2 fold change") + ylab("-log10 p-value") +
  theme_bw()

g  # draw the plot

# The script gives a warning message: Removed 1 rows containing missing values (geom_point).
# but it still works....
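The warning is worth a quick check. ggplot drops any row it cannot place on the plot, which includes rows with a missing value and rows that fall outside the limits set by xlim(). Here is a minimal sketch of how to count such rows, using a made-up data frame (the values are hypothetical; only the column names match the script above):

```r
# Hypothetical example data: column names match the script above, values are made up.
example <- data.frame(
  Log2.Fold.Change = c(-7.2, -1.5, 0.3, 2.8, NA),
  P.Value          = c(0.001, 0.2, 0.6, 0.01, 0.04)
)

# A row is dropped if either plotted value is missing or x is outside [-6, 6].
outside_x <- !is.na(example$Log2.Fold.Change) & abs(example$Log2.Fold.Change) > 6
missing   <- is.na(example$Log2.Fold.Change) | is.na(example$P.Value)
dropped   <- example[outside_x | missing, ]

nrow(dropped)  # number of rows that would be left off the plot
```

Running the same test on the real data should show which row is behind the warning, and whether widening xlim() would bring it back.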


  1. Hello, I am trying to test your code on proteomic data in which the first column is named Protein Accession.
    I would be really grateful if you could tell me how to label the dots according to Protein Accession (if possible with no overlapping, but that is secondary).

    Thank you very much in advance,

    By the way, the code works perfectly!

    Julia Bauzá

    1. Sorry Julia,
      I missed your comment in my email overload in November. Do you still want help with this?
      Best wishes,
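For anyone with the same question, here is a minimal sketch of one way to label the points. It uses made-up data, and it assumes the accession column comes through read.csv() as Protein.Accession (spaces in column names become dots by default). geom_text() adds the labels, and check_overlap = TRUE hides any label that would overlap one already drawn.

```r
library(ggplot2)

# Made-up data; Protein.Accession is the assumed name of the accession column.
example <- data.frame(
  Protein.Accession = c("A0001", "A0002", "A0003", "A0004"),
  Log2.Fold.Change  = c(-3.1, -0.4, 0.6, 2.7),
  P.Value           = c(0.002, 0.40, 0.30, 0.01)
)
example$threshold <- as.factor(example$P.Value < 0.05)

# Label only the significant points to keep the plot readable.
labelled <- subset(example, P.Value < 0.05)

g <- ggplot(example,
            aes(x = Log2.Fold.Change, y = -log10(P.Value), colour = threshold)) +
  geom_point(alpha = 0.4, size = 1.75) +
  geom_text(data = labelled, aes(label = Protein.Accession),
            vjust = -0.8, size = 3, check_overlap = TRUE,
            show.legend = FALSE) +
  xlab("log2 fold change") + ylab("-log10 p-value") +
  theme_bw()

g
```

If hiding overlapping labels loses too much, geom_text_repel() from the ggrepel package is a common alternative that moves the labels apart instead.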


Comments and suggestions are welcome.