Showing posts with label chromatography. Show all posts
Showing posts with label chromatography. Show all posts

Monday, 13 November 2017

Plotting some real chromatography data

Diane Hatziioanou who's working through R for Biochemists 101 from the Biochemical Society shared some of her chromatography data with me. 
Diane has published a similar plot in the journal, Anaerobe

Here is the plot made in ggplot2:


The plot is made without the data reformatting that I did in the previous example. Each line is added separately, scaled and text added and positioned manually. The is no legend. 

The curves are smoother because there is lots more data points. 

The secondary axis on the right hand side of the plot is in response to a question from Diane and is created using this code:

p +   scale_y_continuous(sec.axis = sec_axis(~./4, 
    name="Conductivity (mS/cm)"))

The tilda "~./4" part of this code modifies the numbers on the secondary axis. In this case, it divides them by four. We also supply a name for the axis. 

The data is downloaded from Github and is supplied with Diane's permission - thanks Diane. 

Here is the script that makes it and the intermediate plots:

START
library(readxl)
library(ggplot2)

# Hatziioanou, D, Mayer MJ, Duncan, SH, Flint, HJ & Arjan Narbad, A 
# (2013) Anaerobe 23:5-8
# http://dx.doi.org/10.1016/j.anaerobe.2013.07.006

link <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/20100212roseburiab.xls"
download.file(url=link, destfile="file.xls", mode="wb")

data_2 <- read_excel("file.xls", skip = 2)

colnames(data_2) <- c("vol_ml", "UV_mAU", "Cond_ml_11_1", "conduct_mS_cm",
  "Cond_11_%_ml", "percent_%", "Conc_11_ml", "percentB_1", "percentB_2",
  "Pressure_ml", "Pressure_ MPa", "Flow_ml", "Flow_ml_min", "Temp_ml", 
  "Temp_C")

# published data is column 2 vs column 7
ggplot(data_2, aes(x = Conc_11_ml, y = UV_mAU)) + geom_line()
# Warning message - missing values

# row 602 gives a volume of almost 20 mls so limit the plot to that
data_2[602,7]
# 19.993

# subset the data - just 602 rows
ggplot(data_2[1:602,], aes(x = Conc_11_ml, y = UV_mAU)) + geom_line()
# output looks good 
# very much like in the paper. 


# manual overlay and text in lots of layers to create a nice graph
# make a canvas and set the x axis
p <- ggplot(data_2[1:602,], aes(x = Conc_11_ml)) 

# add our first line with y = UV which is a measure of protein 
p <- p + geom_line(aes(y = UV_mAU)) 
p


# add our second line with %B - this is a chromatography concept
# we change the scale of the line by multiplying by 10
# define colour as red
p <- p + geom_line(aes(y = percentB_2*10), colour = "red") 
p



# add conductivity as our third line in blue
# we change the scale of the line by multiplying by 4
# define colour as blue
p <- p + geom_line(aes(y = conduct_mS_cm*4), colour = "blue")
p

# add coloured text to explain the lines
# x and y define the places the text arrive and label gives the text
p <- p +  geom_text(mapping = aes(x = 2, y = 400, label = "UV (mAU)"))
p <- p + geom_text(mapping = aes(x = 4, y = 50, label = "0% B"), colour = "red")
p <- p + geom_text(mapping = aes(x = 17.5, y = 1050, label = "100% B"), colour = "red")
p <- p + geom_text(mapping = aes(x = 7, y = 950, label = "Conductivity (mS/cm)"), colour = "blue")
p

# this gives the secondary axis on the right hand side
# the function sec_axis() with a correction factor to change the number
# in this case the correction factor is "/4" meaning divide by four
# we multiplied by four when we plotted the blue line. 
# we also give the scale a name - "Conductivity" and units. 
p <- p +   scale_y_continuous(sec.axis = sec_axis(~./4, 
    name="Conductivity (mS/cm)"))
p


# add a theme to change the background
p <- p + theme_bw()
p

# add labels, titles with a source for the data...
p <- p + labs(x = "Volume (ml)",
    y = "[Protein] (UV mAU)",
    title = "Chromatogram of FPLC purification",
    subtitle = "Source: Hatziioanou et al, (2013) Anaerobe 23:5e8")
p
END



Some Resources:




Monday, 6 November 2017

Making a chromatography plot with an axis on the right hand side...

While working through our online course, R for Biochemists 101 from the Biochemical Society, Diane Hatziioanou made the following comment:
Could you point me in the direction of help to make a plot with different axes? For example for a character series “fractions” make one plot with two sets of data, one plotted against the right and one against the left y axes where the two y axes could be something like absorbance and activity. Thanks!

This inspired me to create this visualisation:

If you more information, leave a comment below.

Here is the script:
START
#https://stackoverflow.com/questions/43942002/geom-line-with-different-y-axis-scale


library(magrittr)
library(tidyr)

sample_data <- data.frame(fracs = rep(1:20),
  Absorbance = c(0.1, 0.1, 0.1, 0.2, 0.4, 0.1, 0.1, 0.5, 1.0, 0.1,
    0.1, 0.1, 0.5, 0.1, 0.1, 0.2, 0.3, 0.5, 1.0, 0.1),
  Activity = c(32, 33, 35, 150, 287, 39, 35, 40, 42, 32,
    30, 31, 32, 45, 57, 32, 33, 40, 41, 40))

# reformat the data into long format to allow plotting
data <- sample_data %>% gather(y, value, Absorbance:Activity)
data$fracs <- as.integer(data$fracs)

# trial plot for Activity (enzyme activity)
ggplot(data, aes(fracs, value, colour=y)) + 
  geom_line(data=data %>% filter(y=="Activity")) 




# trial plot for Absorbance
ggplot(data, aes(fracs, value, colour=y)) + 
  geom_line(data=data %>% filter(y=="Absorbance"))



# plot the two lines together
ggplot(data, aes(fracs, value, colour=y)) + 
  geom_line(data=data %>% filter(y=="Absorbance")) + 
  geom_line(data=data %>% filter(y=="Activity"))
# but different values makes Absorbance line flat. 





# change the scale for second line
max(sample_data$Absorbance)
max(sample_data$Activity)
scaleby = max(sample_data$Activity)/max(sample_data$Absorbance)

ggplot(data, aes(colour=y)) + 
  geom_line(data=data %>% filter(y=="Activity"),
    aes(x = fracs, y = value)) + 
  geom_line(data=data %>% filter(y=="Absorbance"),
    aes(x = fracs, y=value*scaleby))
# this looks better...


# putting it together with labels etc... 
p <- ggplot() + 
  geom_line(data=data %>% filter(y=="Activity"),
    aes(x = fracs, y = value, colour=y)) + 
  geom_line(data=data %>% filter(y=="Absorbance"),
    aes(x = fracs, y=value*scaleby, colour=y))

# limit for y axis
p <- p + ylim(0,300)

# adding our second scale on the right hand side of the plot
p <- p + scale_y_continuous(sec.axis = sec_axis(~./scaleby, 
    name="Absorbance (570nM)"))
  
# label the x and y axis...
p <- p + ylab("Enzyme activity (units)")
p <- p + xlab("Chromatography fractions")


# This is salt chromatography so adding a line showing 
# the NaCl concentration. 
salt_conc <- data.frame(fracs = rep(1:20),
  conc = c(c(10), seq(from = 10, to = 250, by= 13)))

# salt curve labels
salt_label <- data.frame(x = c(1,20), 
  y = c(10, 250), 
  text = c("[10nM]\nNaCl", "[250nM]\nNaCl"))

# check out the line
ggplot(salt_conc, aes(fracs, conc)) + 
  geom_line() + 
  geom_text(data = salt_label,
    aes(x = x,
      y = y,
      label = text))



# linetype=2 gives a dashed line
p <- p + geom_line(data = salt_conc, aes(fracs, conc),
  linetype = 2)
# add some labels to show the starting and ending concentrations
p <- p + geom_text(data = salt_label,
              aes(x = x,
                y = y,
                label = text),
                size = 2.5)

p <- p + theme_bw()
p <- p + ggtitle("Chromatography plot")
p <- p + theme(legend.position="top", legend.direction="horizontal")
p <- p + theme(legend.title=element_blank())
p





END