Wednesday, 26 August 2015

Temperature time course

Tracking changes over time is a very useful way of understanding and analysing your system. This applies whether your system is biological or otherwise. As part of my hobby of baking bread, I have built a brick oven in my garden. I have installed thermocouples that allow me to measure the temperature at the top and the bottom of my oven. To investigate the performance of my oven, I have recorded temperatures when I fired the oven in May, June and August of this year.

The data for this is available on github and is downloaded as part of the script.

I have written a script to download, manipulate and graph the data. 

The manipulations required were:
  • turning my date and time into a date an time object that R could understand
  • subtracting the start time to calculate the elapsed time
  • the elapsed time was seconds which was converted to hours
  • the time was then converted to a number to allow for graphing by ggplot
  • the data was melted into a format for ggplot
Within the script needs to be applied separately to each of the three months of data. This can be done easily by changing the read.table() function to the appropriate file. 

By doing this, I graph and analyse the data in the same way making the graphs easier to compare. Here are the three graphs:


Here is the script for generating these graphs:

library(reshape2)
library(ggplot2)
library(RCurl) # allows us to download data through urls 

# URL for May data:
May <- getURL("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ovenTempMay.tsv")

# URL for Jun data:
Jun <- getURL("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ovenTempJun.tsv")

# URL for Aug data: 
Aug <- getURL("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/ovenTempAug.tsv")

# put the URLs together in a vector
urls <- c(May, Jun, Aug)
months <- c("May", "Jun", "Aug")

# do it in a loop to apply to each of the three data sets
for(i in 1:3){

# read in the data for the relevant month - it's a tab separated file 
    data <- read.table(text = urls[i], stringsAsFactors=FALSE, sep = "\t", header = TRUE)
    
    
# convert the data and time into a format that R will understand. make it into a POSIXct object. 
# so as not to mess with the original data put it in a new column called time.P 
# because my dates are separted by slashes "/", I need to tell R about the format. 
    data$time.P <- as.POSIXct(data$time, format = "%d/%m/%Y %H:%M")
    
# the first value is the start.time for this temperature profile. 
    start.time <- data$time.P[1]
    
# calculate the elapsed time 
# substract the start time from each value. 
# This returns the time in seconds
    data$e.time.sec <- data$time.P - start.time
# convert this into an hour time and change back into a number for graphing purposes. 
    data$e.time.hour.num <- as.numeric(data$e.time.sec/3600)
    
    
 # take out the data we need for the plot
    data.subset <- as.data.frame(data$e.time.hour.num)
    colnames(data.subset)[1] <- "e.time.hour.num"
    data.subset$top <- data$top
    data.subset$bot <- data$bot
    
# melt it into a format for ggplot using melt() function
   data.subset.melt <- melt(data.subset, 
                           id.vars = "e.time.hour.num")

# put in nice column names
  colnames(data.subset.melt) <- c("elapsed.time", "place", "temp")
  
# make the graph object
  p <- ggplot(data.subset.melt, aes(x=elapsed.time, 
                                    y= temp, 
                                    colour = factor(place, labels = c("Top", "Bottom")))) + 

# colour = factor and the labels allows us to customize the legend
        geom_line(size=1) +
        geom_point() +
        labs(color = "Place") + # customizes the legend title
        scale_colour_manual(values=c("black","red")) +
        ylab("Temperature") + # y-label
        xlab("Elapsed time (hours)") + # x-label
        ylim(0,400) +
        scale_x_continuous(limits=c(0, 48), 
                           breaks=c(0,1,4,8,12, 24, 48)) +
        theme_bw()

# position the legend  
    p <- p + theme(legend.position=c(1,1), # move to the top right
                   legend.justification=c(1,1), # move it in a bit
                   legend.text=element_text(size = 12), # increase size of text
                   legend.title=element_text(size = 12)) # and the title
    
    p <- p + theme(axis.title.y = element_text(size = 14 )) + 
      theme(axis.text = element_text(size = 12))
    
# add the appropriate title
    p <- p + ggtitle(paste0("Oven Temp (", months[i], " Firing)")) 

# print the object - you have to do this because of the loop
    print(p)
}

Helpful resources, I used to prepare this script:


Wednesday, 5 August 2015

Opening an image of a leukemia cell in R

Updated 16th May 2016: The image is now on Github so link points there.

I have been exploring the possibility of using R to open and analyse some images. This was first prompted by a delegate that attended our R for Biochemists Training Day organised through the Biochemical Society.

Our search revealed the Bioconductor package, EBImage which allows the opening of .jpg files using R. This allowed me to extract the fluorescence values that I had from this image which I prepared with a colleague, Dr Elisabeth Walsby. Beth stained the cell and I did the confocal microscopy. Here is the nice picture of a chronic lymphocytic leukaemia cell stained for a protein (red colour) and for DNA (blue).



I can open this image with R using the readImage() function that comes as part of the EBImage package. When I display the image using the display() command, it opens a new browser window. It has various functionality to explore the image shown here:



The most useful aspect of this approach is the ability to extract the fluorescence values to use to make nice plots in R. This uses the imageData() function which works like this:

  • imageData(img)[, 512, 3]  - access a vector with all the x values across one y value (pixel number 512) and extracts the third colour for the image. 
I've used EBImage to extract values and to generate this graph which shows the fluorescence of the pixels across the middle of the cell:


Here is the script with some plots that I generated along the way:

# EBImage - a Bioconductor package for analysing images
source("http://bioconductor.org/biocLite.R")
biocLite("EBImage")
library("EBImage")
library(ggplot2)

# the image is on Github
img <- readImage("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/images/cllCell.jpg")
# the object created is a Large Image (24 Mb in this case!)

display(img)
# opens a window in a web browser and allows you to look at the image

print(img)
# shows some of the structure of the Large Image object.

dim(img)
# [1] 1024 1024    3   
# corresponds to x and y pixels of the image and then the number of colours for each pixel

# why do we want to do this?
# so that we can extract data out of the image and draw nice graphs. 
# to extract a value out of the Large Image use the imageData() function

# if you place the mouse over the image in the web browser, you see the name of the pixel. 

# we can extract the fluorescence value and plot the data
x <- imageData(img)[, 512, 3]

# this gives the values for colour 3 for the whole x line at the value of y = 512. 
# this is a list of 1024 numbers which can be plotted
plot(x)  

# the index at the bottom refers to the x values across the whole image. 


# using the web browser, I can choose the line across the middle of the cell
# extract a row of 100 pixels wide from left to right of the middle of the image.
# I chose to extract values from y pixel 425 to 545 which correspond to the middle of the cell

# colour 1
x10.col1 <- as.data.frame(imageData(img)[ ,425:525, 1])
plot(x10.col1$V5)  # plot one of the values
Plot one of the values (V5)

x10.col1$mean <- rowMeans(x10.col1[1:100])
plot(x10.col1$mean) # plot the mean
Plot the mean of 100 pixels
# colour 3
x10.col3 <- as.data.frame(imageData(img)[ ,425:525, 3])
x10.col3$mean <- rowMeans(x10.col3[1:100])
plot(x10.col3$mean, col = "blue")
lines(x10.col1$mean, col = "red")
Plot both colours using Base R

# assemble a data frame to allow us to use ggplot to plot the data..
m <- as.data.frame(x10.col1$mean)
m$col3 <- x10.col3$mean

# change the column names in the data frame
colnames(m) <- c("col1", "col3")

# make the basic ggplot object
p <- ggplot(data=m, aes(x=seq(1, length(col3)))) + 
     geom_line(aes(y = col3), colour = "blue") + 
     geom_line(aes(y = col1), colour = "red")
p  # show the plot
Plot both colours in ggplot2

# focus on just the cell
p <- ggplot(data=m, aes(x=seq(1, length(col3)))) + 
     geom_line(aes(y = col3), colour = "blue") + 
     geom_line(aes(y = col1), colour = "red") + 
     xlim(350,850) # focus on just the cell
p  # have a look
# gives Warning messages but these can be ignored.
Focus on the cell by limiting the x-axis.

# add some titles and a easy theme 
p <- p +  xlab("Distance (pixels)") +   #label x-axis
          ylab("Fluorescence signal") +    #label y-axis
          ggtitle("Histogram of fluoresence of CLL cell") +  #title
          theme_bw()     # a simple theme

p   # show the plot
Add titles and change the theme.
# plot points instead of lines...
q <- ggplot(data=m, aes(x=seq(1, length(col3)))) + 
  geom_point(aes(y = col3), colour = "blue") + 
  geom_point(aes(y = col1), colour = "red") + 
  xlim(350,850) +  
  xlab("Distance (pixels)") +   # label x-axis
  ylab("Fluorescence signal") +    # label y-axis
  ggtitle("Histogram of fluoresence of CLL cell") +  # add a title
  theme_bw()     # a simple theme
q   # show the plot

You have the plot at the top of the page....


Resources: