A site to help Biochemists learn R.

Starting points

Wednesday, 17 June 2015

Using ggplot to draw the LD50 graph

UPDATE: As of ggplot 2.0.0, released in Dec 2015, to use the geom_smooth() ggplot function, there is a need to put the method arguments (method.args = list()) into a list as detailed below.
As of 19th of Jan 2016, it means that the scripts here that use the geom_smooth() don't work. I will repair them all but it'll take a few days. Thanks to the commentors for sorting this out.


As mentioned previously, we do a lot of drug testing in our laboratory.
Here is the same data plotted with ggplot.
The experimental set up, done by a student in the lab, is as follows:
  • using some cells and add various concentrations of a novel drug
  • leave for 48 hours
  • then measure how many cells are dead
  • the experiment was done four times with the doses improved each time
Here is the graph and calculated LD50 - a measure of how good the drug is:



Here is the script:

# START of SCRIPT
library(ggplot2)

###  this is the data   ### 
# data from four experiments
conc <- c(5.00E-07, 1.00E-06, 1.00E-05, 
          5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05)
dead.cells <- c(34.6, 47.7, 81.7, 
                37.6, 55.7, 89.1, 84.3, 85.2, 
                34.4, 46.1, 76.2, 84.3, 84.1, 
                24.5, 26.1, 60.6, 82.7, 87)

# transform the data to make it postive and put into a data frame for fitting 
data <- as.data.frame(conc)   # create the data frame
data$dead.cells <- dead.cells
data$nM <- data$conc * 1000000000
data$log.nM <- log10(data$nM) 

###  fit the data  ###
# make sure logconc remains positive, otherwise multiply to keep positive values
# (such as in this example where the iconc is multiplied by 1000

fit <- nls(dead.cells ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
           data = data,
           start=list(bot=20, top=95, LD50=3, slope=-12))
m <- coef(fit)
val <- format((10^m[3]),dig=0)

###  ggplot the results  ###
p <- ggplot(data=data,          # specify the data frame with data
            aes(x=nM, y=dead.cells)) +   # specify x and y
  geom_point() +          # make a scatter plot
  scale_x_log10(breaks = c(500, 1000, 2500, 5000, 10000, 20000))+
  xlab("Drug concentration (nM)") +   # label x-axis
  ylab("Dead cells (% of cells)") +    # label y-axis
  ggtitle("Drug Dose Response and LD50") +  # add a title
  theme_bw() +      # a simple theme
  expand_limits(y=c(20,100))   # customise the y-axis

# Add the line to graph using methods.args (New: Jan 2016)
p <- p +  geom_smooth(method = "nls",
                      method.args = list(formula = y ~ bot+(top-bot)/(1+( x / LD50)^slope), 
                                    start=list(bot=20, top=95, LD50=3, slope=-12)),
                      se = FALSE)

# Add the text with the LD50 to the graph. 
p <- p+ annotate(geom="text", x=7000, y= 60, label="LD50(nM): ",  color="red") +
   annotate(geom="text", x=9800, y= 60, label=val,  color="red")

p # show the plot

#END OF SCRIPT

11 comments:

  1. Hi, I am trying to plotting your tutorial in my computer.
    However, whenever run below script, there is an error

    p <- p + geom_smooth(method = "nls",
    formula = y ~ bot+(top-bot)/(1+( x / LD50)^slope),
    start=list(bot=20, top=95, LD50=3, slope=-12),
    se = F, size = 0.5)

    Error: Unknown parameters: start

    How to solve this problem?

    ReplyDelete
    Replies
    1. Hi,
      Thanks for your comment.
      I have just rerun the whole script and the piece that adds the line but I can't reproduce your error.
      I have also cut and pasted your text above and it works fine for me.

      Some things to check:
      (1) has your basic plot worked? has the object p been created?
      check output by looking at p the graphing object.
      (2) did the line fit work?
      was "val" made?
      This shows the data input works so adds confidence.
      (3) If you have been playing with the geom_smooth fitting, you might have altered the object p.
      Try remaking the object and showing it.
      Then add the geom_smooth again and running it again.
      You could also try clearing the data, cut and pasting the script again and running it all together.
      How does that sound?
      Feel free to leave another comment. I will do more if I can.
      Best wishes,
      Paul


      Delete
  2. # Add the line to graph
    p <- p + geom_smooth(method = "nls",
    formula = y ~ bot+(top-bot)/(1+( x / LD50)^slope),
    start=list(bot=20, top=95, LD50=3, slope=-12),
    se = F, size = 0.5)

    Hi, I tried to plot your data before I plot my data.
    However, even though I copied and pasted your script, there is an error message.
    'start' is unknown function.
    When I discard the start part, ggplot2 shows warning message again that 'specifying start points'
    What should I do?
    Is it version problem?

    ReplyDelete
  3. Thanks for your comment. Sorry not to respond sooner.
    I recommend that you go through the script line by and line.
    After line 40, the graph object is made, so plot the object "p" before you add the line to the graph.
    This will ensure that the package ggplot2 is working for you.

    The 'start' function is within ggplot2 so if ggplot2 works then the 'start' function should work too.

    It's probably not a version problem.
    I have just tested the script now. I'm using "version.string R version 3.2.2 (2015-08-14)" with ‘ggplot2’ version 1.0.1
    What are you using?
    Best wishes,
    Paul

    ReplyDelete
  4. I am getting the same error. I used ggplot2 first and I made sure it worked. But when I add geom_smooth with nls method. I get "Error: Unknown parameters: start"
    I am not sure the problem is but I appreciate your input to figure it out

    ReplyDelete
  5. Dear Ali,
    Thanks for your comment. YES, this is a version problem.
    The new version of ggplot - version: 2.0.0 requires different arguments for the non-linear curve.
    I will write a new script and edit this...
    For your info, the details are on the following pages:
    * http://blog.rstudio.org/2015/12/21/ggplot2-2-0-0/
    * http://docs.ggplot2.org/dev/geom_smooth.html
    More soon.
    Thanks again,
    Paul

    ReplyDelete
  6. This script now works with ggplot 2.0.0.
    Thanks for the comments. Other scripts will be affected.
    I will change them over the next few days.
    Thanks for the comments.

    ReplyDelete
  7. How could this be altered to incorporate multiple y? Let's say you had 3 plates of dead cells and wanted to plot them all on the same plot with separate LD50. is it possible?

    ReplyDelete
    Replies
    1. Dear JB,
      Thanks for your question.
      Yes, it is possible to alter this to incorporate multiple y.
      There are a couple of blog posts that consider this:
      (1) Plotting two enzyme plots with ggplot at
      http://rforbiochemists.blogspot.co.uk/2015/06/plotting-two-enzyme-plots-with-ggplot.html

      (2) Comparing two drugs on the same plot - NOT ggplot but...
      http://rforbiochemists.blogspot.co.uk/2015/04/comparing-two-drugs-on-same-plot.html

      I hope these help but if you want more info, please post another comment and I will try to address it.
      Best wishes,
      Paul

      Delete
  8. thanks for this article. im a newbie. Can you please explain to me this part please
    fit <- nls(dead.cells ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
    data = data,
    start=list(bot=20, top=95, LD50=3, slope=-12))

    ReplyDelete
    Replies
    1. Hello,
      Thanks for your comment. This line uses the nls() function to make fit a line to the points. The nls() function is a nonlinear (weighted) least-squares fit.
      The nls() function requires:
      (1) an equation which is the first argument;
      (2) the data; and
      (3) some starting values.
      Without each of these things, the nls() function will fail.

      Different nonlinear fits require different lines.
      For contrast, see the use of the nls() function for enzyme kinetics in this piece: http://rforbiochemists.blogspot.co.uk/2015/06/plotting-enzyme-data-with-ggplot-part-i.html

      For info from R about the nls() function, see: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/nls.html

      I hope this helps. I'm happy to try to explain more if you add another comment or send an email.
      Best wishes,
      Paul

      Delete

Comments and suggestions are welcome.