Thursday 2 March 2017

Using 'pipes' in R for easier-to-read code

The magrittr package allows us to write code using 'pipes', which makes the code a little easier to read. Code that is easier to read is easier to share, document and use, which appeals to me. I think it makes the code more open, and I'm a big fan of open science.

Using pipes stops us piling up our functions on a single line and encourages me to lay out my code in a more organised way.
Pipes also leave space for documentation, explanations and comments.
I have used pipes to create this cluster diagram to illustrate the point:



Consider adding the use of pipes to your code to make it easier for others.


Here is the script:

# START 
# download the data from github
library(RCurl)
x <- getURL("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/microArrayData.tsv")
data <- read.table(text = x, header = TRUE, sep = "\t")

# when we have a workflow that we like there is a tendency to pile up our functions

# here is an example:
plot(hclust(dist(t(data[2:15]))))

# the introduction of the magrittr piping function into R..x
# allows us to do this in a way that makes a workflow easier to view and easier to comment

# install.packages("magrittr") # if required
library(magrittr)
# https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

data[2:15] %>%   # subset the object (columns 2:15 of the dataframe)
  t() %>%        # transpose it so that columns become rows
  dist() %>%     # calculate distance
  hclust() %>%   # do a hierarchical cluster
  plot()         # then plot the result. 

# this approach really makes life easier when we have arguments in our functions
# we can change the method for the dist() function
# and we can add a title and some colour with the plot() function
# using pipes, this looks like this:

data[2:15] %>%                      # create a subset of object data (cols 2:15)
  t() %>%                           # transpose it so that columns become rows
  dist(method = "manhattan") %>%    # calculate distance with the manhattan method
  hclust() %>%                      # do a hierarchical cluster
  plot(main="Cluster Diagram of Drug Treatments\n(2 Mar 2017)",     # add a title
       lwd=2, col="blue", cex = 1.1) # thick line & change colours


# written the nested way, the same code looks like this:
plot(hclust(dist(t(data[2:15]), method = "manhattan")),
     main="Cluster Diagram of Drug Treatments\n(2 Mar 2017)", # add a title
     lwd=2, col="blue", cex = 1.1)

# in my opinion, it's a little bit difficult to separate the functions, objects and arguments here

# N.B. three key points to remember about using pipes:
# (1) brackets remain to allow us to identify functions()
# (2) the arguments go within the brackets - not the objects
# (3) objects are now 'piped' into the functions using %>%
# END
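
If you want to convince yourself that the piped and nested versions really do the same thing, here is a minimal sketch that builds the cluster both ways and compares the results. It does not use the microarray data from the script: the matrix m and its column names (drugA to drugD) are made-up stand-ins so that the example runs without a download. I compare the merge and height components rather than the whole objects, because hclust() also stores the call that created it, and that differs between the two forms.

```r
library(magrittr)

# Hypothetical stand-in data (not the microarray file used above):
# 5 measurements (rows) for 4 made-up treatments (columns)
set.seed(1)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(NULL, c("drugA", "drugB", "drugC", "drugD")))

# nested form: read from the inside out
nested <- hclust(dist(t(m), method = "manhattan"))

# piped form: read from top to bottom
piped <- m %>%                    # start with the object
  t() %>%                         # transpose so that treatments are rows
  dist(method = "manhattan") %>%  # arguments stay inside the brackets
  hclust()                        # the object arrives through the pipe

# compare the parts of the result that describe the tree itself
same_merge  <- identical(nested$merge, piped$merge)
same_height <- identical(nested$height, piped$height)
```

Both same_merge and same_height should come out as TRUE, since each step in the pipe simply receives the previous result as its first argument.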

There are lots of blog posts about pipes and magrittr. Just search....

Hat tip to Steph Locke and Dave from the Cardiff R User group for encouraging me to use pipes. 
