R for Biochemists: Drawing a protein domain structure using R....

As a biochemist, I like a protein structure. Today, I have written a script to draw a protein structure for one of my favourite proteins, the NF-kappaB subunit, p65 - also known as Rel A. I've been measuring NF-kappaB since I published my first paper from my PhD in Trinity College Dublin (many years ago) and one of my most cited papers from Cardiff University is one of the first measurement of Rel A in a primary cells from a human cancer - chronic lymphocytic leukaemia.

The diagram is created using information from the Uniprot webpage for p65. It shows the domain structure for p65. Here is the diagram:

For those interested, RHD stands for Rel Homology Domain and TAD stands for Transactivation Domain. For more information, here's a link to the Wikipedia page.

Here is the script that draws the diagram:

SCRIPT START

# draw the NF-kappaB subunit, Rel A (p65) with R.

# draw it as a series of rectanges

# going from left to right.

# using the Uniprot webpage for the informaton

# http://www.uniprot.org/uniprot/Q04206

# cut and paste from the XML page to create two objects

# first object is a list containing accession number and names

## Step 1: a list containing names and details

# list of names...

names <- list(

accession = c("Q04206"),

name = "TF65_HUMAN",

protein.recommendedName.fullName = "Transcription factor p65",

protein.recommendedName.alternativeName = "Nuclear factor NF-kappa-B p65 subunit",

protein.fullName = "Nuclear factor of kappa light polypeptide gene enhancer in B-cells 3",

gene.name.primary = "RELA",

gene.name.synonym = "NFKB3",

organism.name.scientific = "Homo sapiens"

)

## step 2: create the data frame with all the information.

# cut and past from the XML page to make vectors

# features to plot

types <- c("chain", "domain", "region of interest", "short sequence motif", "short sequence motif")

description <- c("Transcription factor p65", "RHD", "Activation domain","Nuclear localization signal", "9aaTAD")

begin <- c(1,19, 415, 301, 536)

end <- c(551, 306, 459, 304, 544)

col <- c("white", "blue", "red", "black", "orange")

# assemble vectors into a data frame

features <- data.frame(types, description, begin, end, col)

# check the structure of the data frame

str(features)

# shows description and col to be factors - this will cause problems later...

# so change them now

features$description <- as.character(features$description)

features$col <- as.character(features$col)

# it will be better if we sort features in order of where they begin

features <- features[order(features$begin),]

## step 3: draw the diagram

screen.width <- max(features$end)

screen.height <- 25 # this is a bit arbitary

plot(c(-10, screen.width),

c(0, screen.height),

type= "n",

xlab = "Number of amino acids",

ylab = "", yaxt='n') # suppress the y label and y axis

# make the rectangles in a loop

for (i in 1:length(features$types) ) {

rect(xleft = features$begin[i],

ytop = screen.height/2 + 2.5,

ybottom = screen.height/2 - 2.5,

xright = features$end[i],

col = features$col[i])

}

# add text to the top of the illustration with the recommended name

text(max(features$end)/2, screen.height-2.5, names$protein.recommendedName.fullName, cex=1.5)

# and the alternative name

text(max(features$end)/2, screen.height-5, names$protein.recommendedName.alternativeName, cex=1)

# add the descriptions of the features

pos.text.x <- features$begin[2:5] + (features$end[2:5] - features$begin[2:5])/2

pos.text.y <- c(screen.height/2 + 3.5, screen.height/2 - 3.5)

text(pos.text.x, pos.text.y, features$description[2:5], cex=1, col=features$col[2:5])

# add the accession number to the bottom smaller text and the source of the data

text(max(features$end)/2, 5 , paste("Uniprot Accession Number:", names$accession), cex=0.8)

text(max(features$end)/2, 3 , "Souce of data: http://www.uniprot.org/uniprot/Q04206", cex=0.8)

A protein assay	An enzyme kinetic plot	Visual Index	Study at Cardiff

R for Biochemists

A site to help Biochemists learn R.

How to use this site...

Starting points

Tuesday, 10 November 2015

Drawing a protein domain structure using R....

No comments:

Post a Comment