Thursday 21 January 2021

Programmatic visualization of UK SARS-CoV-2 spike protein variant

Building on my visualization of SARS-CoV-2 spike proteins,  this script provides a R script to allow you to draw a schematic of the corona virus S1 spike protein and the UK variant that has changes within the S1 protein. 

Here is the visualisation and below is the code to make it. 



START
# viz the changes of the UK variant in S1 spike protein....
library(drawProteins)
library(ggplot2)
library(tidyverse)

# download protein data from
# Uniprot link: https://www.uniprot.org/uniprot/P0DTC2
drawProteins::get_features("P0DTC2") -> spike_sars
drawProteins::feature_to_dataframe(spike_sars) -> spike_data

# pull out S1 chain... begins 13 ends: 685
spike_data %>%
    filter(begin > 12 & end < 686) -> s1_bot

# duplicate this and put order = 2
s1_top <- s1_bot
s1_top$order <- 2

# combine these two 
s1_both <- rbind(s1_top, s1_bot)

# draw canvas, chains & regions
draw_canvas(s1_both) -> p
p <- draw_chains(p, s1_both, labels = c("S1 protein", "B.1.1.7 variant"))
p <- draw_regions(p, s1_both)


# here are the details of the changes...

uk_variant <- tribble(
    ~type, ~description, ~begin, ~end, ~length, ~accession, ~entryName, ~taxid,
    ~order,
    "B.1.1.7", "deletion", 69, 70, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "deletion", 144, 144, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 501, 501, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 570, 570, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,   
    "B.1.1.7", "substitution", 681, 681, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 716, 716, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 982, 982, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 1118, 1118, 1, "P0DTC2","SPIKE_SARS2", 2697049, 1,
)

# overlay information about the variants
p <- p + geom_point(data = filter(uk_variant, begin < 686),
                          aes(x = begin,
                              y = order+0.2, 
                              shape = description), size = 5)


# style the plot a bit...
p <- p + theme_bw(base_size = 14) + # white background
    theme(panel.grid.minor=element_blank(), 
          panel.grid.major=element_blank()) +
    theme(axis.ticks = element_blank(), 
          axis.text.y = element_blank()) +
    theme(panel.border = element_blank()) +
    theme(legend.position = "bottom")

p <- p + labs(title = "Schematic of SARS-CoV-2 S1 Protein and UK variant",
              subtitle = "Source: Uniprot (https://www.uniprot.org/uniprot/P0DTC2)")
p

END

I feel this could, and maybe will, be better but I'm stopping for now :-)

Some Resources

For more help, bug reports or to suggest features

Saturday 16 January 2021

Programmatic visualization of SARS-CoV-2 Spike Protein

The SARS-CoV-2 Spike protein and its variants are very important at the moment. I thought it would be interesting to showcase using my Bioconductor package drawProteins to programmatically draw a visualization of the Spike protein. This helped me understand a little more about the protein too. 

The data for making the visualization is from Uniprot: https://www.uniprot.org/uniprot/P0DTC2

Here is the visualisation and below is the code to make it. 



START

library(drawProteins)

library(ggplot2)

library(tidyverse)


# Uniprot link: https://www.uniprot.org/uniprot/P0DTC2

drawProteins::get_features("P0DTC2") -> spike_sars

drawProteins::feature_to_dataframe(spike_sars) -> spike_data


# From the Uniprot entry, it say that the Spike protein

# is made as a single protein and then processed into 

# S1 and S2 protein. 

# thus the Uniprot entry has multiple chains


# Processing Uniprot data to create different proteins

# pull out full length chain

spike_data %>% 

    filter(begin < 685 & end == 1273) -> spike_data_1

# want this at the top... 

spike_data_1$order = 3


# pull out S1 chain... begins 13 ends: 685

spike_data %>%

    filter(begin < 685 & end < 686) -> spike_data_2

# want this next... 

spike_data_2$order = 2


# pull out S2 chain...  begins 686; ends: 1273

spike_data %>%

    filter(begin > 685 & end < 1274) -> spike_data_3

# want this at the bottom

spike_data_3$order = 1


# combine all back for plotting

spike_data_o <- rbind(spike_data_1, spike_data_2, spike_data_3)


# pull out names for chains

spike_data_o %>%

    filter(type == "CHAIN") -> chains

chain_names <- c(chains$description[1:3], "")


# draw canvas and chains...

draw_canvas(spike_data_o) -> p

p <- draw_chains(p, spike_data_o,

                 labels = chain_names) 


# add regions to S1 and S2

p <- draw_regions(p, spike_data_o)


p <- p + theme_bw(base_size = 14) + # white background

    theme(panel.grid.minor=element_blank(), 

          panel.grid.major=element_blank()) +

    theme(axis.ticks = element_blank(), 

          axis.text.y = element_blank()) +

    theme(panel.border = element_blank()) +

    theme(legend.position = "bottom")


p <- p + labs(title = "Schematic of SARS-CoV-2 Spike Protein",

    subtitle = "Source: Uniprot (https://www.uniprot.org/uniprot/P0DTC2")

p

END


Some Resources

For more help, bug reports or to suggest features