Saturday 16 January 2021

Programmatic visualization of SARS-CoV-2 Spike Protein

The SARS-CoV-2 Spike protein and its variants are very important at the moment. I thought it would be interesting to showcase using my Bioconductor package drawProteins to programmatically draw a visualization of the Spike protein. This helped me understand a little more about the protein too. 

The data for making the visualization is from Uniprot: https://www.uniprot.org/uniprot/P0DTC2

Here is the visualisation and below is the code to make it. 



START

library(drawProteins)

library(ggplot2)

library(tidyverse)


# Uniprot link: https://www.uniprot.org/uniprot/P0DTC2

drawProteins::get_features("P0DTC2") -> spike_sars

drawProteins::feature_to_dataframe(spike_sars) -> spike_data


# From the Uniprot entry, it say that the Spike protein

# is made as a single protein and then processed into 

# S1 and S2 protein. 

# thus the Uniprot entry has multiple chains


# Processing Uniprot data to create different proteins

# pull out full length chain

spike_data %>% 

    filter(begin < 685 & end == 1273) -> spike_data_1

# want this at the top... 

spike_data_1$order = 3


# pull out S1 chain... begins 13 ends: 685

spike_data %>%

    filter(begin < 685 & end < 686) -> spike_data_2

# want this next... 

spike_data_2$order = 2


# pull out S2 chain...  begins 686; ends: 1273

spike_data %>%

    filter(begin > 685 & end < 1274) -> spike_data_3

# want this at the bottom

spike_data_3$order = 1


# combine all back for plotting

spike_data_o <- rbind(spike_data_1, spike_data_2, spike_data_3)


# pull out names for chains

spike_data_o %>%

    filter(type == "CHAIN") -> chains

chain_names <- c(chains$description[1:3], "")


# draw canvas and chains...

draw_canvas(spike_data_o) -> p

p <- draw_chains(p, spike_data_o,

                 labels = chain_names) 


# add regions to S1 and S2

p <- draw_regions(p, spike_data_o)


p <- p + theme_bw(base_size = 14) + # white background

    theme(panel.grid.minor=element_blank(), 

          panel.grid.major=element_blank()) +

    theme(axis.ticks = element_blank(), 

          axis.text.y = element_blank()) +

    theme(panel.border = element_blank()) +

    theme(legend.position = "bottom")


p <- p + labs(title = "Schematic of SARS-CoV-2 Spike Protein",

    subtitle = "Source: Uniprot (https://www.uniprot.org/uniprot/P0DTC2")

p

END


Some Resources

For more help, bug reports or to suggest features



No comments:

Post a Comment

Comments and suggestions are welcome.