Thursday 21 January 2021

Programmatic visualization of UK SARS-CoV-2 spike protein variant

Building on my earlier visualization of SARS-CoV-2 spike proteins, this post provides an R script that draws a schematic of the coronavirus S1 spike protein alongside the UK variant (B.1.1.7), which has several changes within S1. The variant also has changes in the S2 subunit, but the script only marks those in S1.

Here is the visualisation and below is the code to make it. 



START
# viz the changes of the UK variant in S1 spike protein....
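library(drawProteins)  # Bioconductor package for protein schematics
library(ggplot2)       # also attached by tidyverse; kept for clarity
library(tidyverse)     # dplyr (filter, %>%) and tibble (tribble)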

# download the spike protein features from UniProt (accession P0DTC2):
# https://www.uniprot.org/uniprot/P0DTC2
spike_sars <- drawProteins::get_features("P0DTC2")
spike_data <- drawProteins::feature_to_dataframe(spike_sars)

# keep only the features lying wholly inside the S1 chain (residues 13-685)
s1_bot <- spike_data %>%
    filter(begin > 12 & end < 686)

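# duplicate the S1 features and set order = 2 so the copy is drawn
# above the original (order = 1); one row of the plot per protein
s1_top <- s1_bot
s1_top$order <- 2

# stack the two copies into one data frame
s1_both <- rbind(s1_top, s1_bot)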

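# draw the canvas, then the two chains and their regions;
# labels are matched to the CHAIN rows in row order, so s1_top (order 2)
# gets "S1 protein" and s1_bot (order 1) gets "B.1.1.7 variant"
p <- draw_canvas(s1_both)
p <- draw_chains(p, s1_both, labels = c("S1 protein", "B.1.1.7 variant"))
p <- draw_regions(p, s1_both)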


# the B.1.1.7 spike changes: deletions of H69/V70 and Y144, and the
# substitutions N501Y, A570D, P681H, T716I, S982A and D1118H
# (only the first five lie within S1)
uk_variant <- tribble(
    ~type, ~description, ~begin, ~end, ~length, ~accession, ~entryName, ~taxid, ~order,
    "B.1.1.7", "deletion",       69,   70, 2, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "deletion",      144,  144, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution",  501,  501, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution",  570,  570, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution",  681,  681, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution",  716,  716, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution",  982,  982, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1,
    "B.1.1.7", "substitution", 1118, 1118, 1, "P0DTC2", "SPIKE_SARS2", 2697049, 1
)

# overlay the variant changes; only those inside S1 (begin < 686) are drawn,
# and they sit on the order = 1 (B.1.1.7) chain
p <- p + geom_point(data = filter(uk_variant, begin < 686),
                    aes(x = begin,
                        y = order + 0.2,
                        shape = description),
                    size = 5)


# style the plot a bit...
p <- p + theme_bw(base_size = 14) + # white background
    theme(panel.grid.minor=element_blank(), 
          panel.grid.major=element_blank()) +
    theme(axis.ticks = element_blank(), 
          axis.text.y = element_blank()) +
    theme(panel.border = element_blank()) +
    theme(legend.position = "bottom")

p <- p + labs(title = "Schematic of SARS-CoV-2 S1 Protein and UK variant",
              subtitle = "Source: UniProt (https://www.uniprot.org/uniprot/P0DTC2)")
p

END
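The plot relies on the S1 boundary at residue 685 twice: the filter() that keeps only features lying wholly inside S1, and the begin < 686 test that decides which variant changes get a marker. Here is a small base-R sketch that checks both offline. The feature rows are a hand-made stand-in for a few rows of feature_to_dataframe() output (positions as in UniProt P0DTC2, other columns omitted), and the mutation labels are my own shorthand, so treat it as an illustration of the logic rather than the real data.

```r
# Offline check of the S1 boundary logic (base R only).
# Hand-made stand-in for a few feature_to_dataframe() rows; only positions matter.
features <- data.frame(
  description = c("Signal peptide", "Spike glycoprotein",
                  "Spike protein S1", "Spike protein S2"),
  begin = c(1, 13, 13, 686),
  end   = c(12, 1273, 685, 1273),
  stringsAsFactors = FALSE
)

s1_start <- 13   # first residue of the S1 chain
s1_end   <- 685  # last residue of the S1 chain

# same test as filter(begin > 12 & end < 686): keep features wholly inside S1
in_s1 <- features$begin >= s1_start & features$end <= s1_end
print(features[in_s1, ])

# which B.1.1.7 changes fall in S1, and so get a marker from geom_point()?
changes <- data.frame(
  label = c("H69-V70del", "Y144del", "N501Y", "A570D",
            "P681H", "T716I", "S982A", "D1118H"),
  begin = c(69, 144, 501, 570, 681, 716, 982, 1118),
  stringsAsFactors = FALSE
)
changes$subunit <- ifelse(changes$begin <= s1_end, "S1", "S2")
print(changes)
```

In this toy set only the S1 chain survives the filter: the signal peptide ends before residue 13, and the full-length and S2 chains run past 685. Three of the eight changes (T716I, S982A, D1118H) sit in S2, which is why only five markers land on the variant chain.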

I feel this could, and maybe will, be better, but I'm stopping for now :-)
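If you would rather not type the position columns by hand, here is a minimal base-R sketch of a hypothetical helper, mutations_to_rows() (not part of drawProteins or the tidyverse), that builds rows with the same columns as uk_variant from mutation labels. The label format it accepts (N501Y for substitutions, H69-V70del or Y144del for deletions) is my assumption, not a standard notation parser. Computing length as end - begin + 1 means the two-residue 69-70 deletion gets a length of 2.

```r
# Hypothetical helper: turn mutation labels into rows shaped like uk_variant.
# Assumed label format: "N501Y" (substitution), "H69-V70del" / "Y144del" (deletion).
mutations_to_rows <- function(labels, type = "B.1.1.7", order = 1) {
  # every run of digits in each label, e.g. "H69-V70del" -> c(69, 70)
  positions <- lapply(regmatches(labels, gregexpr("[0-9]+", labels)), as.numeric)
  begin <- vapply(positions, min, numeric(1))
  end   <- vapply(positions, max, numeric(1))
  data.frame(
    type        = type,
    description = ifelse(grepl("del$", labels), "deletion", "substitution"),
    begin       = begin,
    end         = end,
    length      = end - begin + 1,  # residues covered by the change
    accession   = "P0DTC2",
    entryName   = "SPIKE_SARS2",
    taxid       = 2697049,
    order       = order,
    stringsAsFactors = FALSE
  )
}

uk_variant <- mutations_to_rows(c("H69-V70del", "Y144del", "N501Y", "A570D",
                                  "P681H", "T716I", "S982A", "D1118H"))
print(uk_variant[, c("description", "begin", "end", "length")])
```

It returns a plain data frame rather than a tibble; dplyr's filter() and geom_point() accept either.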

