As is often the case in R, there is a quick and simple answer which works ok if the data is very self explanatory. However, it's difficult to draw a legend.

There is also a better way that builds on the strengths of ggplot2 but it requires reformatting the data. I've used the melt() function from the reshape package.

Here is the graph made with the better answer:

The code below also shows the quick way...

START

library(readxl)

library(ggplot2)

# The UK National Life Tables

# these are available through the Office of National Statistics

# as an excel file from this page:

# https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables

# the can be downloaded from here:

ons_url <- c("https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables/current/nltuk1315reg.xls")

# to guarantee undistrubed access, I've put the file on github

github_url <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/nltuk1315reg.xls"

# the download.file() function downloads and saves the file with the name given

download.file(url=ons_url,destfile="file.xls", mode="wb")

# if this doesn't work try replacing the url with the github_url

# then we can open the file and extract the data using the read_excel() function.

data<- read_excel("file.xls", sheet = 5, skip = 6)

# need to remove four bottom rows - these are all blank.

data <- data[1:101,]

colnames(data)

# x = age in years

# first set is males and second set is females

# mx is the central rate of mortality, defined as the number of deaths at age x last birthday in the three year period to which the National Life Table

# relates divided by the average population at that age over the same period.

# qx is the mortality rate between age x and (x +1), that is the probability that a person aged x exact will die before reaching age (x +1).

# to allow us to separate the male and female data, I am relabelling the column names

colnames(data) <- c("age",

"m.mx", "m.qx", "m.lx", "m.dx", "m.ex",

"f.mx", "f.qx", "f.lx", "f.dx", "f.ex")

# the life table allow us to draw some interesting curves

# the first one I want to explore is my chance of dying this year.

# qx is the key value.

# the chance of my dying before my next birthday.

# the ending point from the previous graph

# from http://rforbiochemists.blogspot.co.uk/2017/03/exploring-chance-of-dying-using-uk-life.html

p <- ggplot(data = data,

aes(x = age, y = m.qx)) +

geom_line(colour = "red") +

geom_point(colour = "red", size = 0.75) +

ylab("Chance of dying") +

xlab("Current Age") +

ggtitle("Chance of Men Dying before next Birthday") +

scale_y_log10(

breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5), # where to put labels

labels = c("1/10,000", "1/1000", "1/100", "1/10", "1/2")) + # the labels

theme_bw()

# show the object

p

# quick way

# add geom_line() and geom_point() with more data

p +

geom_line(data = data,

aes(x = age, y = f.qx),

colour = "blue") +

geom_point(data = data,

aes(x = age, y = f.qx),

colour = "blue", size = 0.75) +

# and change title

ggtitle("Chance of Dying before next Birthday")

# However, it's really difficult to add a proper legend

# Longer way - reformat the data into 'long' format

# using melt() function from reshape() package

library(reshape2)

# subset the columns we want - just three

data_s <- data.frame(cbind(data$age, data$m.qx, data$f.qx))

# rename columns

colnames(data_s) <- c("age", "male", "female")

# create new object with melt in

**long format**

data_long <- melt(data_s, id.vars = "age")

# rename columns...

colnames(data_long) <- c("age", "gender", "chance_die")

# now draw the object

# using the colour aesthetics

# you get a legend automatically.

ggplot(data = data_long,

aes(x = age,

y = chance_die,

colour = gender)) +

geom_line() +

geom_point() +

ylab("Chance of dying") +

xlab("Current Age") +

ggtitle("Chance of Dying before next Birthday") +

scale_y_log10(

breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5),

labels = c("1/10,000", "1/1000", "1/100", "1/10", "1/2")) +

theme_bw()

**END of SCRIPT**

**Resources**

- For more about melting and overlay, here is an example of some enzymatic data http://rforbiochemists.blogspot.co.uk/2015/06/plotting-enzyme-data-with-ggplot-part-i.html
- An example of oven time courses http://rforbiochemists.blogspot.co.uk/2015/08/temperature-time-course.html
- Plotting numbers of publications http://rforbiochemists.blogspot.co.uk/2015/07/how-many-publications-in-your-area-of.html

## No comments:

## Post a Comment

Comments and suggestions are welcome.