R for Biochemists: dying

Tuesday, 31 October 2017

Chance of dying for males and females using the UK Life Tables

I am currently answering comments to R for Biochemists 101 - online training that I've created for the Biochemical Society. One of the participants asked about overlaying the different chances of dying for male and females inspired by the blog piece Exploring the chance of dying using the UK Life Tables.

As is often the case in R, there is a quick and simple answer which works ok if the data is very self explanatory. However, it's difficult to draw a legend.

There is also a better way that builds on the strengths of ggplot2 but it requires reformatting the data. I've used the melt() function from the reshape package.

Here is the graph made with the better answer:

The code below also shows the quick way...

START
library(readxl)
library(ggplot2)

# The UK National Life Tables
# these are available through the Office of National Statistics
# as an excel file from this page:
# https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables
# the can be downloaded from here:
ons_url <- c("https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables/current/nltuk1315reg.xls")
# to guarantee undistrubed access, I've put the file on github
github_url <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/nltuk1315reg.xls"

# the download.file() function downloads and saves the file with the name given
download.file(url=ons_url,destfile="file.xls", mode="wb")
# if this doesn't work try replacing the url with the github_url
# then we can open the file and extract the data using the read_excel() function.
data<- read_excel("file.xls", sheet = 5, skip = 6)

# need to remove four bottom rows - these are all blank.
data <- data[1:101,]

colnames(data)

# x = age in years
# first set is males and second set is females
# mx is the central rate of mortality, defined as the number of deaths at age x last birthday in the three year period to which the National Life Table
# relates divided by the average population at that age over the same period.

# qx is the mortality rate between age x and (x +1), that is the probability that a person aged x exact will die before reaching age (x +1).

# to allow us to separate the male and female data, I am relabelling the column names

colnames(data) <- c("age",
"m.mx", "m.qx", "m.lx", "m.dx", "m.ex",
"f.mx", "f.qx", "f.lx", "f.dx", "f.ex")

# the life table allow us to draw some interesting curves
# the first one I want to explore is my chance of dying this year.

# qx is the key value.
# the chance of my dying before my next birthday.

# the ending point from the previous graph
# from http://rforbiochemists.blogspot.co.uk/2017/03/exploring-chance-of-dying-using-uk-life.html

p <- ggplot(data = data,
aes(x = age, y = m.qx)) +
geom_line(colour = "red") +
geom_point(colour = "red", size = 0.75) +
ylab("Chance of dying") +
xlab("Current Age") +
ggtitle("Chance of Men Dying before next Birthday") +
scale_y_log10(
breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5), # where to put labels
labels = c("1/10,000", "1/1000", "1/100", "1/10", "1/2")) + # the labels
theme_bw()

# show the object
p

# quick way
# add geom_line() and geom_point() with more data
p +
geom_line(data = data,
aes(x = age, y = f.qx),
colour = "blue") +
geom_point(data = data,
aes(x = age, y = f.qx),
colour = "blue", size = 0.75) +
# and change title
ggtitle("Chance of Dying before next Birthday")

# However, it's really difficult to add a proper legend

# Longer way - reformat the data into 'long' format
# using melt() function from reshape() package

library(reshape2)
# subset the columns we want - just three
data_s <- data.frame(cbind(data$age, data$m.qx, data$f.qx))
# rename columns
colnames(data_s) <- c("age", "male", "female")

# create new object with melt in long format
data_long <- melt(data_s, id.vars = "age")
# rename columns...
colnames(data_long) <- c("age", "gender", "chance_die")

# now draw the object
# using the colour aesthetics
# you get a legend automatically.
ggplot(data = data_long,
aes(x = age,
y = chance_die,
colour = gender)) +
geom_line() +
geom_point() +
ylab("Chance of dying") +
xlab("Current Age") +
ggtitle("Chance of Dying before next Birthday") +
scale_y_log10(
breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5),
labels = c("1/10,000", "1/1000", "1/100", "1/10", "1/2")) +
theme_bw()

END of SCRIPT

Resources

For more about melting and overlay, here is an example of some enzymatic data http://rforbiochemists.blogspot.co.uk/2015/06/plotting-enzyme-data-with-ggplot-part-i.html
An example of oven time courses http://rforbiochemists.blogspot.co.uk/2015/08/temperature-time-course.html
Plotting numbers of publications http://rforbiochemists.blogspot.co.uk/2015/07/how-many-publications-in-your-area-of.html

Monday, 6 March 2017

Exploring the chance of dying using the UK Life Tables

I am working towards creating a Men's Health Data Visualisation for the Battle of the Beards, an IT conference to be held in Cardiff at the end of March.

I was inspired by Professor David Spiegelhalter from the University of Cambridge. I heard him talk at Cardiff University. He has a blog about Uncertainty and this graph is inspired one of his blog posts. I've been reading his book entitled The Norm Chronicles (written with Michael Blastland) - all about risk, life and death.

My starting point for my exploration of Men's Health is a graph that can calculate my chance of dying this year. As my source, I'm using the UK Life Tables which have calculated the chance of dying between birthdays.

I became a bit more worried about dying as I got older and particularly when I became a father. My chance of dying at 47 is 1 in 390. However, I wanted to develop habits that would reduce this chance a little. More about that next time...

START

# I am preparing a session for the Battle of the Beards
# It's an IT conference with difference
# Fight male suicide at “The Battle of the Beards”
# Talking tech & making a difference
# Cardiff | 29 March 2017
# http://battleofthebeards.info/

library(readxl)
library(ggplot2)

# my preparation involves exploring the UK National Life Tables
# these are available through the Office of National Statistics
# as an excel file from this page:
# https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables
# the can be downloaded from here:
ons_url <- c("https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables/current/nltuk1315reg.xls")
# to guarantee undistrubed access, I've put the file on github
github_url <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/nltuk1315reg.xls"

# the download.file() function downloads and saves the file with the name given
download.file(url=ons_url,destfile="file.xls", mode="wb")
# if this doesn't work try replacing the url with the github_url
# then we can open the file and extract the data using the read_excel() function.
data<- read_excel("file.xls", sheet = 5, skip = 6)

# need to remove four bottom rows - these are all blank.
data <- data[1:101,]

colnames(data)

# x = age in years
# first set is males and second set is females
# mx is the central rate of mortality, defined as the number of deaths at age x last birthday in the three year period to which the National Life Table
# relates divided by the average population at that age over the same period.

# qx is the mortality rate between age x and (x +1), that is the probability that a person aged x exact will die before reaching age (x +1).

# lx is the number of survivors to exact age x of 100,000 live births of the same sex who are assumed to be subject throughout their lives to the
# mortality rates experienced in the three year period to which the National Life Table relates.

# dx is the number dying between exact age x and (x +1) described similarly to lx, that is dx=lx-lx+1.

# ex is the average period expectation of life at exact age x, that is the average number of years that those aged x exact will live thereafter
# based on the mortality rates experienced in the three year period to which the National Life Table relates.

# to allow us to separate the male and female data, I am relabelling the column names

colnames(data) <- c("age",
"m.mx", "m.qx", "m.lx", "m.dx", "m.ex",
"f.mx", "f.qx", "f.lx", "f.dx", "f.ex")

# the life table allow us to draw some interesting curves
# the first one I want to explore is my chance of dying this year.

# qx is the key value.
# the chance of my dying before my next birthday.

# here is the graph in ggplot
ggplot(data = data,
aes(x = age, y = m.qx)) +
geom_point()

# The log 10 plots give an interesting pattern.
# first shown to me by Professor David Spiegelhalter
# Here it is in my blog https://understandinguncertainty.org/what-your-effective-age
ggplot(data = data,
aes(x = age, y = log10(m.qx))) +
geom_point()

# Another way to change the scale on the y-axis using scale_y_log10()
ggplot(data = data,
aes(x = age, y = m.qx)) +
geom_point() +
scale_y_log10()

# let's make this look a bit nicer with lines, titles and a theme...
# make the object p
p <- ggplot(data = data,
aes(x = age, y = m.qx)) +
geom_line(colour = "red") +
geom_point(colour = "red", size = 0.75) +
ylab("Chance of dying") +
xlab("Current Age") +
ggtitle("Chance of Dying before next Birthday") +
scale_y_log10() +
theme_bw()

p # show the object

# label on y axis is not the most useful, I think.
# good learning point here on customising the y-axis label.
# added as arguments in the scale_y_log10()

p <- ggplot(data = data,
aes(x = age, y = m.qx)) +
geom_line(colour = "red") +
geom_point(colour = "red", size = 0.75) +
ylab("Chance of dying") +
xlab("Current Age") +
ggtitle("Chance of Men Dying before next Birthday") +
scale_y_log10(
breaks = c(0.0001, 0.001, 0.01, 0.1, 0.5), # where to put labels
labels = c("1/10,000", "1/1000", "1/100", "1/10", "1/2")) + # the labels
theme_bw()

p # show the object

# add source of the data
source <- paste("Source: Office of National Statistics\n filename: nltuk1315reg.xls\n accessed:", Sys.Date())
p + annotate("text", x = 70, y = 0.0001, label=source, size=3)

# more info available from the Office of National Statistics here:
# https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies

A protein assay	An enzyme kinetic plot	Visual Index	Study at Cardiff

R for Biochemists

A site to help Biochemists learn R.

How to use this site...

Starting points

Tuesday, 31 October 2017

Chance of dying for males and females using the UK Life Tables

Monday, 6 March 2017

Exploring the chance of dying using the UK Life Tables