Friday, 15 February 2019

Bar chart of common mental disorders...

My day job in the School of Medicine at Cardiff University involves facilitating learning around various medical conditions, including mental health. I like a few statistics, so I have been exploring the prevalence of mental health disorders. I found a report about mental health from Our World in Data, which shares all the data it uses on GitHub, making it openly available. There is lots of interesting data.
As well as mental health, there are data and reports about cancer and the burden of disease.

Inspired by the mental health report from Our World in Data, I downloaded some data and generated a graph which shows the prevalence of Mental Health Disorders in the UK.

Here is the graph:





Here is the R script that generated the graph, and a few other graphs along the way.

=== START ===
# looking at some mental health data...
# source: https://ourworldindata.org/mental-health

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)

# download the data from GitHub
data <- read_csv("https://raw.githubusercontent.com/owid/owid-datasets/master/datasets/Mental%20health%20prevalence%20(IHME)/Mental%20health%20prevalence%20(IHME).csv")

# pull out data for UK and wrangle using pipes and dplyr
data %>% 
    # filter() by country and year
    filter(Entity == "United Kingdom", Year == 2016) %>%
    # select() the prevalence (percentage) columns - 3rd to 13th column
    select(3:13) %>%
    # turn from wide format to long for better plotting using gather()
    gather(key = "CMHD", value = "prevalence") -> data1

# now have new object data1

# first bar chart...
ggplot(data1, aes(x = CMHD, y = prevalence)) +
    geom_bar(stat = "identity")


# plot horizontally with coord_flip()
ggplot(data1, aes(x = CMHD, y = prevalence)) +
    geom_bar(stat = "identity") +
    coord_flip()


# remove the text " - both sexes (percent)" using the gsub() function
data1$CMHD <- gsub(" \\- both sexes \\(percent\\)", "", data1$CMHD)
# the \\ escape the hyphen and the brackets so they are matched literally

# AND

# reorder the categories as factors by size of prevalence
# https://www.reed.edu/data-at-reed/resources/R/reordering_geom_bar.html
data1$CMHD <- factor(data1$CMHD, levels = data1$CMHD[order(data1$prevalence)])

p <- ggplot(data1, aes(x = CMHD, y = prevalence)) +
    geom_bar(stat = "identity") +
    coord_flip()
p


# add some labels and source....
p <- p +
    theme_bw() +
    labs(x = "",
        y = "Prevalence (%)",
        title = "Prevalence of Common Mental Health Disorders in UK (2016)", 
        subtitle = "https://ourworldindata.org/mental-health")
p


# Our World in Data website has the numbers on the plot...
p <- p +
    geom_text(aes(label=round(prevalence, 2)))
p


# Our World in Data website has different coloured bars on the plot...
# do this by adding fill to the aes() of ggplot()
p <- ggplot(data1, aes(x = CMHD, y = prevalence, fill = CMHD)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_bw() +
    labs(x = "",
        y = "Prevalence (%)",
        title = "UK Prevalence of Common Mental Health Disorders (2016)", 
        subtitle = "https://ourworldindata.org/mental-health") +
    geom_text(aes(label=round(prevalence, 1)))
p


# which adds a legend... so remove the legend...
p + theme(legend.position="none")
=== END ===
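
The two tidying steps in the script (stripping the label suffix with gsub() and reordering the factor levels by prevalence) can be tried on their own with a small made-up data frame. This is only a sketch: the object `toy`, the disorder names and the numbers are invented for illustration and are not from the Our World in Data file. It also uses `fixed = TRUE`, which is an alternative to the escaped pattern used above rather than what the script itself does.

```r
# Toy data standing in for data1 (names and values are made up for illustration)
toy <- data.frame(
    CMHD = c("Disorder A - both sexes (percent)",
             "Disorder B - both sexes (percent)",
             "Disorder C - both sexes (percent)"),
    prevalence = c(2.5, 0.4, 4.1),
    stringsAsFactors = FALSE
)

# fixed = TRUE matches the suffix as literal text, so no escape characters are needed
toy$CMHD <- gsub(" - both sexes (percent)", "", toy$CMHD, fixed = TRUE)

# order() gives the row positions sorted by prevalence (smallest first);
# using the labels in that order as factor levels sets the order of the bars
toy$CMHD <- factor(toy$CMHD, levels = toy$CMHD[order(toy$prevalence)])

# check the result: levels now run from lowest to highest prevalence
levels(toy$CMHD)
```

Because the levels run from lowest to highest prevalence, coord_flip() should draw the largest bar at the top of the chart.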

Some resources: