The gender pay gap in the UK may not be of primary interest to some biochemists, but the Biochemical Society is interested in gender equality, and the majority of biochemistry undergraduates are female. Here is the Biochemical Society's policy statement, and here is something relevant from their blog.
Today @UKParliament tweeted about a report on the gender pay gap (it seems it was published last year). An Excel file of data tables accompanies the report.
Today I've been using R to explore some of the data, and I have written the script below to make these graphs (the first two graphs in the report).
The way the data is presented makes me uncomfortable: men being paid more is represented as a positive percentage, while women being paid more is expressed as a negative percentage. I feel sure there is a better way...
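One possible alternative, assuming the report uses the usual ONS-style definition (the difference between men's and women's median hourly pay as a percentage of men's), is to turn each gap back into a ratio: how much women earn for every £1 that men earn. If gap = (men - women) / men × 100, then women / men = 1 - gap / 100, so a negative gap simply becomes a value above £1. A minimal sketch (the function name gap_to_ratio is my own, not from the report):

```r
# Convert a gender pay gap (%) into women's pay per £1 of men's pay.
# Assumes gap = (men - women) / men * 100, i.e. an ONS-style definition.
gap_to_ratio <- function(gap_percent) {
  1 - gap_percent / 100
}

# Illustrative inputs only, not values from the report
gap_to_ratio(c(20, 0, -5))
#> [1] 0.80 1.00 1.05
```

Plotting the ratio with a reference line at 1 would put both directions on the same footing, rather than treating one as positive and the other as negative.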
Still, the data is interesting...
Here is the script:
START
library(RCurl)
library(readxl)
library(ggplot2)
library(reshape2)
library(ggthemes)
# this is the link to the data
link <- "http://researchbriefings.files.parliament.uk/documents/SN07068/data-tables.xlsx"
# the download.file() function downloads and saves the file with the name given
download.file(url=link,destfile="file.xlsx", mode="wb")
# then we can open the file and extract the data using the read_excel() function.
data <- read_excel("file.xlsx", skip=3, col_names=TRUE)
str(data)
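# shows that Year has been read in as character
# [[ ]] extracts the column as a vector; data[,1] on a tibble is still a
# one-column tibble, which as.numeric() cannot convert
data[[1]] <- as.numeric(data[[1]]) # change to number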
data <- data[1:22,] # get rid of seven rows of NAs.
names <- colnames(data)
names[1] <- "Year"
names[2] <- "All_employees"
colnames(data) <- names # make column names easier to use
data[,2:4] <- data[,2:4]*100 # Excel stores percents as decimals
# reshape the data from wide to long format
data.melt <- melt(data, id.vars = "Year")
colnames(data.melt) <- c("Year", "empType", "gendGap")
# draw the graph
p1 <- ggplot(data.melt, aes(x=Year,
y= gendGap,
colour = empType)) +
geom_point() + # draw the points
geom_line(linewidth = 1) + # draw the lines (use size = 1 on ggplot2 < 3.4.0)
labs(color = "Employment Type") + # customizes the legend title
ylab("Gender Gap (%)") + # y-label
ggtitle("Gender Pay Gap, UK, 1997-2015") + # graph title
ylim(-10,30) +
xlim(1995, 2015) +
geom_hline(yintercept = 0) + # nice line at zero
theme_bw()
p1 <- p1 + theme(legend.text=element_text(size = 12), # increase size of text
legend.title=element_text(size = 12)) # and title
p1 <- p1 + theme(axis.title.y = element_text(size = 14 )) +
theme(axis.text = element_text(size = 12))
p1 # show the graph
# maybe you prefer a different theme.
p1 + theme_hc()
# maybe without a legend but with labels on the lines:
p1 <- p1 + theme(legend.position="none") +
geom_text(data = data.melt[which(data.melt$Year == 2013),], # label each line at 2013
aes(label = empType),
vjust = -2)
p1
# draw the second graph with the age data...
data2 <- read_excel("file.xlsx", sheet=2, skip=3, col_names=TRUE)
View(data2)
str(data2)
data2 <- data2[1:8,]
# multiply numbers by 100 to give percentages
data2[,2:4] <- data2[,2:4]*100
names <- colnames(data2)
names[1] <- "Ages"
names[2] <- "All_employees"
colnames(data2) <- names
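data2.type <- data2[3:8,] # rows 3-8 hold the age bands plotted below
data2.melt <- melt(data2.type, id.vars = "Ages")
colnames(data2.melt) <- c("Ages", "empType", "gendGap")
# melt() stacks the columns one after another, so rows 1-6 are All_employees;
# rows 7:18 keep only the other two employment types
g <- ggplot(data = data2.melt[7:18,], aes(x = Ages, y = gendGap, fill = empType))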
g <- g + geom_bar(stat="identity", position="dodge", width = 0.75) +
ylim(-11,20) +
ylab("Gender Gap (%)") + # y-label
xlab("Age") + # x-label
ggtitle("Gender Pay Gap by Age, April 2015") +
labs(fill = "Employment Type") + # customizes the legend title
theme_hc() +
theme(legend.position=c(0,1), # move to the top left
legend.justification=c(0,1.5)) # move it in a bit
g # show the graph...
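The script reshapes the data from wide to long format with reshape2::melt(). reshape2 is retired, so if you prefer the tidyverse, tidyr::pivot_longer() should give an equivalent long table. Here is a self-contained sketch on a small made-up table shaped like the first sheet after the column renaming; the numbers and the Full_time / Part_time column names are placeholders, not the report's.

```r
library(tidyr)

# Toy wide table: one row per year, one column per employment type.
# Values and the Full_time / Part_time names are placeholders.
wide <- data.frame(
  Year = c(2014, 2015),
  All_employees = c(1, 2),
  Full_time = c(3, 4),
  Part_time = c(5, 6)
)

# Same idea as melt(wide, id.vars = "Year") plus renaming the columns
long <- pivot_longer(wide, cols = -Year,
                     names_to = "empType", values_to = "gendGap")
long
```

One difference to watch: melt() stacks the data column by column, whereas pivot_longer() goes row by row, and empType comes back as character rather than factor. So positional subsetting like data2.melt[7:18,] would need to become something like subset(long, empType != "All_employees").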