One of the key things to explain is the importance of objects in R.
Data is located in objects and there are a variety of data structures and data types.
I have written an R script to try to explore objects, particularly data structures.
I use three of the types of data structures regularly:
I create these objects using the assignment operator "<-" or functions like lm().
I apply functions to these objects. For example plot().
I extract data from these functions using square brackets [] or $.
I have created slides using R Markdown to present on the Training Day.
Scripts and R Markdown files are available on Github.
I have learned a lot from these sources:
- http://www.statmethods.net/input/datatypes.html
- http://www.r-tutor.com/r-introduction/
- The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff
The official reference is this: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
# START of SCRIPT
# Exploring Data Structures
## objects are made up of various types.
## I want to discuss objects that contain data
## Data goes into objects
### Use the assignment function "<-"
### Protein Concentrations
prot <- c(0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000,
0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000)
### Absorbance from my protein assay
abs <- c(0.329, 0.352, 0.349, 0.379, 0.417, 0.491, 0.668, 0.956,
0.327, 0.341, 0.355, 0.383, 0.417, 0.446, 0.655, 0.905)
### these appear in the R-Studio environment as Values
## These objects are vectors - all the data elements must be the same type
### A vector is the simplist type of object
### can be numeric, character, logical, factors
class(prot) #### numeric
### Some other types of vectors
protein <- "albumin"
class(protein) #### character
truth <- c(TRUE, FALSE, TRUE, TRUE)
class(truth) #### logical
### you can identify things inside the objects
prot[2]
### and parts of objects
prot[1:8]
### functions can be applied to whole objects (particularly arrays)
### the plot function puts the first element of each object against each other
plot(abs~prot)
# More Complicated structures
## <b>lists</b> are another type of object
## the lm() function makes an object called line which is a list.
## lists contain a mixture of data types.
line <- lm(abs~prot)
### the R-Studio environment says a "List of 12"
## there are various ways of getting information from this object
## type the name of the object
line
## use the summary() function
summary(line)
## use the $
summary(line)$r.squared
### we used this to extract the r2
### we created the object r2 using the function summary()
r2 <- summary(line)$r.squared
### and the function round() - gives us three decimal points
r2 <- round(summary(line)$r.squared, 3)
r2
class(r2)
### from the list we have extracted a number.
# <b>matrices</b> are two dimensional structures
## the data types are all the same
# <b>data frames</b> are two dimensional structures
## contains different types of data
# often when we import data, it gets imported as a data frame.
## here is an example:
data <- read.csv("http://science2therapy.com/data/wellsDataSimp.csv")
## the R-Studio environment puts it in "data" and gives us some info
## Have a quick look at it
View(data) # works in R-Studio
str(data)
## we have names of columns and we have the class of the data within the column
## note: Factors, num, int
data$Virus
# Simple plot from this data frame
plot(data[5:7])
## Another plot from this data frame
plot(data$P.Erk, data$S.phase.cnt)
## we can manipulate objects including data frames
## which is the subject of the next tutorial.
No comments:
Post a Comment
Comments and suggestions are welcome.