One of the key things to explain is the importance of **objects** in R.

Data is located in objects and there are a variety of **data structures** and **data types**.

I have written an R script to try to explore objects, particularly data structures.

I use three of the types of **data structures** regularly: vectors, lists and data frames.

I create these objects using the assignment operator "<-" or functions like lm().

I apply functions to these objects, for example plot().

I extract data from these objects using square brackets [] or $.
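As a minimal sketch of these three steps (create, apply, extract), here is a short example with made-up numbers. The object names `dose`, `response` and `fit` are only for illustration and are not from my script.

```r
# Create: the assignment operator stores a value under a name
dose     <- c(1, 2, 3, 4, 5)             # made-up numeric vector
response <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # made-up numeric vector
fit      <- lm(response ~ dose)          # lm() returns a model object (a list)

# Apply: functions take whole objects as arguments
mean(response)   # average of the five values
is.list(fit)     # TRUE: the model object is a list underneath

# Extract: square brackets pick by position, $ picks by name
response[2]        # second element: 3.9
fit$coefficients   # intercept and slope of the fitted line
```

The same pattern applies to the real assay data in the script that follows.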

I have created slides using R Markdown to present on the Training Day.

Scripts and R Markdown files are available on GitHub.

I have learned a lot from these sources:

- http://www.statmethods.net/input/datatypes.html
- http://www.r-tutor.com/r-introduction/
- The Art of R Programming: A Tour of Statistical Software Design by Norman Matloff

The official reference is this: http://cran.r-project.org/doc/manuals/r-release/R-intro.html

# START of SCRIPT

# Exploring Data Structures

## objects come in various types.

## I want to discuss objects that contain data

## Data goes into objects

### Use the assignment operator "<-"

### Protein Concentrations

prot <- c(0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000,

0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000)

### Absorbance from my protein assay

abs <- c(0.329, 0.352, 0.349, 0.379, 0.417, 0.491, 0.668, 0.956,

0.327, 0.341, 0.355, 0.383, 0.417, 0.446, 0.655, 0.905)

### these appear in the RStudio environment as Values

## These objects are vectors - all the data elements must be the same type

### A vector is the simplest type of object

### can be numeric, character, logical, factors

class(prot) #### numeric

### Some other types of vectors

protein <- "albumin"

class(protein) #### character

truth <- c(TRUE, FALSE, TRUE, TRUE)

class(truth) #### logical
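
### an added sketch of a factor, which stores categories (the values are made up for illustration)

group <- factor(c("control", "treated", "control"))

class(group) #### factor

levels(group) #### "control" "treated"

### mixing types inside c() forces every element to a single type

mixed <- c(1, "a", TRUE)

class(mixed) #### character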

### you can identify things inside the objects

prot[2]

### and parts of objects

prot[1:8]

### functions can be applied to whole objects (particularly vectors)

### the plot function pairs the elements by position: the first abs value against the first prot value, and so on

plot(abs~prot)

# More Complicated structures

## <b>lists</b> are another type of object

## the lm() function fits a straight line and returns an object, here called line, which is a list.

## lists can contain a mixture of data types.

line <- lm(abs~prot)

### the RStudio environment describes it as a "List of 12"

## there are various ways of getting information from this object

## type the name of the object

line

## use the summary() function

summary(line)

## use the $

summary(line)$r.squared

### we used this to extract the R-squared value

### we create the object r2 from the output of the function summary()

r2 <- summary(line)$r.squared

### and the function round() - gives us three decimal places

r2 <- round(summary(line)$r.squared, 3)

r2

class(r2)

### from the list we have extracted a number.
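
### an added sketch, not part of the original analysis: a list can also be built by hand with list()

### (the object name and the values are made up for illustration)

sample_info <- list(protein = "albumin", replicates = 2, conc = c(0, 0.5, 1))

names(sample_info) #### the names of the three elements

sample_info$replicates #### 2

sample_info[["conc"]][2] #### double square brackets extract one element, then [2] gives 0.5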

# <b>matrices</b> are two-dimensional structures

## all the elements must be the same data type
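
### an added sketch of a matrix (the numbers are arbitrary and the name m is only for illustration)

m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m) #### 2 3 (rows, columns)

m[2, 3] #### row 2, column 3: 6

class(m[1, ]) #### integer - one row of a matrix is a vector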

# <b>data frames</b> are two-dimensional structures

## different columns can hold different types of data
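
### an added sketch of a data frame built by hand (the names and values are made up for illustration)

readings <- data.frame(well = c("a", "b", "c"), reading = c(0.33, 0.35, 0.42))

str(readings) #### 3 obs. of 2 variables

readings$reading[2] #### 0.35

readings[readings$reading > 0.34, ] #### keeps the rows for wells b and c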

# often when we import data, it gets imported as a data frame.

## here is an example:

data <- read.csv("http://science2therapy.com/data/wellsDataSimp.csv")

## the RStudio environment lists it as "data" and gives us some info

## Have a quick look at it

View(data) # works in RStudio

str(data)

## we have names of columns and we have the class of the data within the column

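## note the types: Factor, num, int

## (since R 4.0.0, read.csv() reads text columns as chr rather than Factor unless stringsAsFactors = TRUE)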

data$Virus

# Simple plot from this data frame

plot(data[5:7])

## Another plot from this data frame

plot(data$P.Erk, data$S.phase.cnt)

## we can manipulate objects including data frames

## which is the subject of the next tutorial.
