[1] "Hello core ERM!"
A Crash Course in R Programming
University of Oxford
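Object names can contain letters, numbers, ., and underscore _.
Good style is snake_case: using _ to separate words.
x <- 3 creates an object named x with the value 3.
Read x <- 3 as “x gets 3”.
Using = (as in x = 3) also works, but it’s bad style.
Arithmetic uses +, -, *, / and parentheses; the * is mandatory (write 2 * x, not 2x).
Use ^ for powers, and compute \(e^x\) with exp(x).
log(a, base = b) computes the base-b logarithm of a; the base defaults to \(e\), i.e. the natural log.
x <- 3 creates a double (a real number); append L to create an integer, for example x <- 3L.
Character strings go in single or double quotes: x <- 'hello world!' or y <- "ERM rocks!".
Logical values are TRUE or FALSE.
typeof() tells you the type!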
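A minimal sketch of the basics above that you can paste into the R console; the values are illustrative only.

```r
# Assignment: read as "x gets 3"
x <- 3

# Types
typeof(x)               # "double": numbers are doubles by default
typeof(3L)              # "integer": the L suffix makes an integer
typeof('hello world!')  # "character"
typeof(TRUE)            # "logical"

# Arithmetic: * is mandatory, ^ raises to a power
2 * x^2                 # 18
exp(1)                  # 2.718282, i.e. e
log(100, base = 10)     # 2
log(exp(2))             # 2: log() defaults to the natural log
```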
! turns a TRUE into a FALSE and vice-versa.
x & y is TRUE iff both x and y are TRUE.
x | y is TRUE iff at least one of x and y is TRUE.
xor(x, y) is TRUE iff exactly one of x and y is TRUE.
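Comparisons return TRUE or FALSE:
<, >, <=, and >= all mean what you think they do.
== tests for equality; don’t confuse it with =.
!= tests for lack of equality.
== and != can be dangerous: they compare element by element and are sensitive to floating-point rounding. To get a single TRUE / FALSE value for whether two objects are exactly the same, use identical(). To test whether numbers are equal up to a small numerical tolerance, use all.equal(); it returns TRUE or a description of the differences, so wrap it in isTRUE() inside if () conditions.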
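A short sketch of why == can bite: sqrt(2) is stored with finite precision, so squaring it does not give exactly 2.

```r
x <- sqrt(2)^2
x == 2                    # FALSE: tiny floating-point rounding error
identical(x, 2)           # FALSE for the same reason
isTRUE(all.equal(x, 2))   # TRUE: equal up to a small tolerance

# == compares element by element; identical() returns one value
c(1, 2, 3) == c(1, 5, 3)            # TRUE FALSE TRUE
identical(c(1, 2, 3), c(1, 5, 3))   # FALSE
```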
NAs, Infs, and NaNs
NA means not available / missing.
Inf means infinity and -Inf means minus infinity.
NaN means not a number.
NAs, Infs, and NaNs: Examples
Does (NA & TRUE) equal (NA | TRUE)? Explain.
Does (Inf - Inf) equal (Inf - 1)? Explain.
c() (“concatenate”) creates an atomic vector.
typeof() finds the type of an atomic vector.
length() finds the length of an atomic vector.
Predict the result that you will obtain if you use typeof() to find the type of each of the following atomic vectors. Then check to see if you were right!
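Use [] to access elements of an atomic vector: a single index, x[2], or a vector of them, x[c(2, 5, 7)].
Examples: x, x[-1] (a negative index drops that element), and x[c(2, 2)] (repeating an index repeats the element).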
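A quick sketch of these indexing rules on an illustrative vector (the values are made up):

```r
x <- c(10, 20, 30, 40, 50, 60, 70)  # illustrative vector
typeof(x)        # "double"
length(x)        # 7
x[2]             # 20
x[c(2, 5, 7)]    # 20 50 70
x[-1]            # 20 30 40 50 60 70: drops the first element
x[c(2, 2)]       # 20 20: repeating an index repeats the element
```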
Warning
R (like Julia) indexes from one, unlike Python and C/C++ which index from zero.
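Did you notice the [1] that keeps appearing everywhere?
w is a vector; [1] denotes its first (and only) element.
When a long vector wraps across several lines, R labels each line with the index of its first element. Here the first element [1] is 30 and the 26th [26] is 55: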
y[5]. What result do you get? Why?
y so I enter y[2,4]. What happens? Can you fix it? How?
'Keble' and 'Univ' two different ways.
Using [], length() and -, compute the monthly growth rates in %.
Mathematical operations in R are vectorized and operate element by element:
Nearly all R functions are vectorized: they accept vector input.
Recycling allows operations with vectors of different lengths, e.g. “scalars” with vectors:
numeric(0)
[1] 0
numeric(0)
Warning in c(1, 2, 3) + c(5, 6, 7, 8): longer object length is not a multiple
of shorter object length
[1] 6 8 10 9
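A small sketch of vectorization and recycling. The last line reproduces the warning example: the shorter vector is reused from the start, so the result is 1+5, 2+6, 3+7, 1+8.

```r
x <- c(1, 2, 3)
x * 2               # 2 4 6: the "scalar" 2 is recycled to c(2, 2, 2)
x + c(10, 20, 30)   # 11 22 33: element by element
sqrt(c(1, 4, 9))    # 1 2 3: sqrt() is vectorized

# Lengths 3 and 4: the shorter vector is reused as 1, 2, 3, 1,
# so the result is 6 8 10 9, with a warning because 4 is not a multiple of 3
suppressWarnings(c(1, 2, 3) + c(5, 6, 7, 8))
```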
The probability mass function of a Binomial\((n, p)\) random variable is given by \[
\mathbb{P}(X=x) = \binom{n}{x} p^x (1 - p)^{n-x}
\] Use vectorized mathematical operations and the choose()
function to calculate the pmf of a Binomial\((5, 0.3)\) random variable in one fell swoop.
<- to overwrite.
Method 1: create first, then use names():
birth year age #siblings
1983 40 1
Method 2: name when creating
birth year age #siblings
1983 40 1
You can rename with names():
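A minimal sketch of both naming methods and of renaming, using the values from the printed output above:

```r
# Method 1: create first, then assign names
x <- c(1983, 40, 1)
names(x) <- c('birth year', 'age', '#siblings')

# Method 2: name when creating; names that aren't valid R names need quotes
y <- c('birth year' = 1983, age = 40, '#siblings' = 1)
identical(x, y)   # TRUE

# Renaming: overwrite the names
names(y) <- c('born', 'age', 'siblings')
names(y)          # "born" "age" "siblings"
```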
NULL is the empty set. You can assign it, e.g. x <- NULL.
union(A, B) \(\equiv A \cup B\)
intersect(A, B) \(\equiv A \cap B\)
setdiff(A, B) \(\equiv A \setminus B \equiv A - B \equiv A \cap B^{c}\)
setequal(A, B) is TRUE iff \(A \subseteq B\) and \(B \subseteq A\)
A %in% B returns a vector of length(A) with TRUE for each element of A that is contained in B, FALSE otherwise.
Note
To coerce manually: as.character(), as.numeric(), as.logical().
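A short sketch of the set functions and of manual coercion, with illustrative sets A and B:

```r
A <- c(1, 2, 3)
B <- c(2, 3, 4)
union(A, B)              # 1 2 3 4
intersect(A, B)          # 2 3
setdiff(A, B)            # 1
setequal(A, c(3, 2, 1))  # TRUE: order doesn't matter for sets
A %in% B                 # FALSE TRUE TRUE: one result per element of A

# Manual coercion
as.character(A)          # "1" "2" "3"
as.numeric('2.5')        # 2.5
as.logical('TRUE')       # TRUE
suppressWarnings(as.numeric('abc'))  # NA: can't be read as a number
```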
999s in this vector with NAs.
cards to the appropriate numeric values.
y to make it work.
as.logical(-2:2)? Can you figure out the coercion rule for numeric to logical?
Start with Hands-On Programming with R. For more:
A good rule of thumb is to consider writing a function whenever you’ve copied and pasted a block of code more than twice. – Hadley Wickham
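scale() to compute z-scores.
z_score <- stores what follows in an object called z_score.
function(x) makes it a function of one argument, x.
{ ... } contains the function body: what to do with x.
Pay attention to how function() and the linebreaks are laid out.
Calling return() explicitly is bad style; reserve it for “early returns”. A function returns the value of the last expression in its body.
The z_score() function:
z_score <- function(x) {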
# Center and standardize a numeric vector x, returns z-scores
(x - mean(x)) / sd(x)
}
example_data <- c(-2, 6, 3, -1, 7, 8, 0, 4, 3, -5)
z <- z_score(example_data)
z
[1] -1.0195160 0.8772580 0.1659677 -0.7824193 1.1143547 1.3514514
[7] -0.5453225 0.4030645 0.1659677 -1.7308062
z_score <- function(x) { ... }
z_score(example_data)
\() is shorthand for function() (available in R 4.1.0 and later):
[1] -1.0195160 0.8772580 0.1659677 -0.7824193 1.1143547 1.3514514
[7] -0.5453225 0.4030645 0.1659677 -1.7308062
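A minimal sketch comparing the two syntaxes; z_score_short is an illustrative name. With a one-line body the braces can be dropped.

```r
z_score <- function(x) {
  (x - mean(x)) / sd(x)
}
# The same function written with the \() shorthand (R 4.1.0 or later)
z_score_short <- \(x) (x - mean(x)) / sd(x)

example_data <- c(-2, 6, 3, -1, 7, 8, 0, 4, 3, -5)
identical(z_score(example_data), z_score_short(example_data))  # TRUE
```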
The \(k\)th raw moment of a random variable is \(\mathbb{E}[X^k]\). The sample analogue is \(\frac{1}{n} \sum_{i=1}^n x_i^k\).
z_score(w) where w <- c(1, 2, NA). What happens? See ?mean().
return(z) at the bottom of the function body. Explain your results.
sum(), length(), mean() and sd().
\[
\text{Skewness} \equiv \frac{1}{n} \sum_{i=1}^n\left( \frac{x_i - \bar{x}}{s}\right)^3.
\]
sum(), length() and is.na() to write a function called my_var() that drops NAs and then computes the sample variance.
summary_stats() that returns a named vector with two elements: the sample mean and standard deviation.
...
k is defined in the “global environment”, so f() “can see it”.
m is defined inside g(), so the global environment “can’t see it”.
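A minimal sketch of these two situations; f(), g(), k and m are illustrative definitions consistent with the description above.

```r
k <- 3
f <- function(x) {
  x * k   # k isn't an argument, so R looks for it in the global environment
}
f(2)      # 6

g <- function() {
  m <- 10  # m only exists inside g(), while g() runs
  m + 1
}
g()       # 11
# In a fresh session, typing m at the console now gives: object 'm' not found
```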
x <- 0.5
h <- \(x) {
sin(pi * x) # pi is a built-in constant in R
}
h(2) # Returns sin(2 * pi), not sin(pi * 0.5)
[1] -2.449294e-16
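There are two objects called x: the global x and the argument x in h().
h() looks inside the function first and finds its argument x; once it finds an x, it stops looking, so the global x <- 0.5 is never used.
The result is essentially zero rather than exactly zero because pi is stored with finite precision.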
if () statements
If LOGICAL_CONDITION is TRUE, run the code inside { ... }.
Examples:
if (3 > 5) {
print('Everything you know is wrong!')
}
my_name <- 'Frank'
if (identical(my_name, 'Frank')) {
print('Hi Frank!')
}
[1] "Hi Frank!"
Warning
LOGICAL_CONDITION must be length one: an individual TRUE or FALSE value. (Since R 4.2.0, a condition of length greater than one is an error.)
Early returns: use return() inside an if () statement to “break out” of a function early, before completing everything.
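A minimal sketch of an early return; safe_sqrt() is a hypothetical example function, not one from the course.

```r
safe_sqrt <- function(x) {
  if (x < 0) {
    message('x must be non-negative; returning NA')
    return(NA)   # early return: the rest of the body is skipped
  }
  sqrt(x)        # only reached when x >= 0
}
safe_sqrt(9)     # 3
safe_sqrt(-1)    # NA, after printing the message
```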
if () ... else
else adds a “default case”.
Examples:
if (3 > 5) {
print('Everything you know is wrong!')
} else {
print('The laws of mathematics continue to apply.')
}
[1] "The laws of mathematics continue to apply."
my_name <- 'Sam'
if (identical(my_name, 'Frank')) {
print('Hi Frank!')
} else {
print('You should change your name to Frank.')
}
[1] "You should change your name to Frank."
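if () ... else if () ... else
R checks the conditions in order and runs the {...} block that belongs to the first TRUE condition.
Once a condition is TRUE, R skips the remaining blocks.
If every condition is FALSE, R runs the else block, if present.
get_value() assigns values to chess pieces using an if () tree.
A lookup table is an alternative to if () trees:
get_value2 <- function(x) {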
values <- c(9, 5, 3, 3, 1)
names(values) <- c('queen', 'rook', 'knight', 'bishop', 'pawn')
values[x]
}
get_value('queen')
[1] 9
get_value2('queen')
queen
9
Note
if ()
trees are best for running different code in each branch; lookup tables are best for assigning different values in each branch.
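A minimal sketch of what an if () tree version of get_value() could look like, assuming the same piece values as the get_value2() lookup table. This get_value() is a hypothetical reconstruction, not necessarily the original.

```r
# Hypothetical if () tree version of get_value(): one piece at a time
get_value <- function(x) {
  if (identical(x, 'queen')) {
    9
  } else if (identical(x, 'rook')) {
    5
  } else if (x %in% c('knight', 'bishop')) {
    3
  } else if (identical(x, 'pawn')) {
    1
  } else {
    NA
  }
}
get_value('queen')   # 9
get_value('bishop')  # 3

# The lookup table handles many pieces at once
get_value2 <- function(x) {
  values <- c(9, 5, 3, 3, 1)
  names(values) <- c('queen', 'rook', 'knight', 'bishop', 'pawn')
  values[x]
}
get_value2(c('pawn', 'queen'))  # named: pawn 1, queen 9
```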
mycov() that calculates the sample covariance between x and y. Use an early return to print an error message when x and y have different lengths.
?trunc(). Then use trunc() to write a function called myround() that rounds x to the nearest integer.
for () loops
Basic syntax:
Example:
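for () loop details
R creates INDEX if it doesn’t exist, and overwrites it if it does.
INDEX is created in the environment where the loop was called, so it is still there after the loop finishes.
for () can iterate over any type of atomic vector.
“What happens in the for () loop stays in the for () loop.”
Why doesn’t anything happen?! Inside a loop, R doesn’t automatically print results; that only happens at the top level of the console.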
Store the results somewhere to access later:
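A minimal sketch of the syntax, the printing behavior, and storing results in pre-allocated storage; the vectors are illustrative.

```r
# Basic syntax: for (INDEX in VECTOR) { ... }
for (i in 1:3) {
  i^2            # computed, but NOT printed: no auto-printing inside a loop
}
for (i in 1:3) {
  print(i^2)     # prints 1, 4, 9
}
i                # 3: i still exists after the loop

# for () can iterate over any atomic vector, e.g. a character vector
for (piece in c('queen', 'rook')) {
  print(piece)
}

# Store the results to access later: pre-allocate, then fill in
squares <- numeric(3)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares          # 1 4 9
```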
while () loops
Use while () when you don’t know in advance how many iterations you’ll need.
Use for () when you do know in advance how many iterations you’ll need.
Generate a character vector of 1 million chess pieces:
Consider three methods to assign these pieces numeric values:
Method 1: a for () loop that repeatedly calls get_value() and doesn’t pre-allocate any memory to store the result.
Method 2: a for () loop that repeatedly calls get_value(), but does pre-allocate memory to store the result.
Method 3: get_value2()
Note
Method 3 is simply get_value2(), so I don’t need a third function.
Method 1:
user system elapsed
0.849 0.034 0.884
Method 2:
user system elapsed
0.701 0.001 0.702
Method 3:
user system elapsed
0.008 0.000 0.009
[1] TRUE
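A minimal sketch of how such a comparison could be run with system.time(). Assumptions: the pieces are drawn with sample(), the seed is illustrative, n is 1e5 rather than 1 million to keep it quick, and get_value() is a hypothetical if () tree. Timings will differ from those above and between machines.

```r
set.seed(123)  # illustrative seed
n <- 1e5       # smaller than 1 million, to keep the sketch quick
pieces <- sample(c('queen', 'rook', 'knight', 'bishop', 'pawn'), n,
                 replace = TRUE)

# Hypothetical if () tree version of get_value()
get_value <- function(x) {
  if (identical(x, 'queen')) {
    9
  } else if (identical(x, 'rook')) {
    5
  } else if (x %in% c('knight', 'bishop')) {
    3
  } else {
    1  # pawn
  }
}
get_value2 <- function(x) {
  values <- c(9, 5, 3, 3, 1)
  names(values) <- c('queen', 'rook', 'knight', 'bishop', 'pawn')
  values[x]
}

# Method 1: for () loop, result grows one element at a time
t1 <- system.time({
  result1 <- c()
  for (i in seq_along(pieces)) result1[i] <- get_value(pieces[i])
})
# Method 2: for () loop into pre-allocated storage
t2 <- system.time({
  result2 <- numeric(n)
  for (i in seq_along(pieces)) result2[i] <- get_value(pieces[i])
})
# Method 3: vectorized lookup table (unname() drops the piece names)
t3 <- system.time(result3 <- unname(get_value2(pieces)))

rbind(t1, t2, t3)[, 'elapsed']  # elapsed seconds for each method
identical(result1, result2) && identical(result2, result3)  # TRUE
```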
for () loop to compute the first n Fibonacci numbers.
f() without using a loop or if () ... else.
Use attributes(x) to view the attributes of x.
attributes(x) returns NULL if x has no attributes.
names() are an example of an attribute.
dim() is another: giving a vector a dim attribute turns it into a matrix.
[1] 1 2 3 4 5 6
[1] "integer"
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[1] "matrix" "array"
Warning
Some R functions / operations only work with matrices. An \((n\times 1)\) or \((1 \times n)\) matrix is not equivalent to an atomic vector. Remember: attributes and class.
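A short sketch of adding a dim attribute to a vector (class() returns "matrix" "array" in R 4.0.0 or later):

```r
x <- 1:6
attributes(x)        # NULL: a plain vector has no attributes
dim(x) <- c(1, 6)    # add a dim attribute
attributes(x)        # $dim: 1 6
typeof(x)            # still "integer": same underlying vector
class(x)             # "matrix" "array"
is.matrix(1:6)       # FALSE: the plain vector is not a matrix
```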
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
matrix() fills the matrix column by column by default. Set byrow = TRUE in matrix() to fill it row by row instead.
nrow() and ncol() tell us how many rows / columns a matrix has:
[,1] [,2] [,3]
[1,] "Queen" "Knight" "Pawn"
[2,] "Rook" "Bishop" "King"
[1] 2
[1] 3
This is the same information as dim()
For a vector x, diag(x) constructs a diagonal matrix.
For a matrix M, diag(M) extracts the main diagonal.
For an integer k, diag(nrow = k) is the identity matrix \(I_k\).
Subsetting matrices: same idea as vectors, but with two dimensions, [row, col].
Leaving an index empty means everything from that dimension.
rbind() and cbind()
Create / expand a matrix by binding rows or columns.
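A minimal sketch of the matrix tools above, using small illustrative matrices:

```r
M <- matrix(1:6, nrow = 2)                       # filled column by column
M_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
M[1, ]          # first row: 1 3 5
M_byrow[1, ]    # first row: 1 2 3
nrow(M)         # 2
ncol(M)         # 3
dim(M)          # 2 3: the same information
M[, 2]          # empty row index: every row of column 2, i.e. 3 4

diag(c(1, 2))          # (2 x 2) diagonal matrix with 1 and 2 on the diagonal
diag(matrix(1:9, 3))   # main diagonal of a (3 x 3) matrix: 1 5 9
diag(nrow = 3)         # the (3 x 3) identity matrix

rbind(M, c(7, 8, 9))   # bind a new row: now (3 x 3)
cbind(M, c(7, 8))      # bind a new column: now (2 x 4)
```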
A, each of whose rows contains the elements 1:5. Hint: see ?rep.
A except row 3 and column 2.
B by stacking the \((4\times 4)\) identity matrix on top of itself.
B.
for() loop to construct the \((n\times n)\) exchange matrix \(J_n\).
A failed attempt to produce the \((3\times 3)\) identity matrix:
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
Oops! We wrote over the entire matrix by mistake!
Instead: subset using a matrix of indices, where each row gives the (row, column) position of one “target” element.
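A minimal sketch of this approach; I3 and targets are illustrative names. Using a two-column index matrix changes only the diagonal elements.

```r
I3 <- matrix(0, nrow = 3, ncol = 3)
targets <- cbind(1:3, 1:3)   # rows are (row, col) positions: (1,1), (2,2), (3,3)
I3[targets] <- 1             # only the three target elements change
I3
all.equal(I3, diag(nrow = 3))  # TRUE
```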
Because a matrix is a vector with dimensions, +, -, *, and / are elementwise, just as they are for atomic vectors:
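A short sketch with two illustrative (2 x 2) matrices. Note that * is not matrix multiplication; R uses the separate operator %*% for that.

```r
A <- matrix(1:4, nrow = 2)                # columns (1, 2) and (3, 4)
B <- matrix(c(10, 20, 30, 40), nrow = 2)  # columns (10, 20) and (30, 40)
A + B     # elementwise sum
A * B     # elementwise product: NOT matrix multiplication
A / 2     # the scalar 2 is recycled, just as with vectors
A %*% B   # matrix multiplication uses a separate operator, %*%
```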
(More on matrix algebra in R in future lectures)
Mon Tues Weds
Aberdeen 5 2 2
Plymouth 12 7 8
Mon Tues Weds
5 2 2
Aberdeen Plymouth
2 8
Warning
There’s something funny about the second example: look closely!
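A sketch that rebuilds the table above from its printed values and labels (what the numbers measure isn't stated, so the matrix is just called x). It checks whether each selection is still a matrix and uses the drop argument and drop().

```r
x <- matrix(c(5, 12, 2, 7, 2, 8), nrow = 2,
            dimnames = list(c('Aberdeen', 'Plymouth'),
                            c('Mon', 'Tues', 'Weds')))
x['Aberdeen', ]                  # a named vector: Mon Tues Weds
x[, 'Weds']                      # also a named vector, printed as a row
is.matrix(x[, 'Weds'])           # FALSE: the column dimension was dropped
x[, 'Weds', drop = FALSE]        # keeps the (2 x 1) matrix structure
drop(x[, 'Weds', drop = FALSE])  # drop() deletes the "extra" dimension
```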
drop() deletes “extra” dimensions.
p_XY that represents the joint pmf of \(X\) and \(Y\), under the assumption that \(X\) and \(Y\) are independent. Name the rows and columns.
?rowSums() and ?colSums(). Then extract the marginal pmfs of \(X\) and \(Y\) from the matrix p_XY.
list() creates a list, just like c() creates an atomic vector:
[[1]]
[1] TRUE FALSE FALSE
[[2]]
[1] 3.141593
[[3]]
[,1] [,2]
[1,] 1 0
[2,] 0 1
str() tells us what’s inside:
[] returns a smaller list, while [[]] extracts a single element.
When creating a list, you can name the elements as with c().
Now we can access objects by name:
$lecturer
[1] "Frank"
[1] "Frank"
$NAME_HERE is a shortcut for [['NAME_HERE']].
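A minimal sketch of [] versus [[]] and of named access; my_list and course are illustrative objects.

```r
my_list <- list(c(TRUE, FALSE, FALSE), pi, diag(nrow = 2))
str(my_list)
my_list[2]     # [] returns a list (of length one) containing pi
my_list[[2]]   # [[]] returns pi itself

course <- list(lecturer = 'Frank', topics = c('vectors', 'lists'))  # illustrative
course$lecturer        # "Frank"
course[['lecturer']]   # the same thing
```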
A data frame has type list and class data.frame.
We can mix-and-match selection rules for lists and matrices:
name age grade favorite_color
1 Xerxes 19 65 blue
2 Xanthippe 23 70 red
3 Xanadu 21 68 orange
[1] 19
[1] 19 23 21
[1] "blue" "red" "orange"
[1] "Xerxes" "Xanthippe" "Xanadu"
name age grade favorite_color
1 Xerxes 19 65 blue
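A sketch that rebuilds the students data frame from the printed output above and repeats the kinds of selections shown. It assumes R 4.0.0 or later, where data.frame() does not convert strings to factors by default.

```r
students <- data.frame(
  name = c('Xerxes', 'Xanthippe', 'Xanadu'),
  age = c(19, 23, 21),
  grade = c(65, 70, 68),
  favorite_color = c('blue', 'red', 'orange')
)
typeof(students)                # "list"
class(students)                 # "data.frame"
students[1, 'age']              # 19: matrix-style [row, col]
students$age                    # 19 23 21: list-style $
students[['favorite_color']]    # "blue" "red" "orange": list-style [[]]
students[, 'name']              # matrix-style: the whole name column
students[students$name == 'Xerxes', ]  # the rows where the condition is TRUE
```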
I used students$name == 'Xerxes' above. Why didn’t I instead use identical(students$name, 'Xerxes')?
Use the following code chunk to construct the employees data frame. Then display it.
employees <- data.frame(
name = c("Alice", "Bob", "Cathy", "David", "Eva",
"Frank", "Grace", "Hank", "Ivy", "Jack"),
age = c(25, 31, 28, 40, 35, 23, 30, 45, 33, 29),
department = c("HR", "IT", "Finance", "IT", "HR",
"Finance", "IT", "HR", "Finance", "IT"),
salary = c(50000, 60000, 55000, 70000, 53000,
51000, 62000, 71000, 57000, 59000)
)
age column of employees.
employees.
Eva.
IT department.
Comments
Everything after # on a line is a comment.
Style: write # and then a space.