Lecture 01 - Solutions
Exercise A - (5 min)
- Why does this code throw an error? Try to fix it.
x <- 3; x > 2 & < 9
- Does (NA & TRUE) equal (NA | TRUE)? Explain.
- Does (Inf - Inf) equal (Inf - 1)? Explain.
- Run the following. What happens? (further reading)
y <- (1 - 0.8); z <- 0.2
y == z; y < z; all.equal(y, z); identical(y, z)
- Why do I use double quotes here?
important_message <- "The harder you try, the more you'll learn."
Solution
# Part 1
# '< 9' has no left-hand side, so write x twice to get two complete comparisons
x <- 3
(x > 2) & (x < 9)
[1] TRUE
# Part 2
NA & TRUE # result unknown since AND is only TRUE if both are TRUE
[1] NA
NA | TRUE # since one condition is true, OR is true
[1] TRUE
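# Part 3
# Inf - Inf has no well-defined value, so R returns NaN ("Not a Number")
Inf - Inf
[1] NaN
# Subtracting any finite number from Inf still gives Inf
Inf - 1
[1] Inf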
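# Part 4
# 0.8 and 0.2 have no exact binary representation, so 1 - 0.8 is stored as a
# double slightly below 0.2. all.equal() compares with a small tolerance,
# while == and identical() demand exact equality.
y <- (1 - 0.8)
z <- 0.2
y == z
[1] FALSE
y < z
[1] TRUE
all.equal(y, z)
[1] TRUE
identical(y, z)
[1] FALSE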
# Part 5
# With single quotes, the apostrophe in "you'll" would end the string early.
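Because == on doubles is fragile, a common pattern is to compare with a tolerance. The sketch below wraps all.equal() in isTRUE(), because all.equal() returns a character description rather than FALSE when the values differ. The helper name near_equal() is hypothetical and is not a base R function.

```r
# Hypothetical helper: TRUE when a and b agree up to all.equal()'s default
# relative tolerance (about 1.5e-8), FALSE otherwise
near_equal <- function(a, b) {
  isTRUE(all.equal(a, b))
}

y <- 1 - 0.8
z <- 0.2

y == z              # FALSE: exact comparison of two slightly different doubles
near_equal(y, z)    # TRUE
near_equal(y, 0.3)  # FALSE
all.equal(y, 0.3)   # a character string describing the difference, not FALSE
```

This makes near_equal() safe to use inside if(), which requires a single TRUE or FALSE.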
Exercise B - (1 minute)
Predict the result that you will obtain if you use typeof() to find the type of each of the following atomic vectors. Then check to see if you were right!
foo <- c('1', '2', '3')
bar <- c('TRUE', 'FALSE')
Solution
# They're both character vectors
typeof(foo)
[1] "character"
typeof(bar)
[1] "character"
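The quotes are what make foo and bar character vectors. When c() combines values of different types, R coerces everything to the most flexible type present, in the order logical < integer < double < character. A few illustrative checks:

```r
# Without quotes, foo and bar would be numeric and logical
typeof(c(1, 2, 3))      # "double"
typeof(c(TRUE, FALSE))  # "logical"

# Mixing types in c() coerces to the most flexible type present
typeof(c(TRUE, 1L))     # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))      # "double"
typeof(c(1, 'a'))       # "character": 1 becomes "1"
```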
Exercise C - (5 minutes)
y <- c('Keble', 'LMH', 'Univ', 'Merton')
- Enter the command y[5]. What result do you get? Why?
- I want to extract the second and fourth elements of y, so I enter y[2,4]. What happens? Can you fix it? How?
- Select 'Keble' and 'Univ' two different ways.
- Below is a vector of sales in $ over several months. Using [], length() and -, compute the monthly growth rates in %.
sales <- c(100, 120, 90, 110, 105, 130, 140, 135, 125, 145, 150, 160)
Solution
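y <- c('Keble', 'LMH', 'Univ', 'Merton')
# Part 1
# We get an NA since there is no 5th element
y[5]
[1] NA
# Part 2
# y[2, 4] asks for row 2, column 4, but a vector has only one dimension.
# Enclose 2, 4 within c() instead:
y[c(2, 4)]
[1] "LMH" "Merton"
# Part 3
# Positive indices keep the listed elements; negative indices drop them
y[c(1, 3)]
[1] "Keble" "Univ"
y[-c(2, 4)]
[1] "Keble" "Univ"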
# Part 4
100 * ((sales[-1] / sales[-length(sales)]) - 1)
[1] 20.000000 -25.000000 22.222222 -4.545455 23.809524 7.692308
[7] -3.571429 -7.407407 16.000000 3.448276 6.666667
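Base R's diff() computes lagged differences, so the same growth rates can be written as each month's change divided by the previous month's value. This sketch checks that the two forms agree:

```r
sales <- c(100, 120, 90, 110, 105, 130, 140, 135, 125, 145, 150, 160)

# Version from the solution: ratio of consecutive months, minus 1
growth_ratio <- 100 * ((sales[-1] / sales[-length(sales)]) - 1)

# Same quantity via diff(): (sales[t] - sales[t - 1]) / sales[t - 1]
growth_diff <- 100 * diff(sales) / sales[-length(sales)]

all.equal(growth_ratio, growth_diff)  # TRUE
length(growth_diff)                   # 11: one fewer than the number of months
```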
Exercise D - (3 min)
The probability mass function of a Binomial\((n, p)\) random variable is given by
\[
\mathbb{P}(X=x) = \binom{n}{x} p^x (1 - p)^{n-x}, \qquad x = 0, 1, \dots, n.
\]
Use vectorized mathematical operations and the choose() function to calculate the pmf of a Binomial\((5, 0.3)\) random variable in one fell swoop.
Solution
n <- 5
p <- 0.3
x <- 0:n
pmf <- choose(n, x) * p^x * (1 - p)^(n - x)
pmf
[1] 0.16807 0.36015 0.30870 0.13230 0.02835 0.00243
# Check that our calculations agree with dbinom()
all.equal(dbinom(x, n, p), pmf)
[1] TRUE
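Wrapping the vectorized formula in a function makes it reusable for any n and p. The name binom_pmf() is hypothetical; dbinom() is the built-in equivalent. Because x runs over the whole support 0:n, the probabilities should sum to 1.

```r
# Hypothetical helper: pmf of a Binomial(n, p) evaluated at every x in 0:n
binom_pmf <- function(n, p) {
  x <- 0:n
  choose(n, x) * p^x * (1 - p)^(n - x)
}

pmf <- binom_pmf(5, 0.3)
sum(pmf)                                           # 1, up to rounding error
all.equal(pmf, dbinom(0:5, size = 5, prob = 0.3))  # TRUE
which.max(pmf) - 1                                 # 1: the most likely value of X
```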
Exercise E - (5 min)
- Replace all of the 999s in this vector with NAs.
x <- c(5, 10, 3, 7, 999, 2, 999, 17, 0)
- In a deck of Italian playing cards, the face cards are fante (Knave), cavallo (Knight), and re (King). In the game Scopa, fante is worth 8, cavallo 9, and re 10. Convert cards to the appropriate numeric values.
cards <- c('re', 'cavallo', 're', 'fante', 'cavallo', 'fante', 're')
- This code throws an error. Coerce y to make it work.
y <- c('1', '2', '3')
sum(y)
- What happens if you run as.logical(-2:2)? Can you figure out the coercion rule for numeric to logical?
Solution
# Part 1
x[x == 999] <- NA
x
[1] 5 10 3 7 NA 2 NA 17 0
# Part 2
# The slickest solution uses a lookup table:
lookup <- c('fante' = 8, 'cavallo' = 9, 're' = 10)
cards <- c('re', 'cavallo', 're', 'fante', 'cavallo', 'fante', 're')
lookup[cards]
re cavallo re fante cavallo fante re
10 9 10 8 9 8 10
# Part 3
y <- c('1', '2', '3')
sum(as.numeric(y))
[1] 6
# Part 4
# Every element becomes TRUE except for 0, which becomes FALSE
as.logical(-2:2)
[1] TRUE TRUE FALSE TRUE TRUE
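Here are three related base R tools. replace() returns a modified copy instead of assigning into x. unname() strips the card names that the lookup table carries along. Coercing a string that does not look like a number gives NA with a warning, so it is worth checking for NAs after as.numeric(). The variable names x_clean and values are illustrative.

```r
x <- c(5, 10, 3, 7, 999, 2, 999, 17, 0)
x_clean <- replace(x, x == 999, NA)  # x itself is unchanged
x_clean

lookup <- c('fante' = 8, 'cavallo' = 9, 're' = 10)
cards <- c('re', 'cavallo', 're', 'fante', 'cavallo', 'fante', 're')
values <- unname(lookup[cards])      # plain numeric vector, no names
values
sum(values)                          # sum of the converted values: 64

# 'tre' is not a number, so as.numeric() returns NA for it (with a warning)
suppressWarnings(as.numeric(c('1', '2', 'tre')))
```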
Exercise F - (10 min)
- Call z_score(w) where w <- c(1, 2, NA). What happens? See ?mean.
- Test out this function. What happens? Now try adding return(z) at the bottom of the function body. Explain your results.
bad_z_score <- function(x) {
  z <- (x - mean(x)) / sd(x)
}
- Write a function to compute skewness using sum(), length(), mean() and sd():
\[
\text{Skewness} \equiv \frac{1}{n} \sum_{i=1}^n\left( \frac{x_i - \bar{x}}{s}\right)^3,
\]
where \(\bar{x}\) is the sample mean and \(s\) the sample standard deviation.
- Use sum(), length() and is.na() to write a function called my_var() that drops NAs and then computes the sample variance.
- Write a function called summary_stats() that returns a named vector with two elements: the sample mean and standard deviation.
Solution
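# Part 1
# If z_score() computes (x - mean(x)) / sd(x), then mean(w) and sd(w) are both
# NA, so every element of the result is NA. mean() and sd() accept
# na.rm = TRUE to drop missing values first.
# Part 2
# The final statement is an assignment, whose value R returns *invisibly*:
# calling bad_z_score(x) prints nothing, although out <- bad_z_score(x) would
# still capture the result. Either drop the assignment or add return(z).
# Part 3
skewness <- function(x) {
  mean(((x - mean(x)) / sd(x))^3)  # mean() is sum() / length()
}
# Part 4
my_var <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}
# Part 5
summary_stats <- function(x) {
  c('mean' = mean(x), 'sd' = sd(x))
}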
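Three quick checks on this exercise follow. The lecture's z_score() is not reproduced here, so z_score_na() is a hypothetical variant that passes na.rm = TRUE. skewness_sum() is a hypothetical version written with sum() and length(), as the exercise literally asks. Finally, my_var() is compared against the built-in var().

```r
# Hypothetical z-score that ignores NAs when computing the mean and standard
# deviation; NA inputs still map to NA outputs
z_score_na <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical skewness with sum() and length(), matching the 1/n factor
skewness_sum <- function(x) {
  n <- length(x)
  sum(((x - mean(x)) / sd(x))^3) / n
}

# my_var() from Part 4
my_var <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}

w <- c(1, 2, NA)
z_score_na(w)                                      # about -0.707, 0.707, NA

v <- c(2, 4, 4, 5, 9, NA)
all.equal(my_var(v), var(v, na.rm = TRUE))         # TRUE

x <- c(1, 2, 3, 10)
all.equal(skewness_sum(x), mean(((x - mean(x)) / sd(x))^3))  # TRUE
```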
Exercise G - (8 min)
- What happens if you run the following code? Why?
x <- c(TRUE, TRUE)
if(x) {
  print('hello world!')
}
- What happens if you run this code? Try to fix it.
if(3 > 5) {
  print('3 is greater than 5')
}
else {
  print('3 is not greater than 5')
}
- Write a function called mycov() that calculates the sample covariance between x and y. Use an early return to print an error message when x and y have different lengths.
- Consult ?trunc. Then use trunc() to write a function called myround() that rounds x to the nearest integer.
Solution
# Part 1
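# This code fails: the condition inside of if() must evaluate to
# a *single* logical value, but x has length 2. (Since R 4.2.0 this is an
# error; earlier versions warned and used only the first element.)
# Part 2
# The problem is the line break before else: at the top level, R treats the
# closing } as the end of a complete if statement, so the else that follows
# is a syntax error. Moving else onto the same line as } fixes it:
if(3 > 5) {
  print('3 is greater than 5')
} else {
  print('3 is not greater than 5')
}
[1] "3 is not greater than 5"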
# Part 3
mycov <- function(x, y) {
  if(!identical(length(x), length(y))) {
    return('Error: x and y must have the same length')
  }
  # Sample covariance: sum of cross-products of deviations, divided by n - 1
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
}
# Part 4
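myround <- function(x) {
  integer_part <- trunc(x)
  # Distance from the integer part; abs() makes negative x work too
  decimal_part <- abs(x - integer_part)
  # Works for a single number; ties such as 2.5 round toward zero
  if(decimal_part <= 0.5) {
    out <- integer_part
  } else {
    # Step away from zero, so that e.g. -2.7 rounds to -3
    out <- integer_part + sign(x)
  }
  out
}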
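Two checks on Parts 3 and 4 follow. mycov_stop() is a hypothetical variant of mycov() that signals a genuine error with stop() instead of returning a string. Its result is compared with base R's cov(). For rounding, note that base R's round() breaks ties differently from myround(): following IEC 60559, it rounds halves to the even digit.

```r
# Hypothetical variant of mycov() that raises a real error on bad input
mycov_stop <- function(x, y) {
  if(length(x) != length(y)) {
    stop('x and y must have the same length')
  }
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
}

x <- c(1, 3, 4, 8)
y <- c(2, 3, 7, 9)
all.equal(mycov_stop(x, y), cov(x, y))   # TRUE

# try() captures the error so the script keeps running
res <- try(mycov_stop(1:3, 1:4), silent = TRUE)
inherits(res, 'try-error')               # TRUE

# Base R rounds halves to even
round(c(0.5, 1.5, 2.5, -1.5))            # 0 2 2 -2
```

Returning a string looks like an error at the console, but a caller cannot distinguish it from a valid result; stop() can be caught with try() or tryCatch().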
Exercise H - (8 min)
- The Fibonacci Sequence is defined by \(F_1 = 1\), \(F_2 = 1\) and \(F_n = F_{n-1} + F_{n-2}\) for \(n > 2\). Write a function that uses a for() loop to compute the first n Fibonacci numbers.
- Come up with a way to generate the same output as f() without using a loop or if() ... else.
f <- \(x) {
  for(j in 1:length(x)) {
    if(x[j] > 0) {
      x[j] <- x[j]^3 + x[j]
    } else {
      x[j] <- x[j]^2 - x[j]
    }
  }
  x
}
Solution
# Part 1
fib <- function(n) {
  out <- vector(length = n)
  out[1] <- out[2] <- 1
  for(i in 3:n) {
    out[i] <- out[i - 1] + out[i - 2]
  }
  out
}
fib(12)
[1] 1 1 2 3 5 8 13 21 34 55 89 144
# Part 2
g <- function(x) {
  # (x > 0) becomes 1 where x is positive and 0 elsewhere, so each element
  # picks up exactly one of the two formulas
  (x > 0) * (x^3 + x) + (x <= 0) * (x^2 - x)
}
f(-2:2)
[1] 6 2 0 2 10
g(-2:2)
[1] 6 2 0 2 10
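The loop in Part 1 assumes n >= 3. For n = 1 or 2, 3:n counts downward (3:2 is c(3, 2)), and the loop fails when it reaches out[0]. Below is a sketch of a more defensive version that builds its index set with seq_len(). It also shows Part 2 written with the vectorized ifelse(). Both names, fib_safe() and g2(), are hypothetical.

```r
# Hypothetical defensive version: seq_len(n)[-(1:2)] is empty when n < 3,
# so the loop body never runs for n = 1 or n = 2
fib_safe <- function(n) {
  out <- numeric(n)
  out[seq_len(min(n, 2))] <- 1
  for(i in seq_len(n)[-(1:2)]) {
    out[i] <- out[i - 1] + out[i - 2]
  }
  out
}

fib_safe(1)    # 1
fib_safe(2)    # 1 1
fib_safe(12)   # 1 1 2 3 5 8 13 21 34 55 89 144

# Hypothetical alternative to g(): ifelse() picks element-wise between two vectors
g2 <- function(x) {
  ifelse(x > 0, x^3 + x, x^2 - x)
}
g2(-2:2)       # 6 2 0 2 10
```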
Exercise I - (8 min)
- Create a \(5\times 5\) matrix called A, each of whose rows contains the elements 1:5. Hint: see ?rep.
- Display all elements of A except row 3 and column 2.
- Form a matrix B by stacking the \((4\times 4)\) identity matrix on top of itself.
- Display the seventh row of B.
- Write a function that uses a for() loop to construct the \((n\times n)\) exchange matrix \(J_n\), which has ones on the anti-diagonal and zeros elsewhere.
Solution
# Part 1
A <- matrix(rep(1:5, times = 5), nrow = 5, ncol = 5, byrow = TRUE)
# Part 2
A[-3, -2]
[,1] [,2] [,3] [,4]
[1,] 1 3 4 5
[2,] 1 3 4 5
[3,] 1 3 4 5
[4,] 1 3 4 5
# Part 3
B <- rbind(diag(nrow = 4), diag(nrow = 4))
# Part 4
B[7, ]
[1] 0 0 1 0
# Part 5
get_exchange <- function(n) {
  out <- matrix(0, n, n)
  for(i in 1:n) {
    out[i, n + 1 - i] <- 1
  }
  out
}
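Here is a quick way to see what \(J_n\) does. Multiplying a vector by it reverses the vector, and multiplying a matrix by it on the left reverses the rows. The sketch repeats the loop version of get_exchange() so that it runs on its own. It also shows that matrix() recycles 1:5, which builds A without rep().

```r
get_exchange <- function(n) {
  out <- matrix(0, n, n)
  for(i in 1:n) {
    out[i, n + 1 - i] <- 1   # row i has its 1 in column n + 1 - i
  }
  out
}

J4 <- get_exchange(4)
drop(J4 %*% c(10, 20, 30, 40))   # 40 30 20 10

# matrix() recycles 1:5 to fill all 25 cells; byrow = TRUE makes each row 1:5
A <- matrix(1:5, nrow = 5, ncol = 5, byrow = TRUE)
A[2, ]                           # 1 2 3 4 5
```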
Exercise J - (8 min)
- Write a function that constructs the \((n\times n)\) exchange matrix \(J_n\) without using a loop.
- Compute the element-wise product of \(J_3\) with itself, and the square of \(J_3\), i.e. the ordinary matrix product \(J_3 J_3\).
- Let \(X\) be a Bernoulli\((0.2)\) and \(Y\) be a Binomial\((2, 0.5)\) RV. Construct a matrix p_XY that represents the joint pmf of \(X\) and \(Y\), under the assumption that \(X\) and \(Y\) are independent. Name the rows and columns.
- Consult ?rowSums and ?colSums. Then extract the marginal pmfs of \(X\) and \(Y\) from the matrix p_XY.
Solution
# Part 1
get_exchange <- function(n) {
  out <- matrix(0, n, n)
  # Each row of this two-column matrix is a (row, column) index pair
  anti_diagonal <- cbind(1:n, n:1)
  out[anti_diagonal] <- 1
  out
}
# An even slicker solution to part 1, suggested by a student:
get_exchange2 <- function(n) {
  diag(1, n)[n:1, ]  # reverse the row order of the identity matrix
}
# Part 2
J3 <- get_exchange(3)
J3 * J3
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 1 0
[3,] 1 0 0
J3 %*% J3
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
# Part 3
# Bernoulli(0.2): P(X = 0) = 0.8 and P(X = 1) = 0.2
p_XY <- c(0.8, 0.2) %o% c(0.25, 0.5, 0.25)
rownames(p_XY) <- c('x=0', 'x=1')
colnames(p_XY) <- c('y=0', 'y=1', 'y=2')
# Part 4
rowSums(p_XY)
x=0 x=1
0.8 0.2
colSums(p_XY)
y=0 y=1 y=2
0.25 0.50 0.25
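Under independence, the joint pmf is the outer product of the marginals. That means p_XY can also be built from dbinom(), because a Bernoulli(p) is a Binomial(1, p). This sketch sets the row and column names through dimnames, then checks that the table sums to 1 and that the marginals come back out. The names p_X and p_Y are illustrative.

```r
p_X <- dbinom(0:1, size = 1, prob = 0.2)  # P(X = 0) = 0.8, P(X = 1) = 0.2
p_Y <- dbinom(0:2, size = 2, prob = 0.5)  # 0.25 0.50 0.25

p_XY <- outer(p_X, p_Y)
dimnames(p_XY) <- list(c('x=0', 'x=1'), c('y=0', 'y=1', 'y=2'))
p_XY

sum(p_XY)                              # 1
all.equal(unname(rowSums(p_XY)), p_X)  # TRUE
all.equal(unname(colSums(p_XY)), p_Y)  # TRUE
```

unname() is needed because all.equal() also compares names and would otherwise report a mismatch.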
Exercise K - (7 min)
- I used students$name == 'Xerxes' above. Why didn't I instead use identical(students$name, 'Xerxes')?
- Use the following code chunk to construct the employees data frame. Then display it.
employees <- data.frame(
  name = c("Alice", "Bob", "Cathy", "David", "Eva",
           "Frank", "Grace", "Hank", "Ivy", "Jack"),
  age = c(25, 31, 28, 40, 35, 23, 30, 45, 33, 29),
  department = c("HR", "IT", "Finance", "IT", "HR",
                 "Finance", "IT", "HR", "Finance", "IT"),
  salary = c(50000, 60000, 55000, 70000, 53000,
             51000, 62000, 71000, 57000, 59000)
)
- Display the age column of employees.
- Display the sixth row of employees.
- Display the employee record for Eva.
- Display employee records for everyone in the IT department.
- Repeat the preceding, restricted to people with a salary of at least 60,000.
Solution
# Part 1
students <- data.frame('name' = c('Xerxes', 'Xanthippe', 'Xanadu'),
                       'age' = c(19, 23, 21),
                       'grade' = c(65, 70, 68),
                       'favorite_color' = c('blue', 'red', 'orange'))
# == compares element by element and returns one TRUE/FALSE per row.
# identical() compares the whole column with 'Xerxes' and returns a single
# FALSE, which is recycled across all rows, so no rows are selected:
students[identical(students$name, 'Xerxes'), ]
[1] name age grade favorite_color
<0 rows> (or 0-length row.names)
# Part 2
employees
name age department salary
1 Alice 25 HR 50000
2 Bob 31 IT 60000
3 Cathy 28 Finance 55000
4 David 40 IT 70000
5 Eva 35 HR 53000
6 Frank 23 Finance 51000
7 Grace 30 IT 62000
8 Hank 45 HR 71000
9 Ivy 33 Finance 57000
10 Jack 29 IT 59000
# Part 3
employees$age
[1] 25 31 28 40 35 23 30 45 33 29
# Part 4
employees[6, ]
name age department salary
6 Frank 23 Finance 51000
# Part 5
employees[employees$name == 'Eva', ]
name age department salary
5 Eva 35 HR 53000
# Part 6
is_IT <- employees$department == 'IT'
employees[is_IT, ]
name age department salary
2 Bob 31 IT 60000
4 David 40 IT 70000
7 Grace 30 IT 62000
10 Jack 29 IT 59000
# Part 7
high_salary <- employees$salary >= 60000
employees[is_IT & high_salary, ]
name age department salary
2 Bob 31 IT 60000
4 David 40 IT 70000
7 Grace 30 IT 62000
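Base R's subset() offers a shorthand for the same row filters, because it evaluates department and salary as columns of the data frame. The sketch uses a four-row excerpt of employees so that it runs on its own. It also contrasts the length of the == result with the single value identical() returns. The ?subset help page describes subset() as a convenience function intended for interactive use, so inside your own functions the [ form above is the safer choice.

```r
# Four-row excerpt of the employees data frame above
employees <- data.frame(
  name = c("Alice", "Bob", "Cathy", "David"),
  age = c(25, 31, 28, 40),
  department = c("HR", "IT", "Finance", "IT"),
  salary = c(50000, 60000, 55000, 70000)
)

employees$name == 'Bob'           # FALSE TRUE FALSE FALSE: one value per row
identical(employees$name, 'Bob')  # FALSE: one value for the whole column

# subset() looks up department and salary inside employees
subset(employees, department == 'IT' & salary >= 60000)
```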