Monte Carlo Simulation Basics in R

Francis J. DiTraglia

University of Oxford

Monte Carlo Simulation: What & Why?

  1. Use a computer to repeatedly carry out a random experiment.
  2. Keep track of the outcomes; compute relative frequencies.
  3. Make plots / tables of results.

Casino de Monte-Carlo
  • In a simulation, we know the truth: check how estimators, confidence intervals, tests perform in finite samples.

  • Check probability calculations by simulating; approximate probabilities that can’t be computed analytically.

Some References on Simulation

To learn more about what’s going on “under the hood”

But what is randomness, Socrates?

Pseudorandom Numbers

  • R cannot generate “truly” random numbers
  • Computers are deterministic: same input, same output
  • “Random” draws on a computer are really pseudorandom
    • Intuitively: “as if random”
    • Generated by a deterministic algorithm
    • Statistically random: indistinguishable from real random numbers using statistical tests
  • Pseudorandom is good enough for our purposes
  • I’ll write RNG for “pseudo-random number generator” below

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.

John von Neumann (1903-1957)

Rev. Marin Mersenne (1588-1648)

Twister (1966-present)

A Rough Idea of How it Works

  1. Find a way to make independent Uniform(0,1) draws.
  2. Find a way to transform uniforms into draws from the desired probability distribution.

Lehmer Random Number Generator

  1. Choose integers \(m, a\) where \(m > 0\) and \(0 < a < m\).
  2. Choose integer seed \(x_0\) where \(0 < x_0 < m\). (starting value)
  3. Construct \(\{x_n\}\) according to: \(\boxed{x_{n+1} = a x_n \mod m}\).
    • i.e. \(x_{n+1}=\) remainder when \(ax_n\) is divided by \(m\).
  4. \(\{x_n/m\} \approx\) sequence of iid Uniform(0,1) draws.

Note

Completely deterministic: same seed and \(m, a\) parameters \(\implies\) same draws!

Lehmer RNG: Example

You will expand / improve this example on the problem set.

# Parameters - not great choices: this is only an example
m <- 2^16 + 1 # positive integer, here: 65537
a <- 75 # positive integer between 0 and m

# Seed
x0 <- 1983 # between 0 and m

# x %% y is the remainder when x is divided by y
x1 <- (a * x0) %% m
x1 
[1] 17651
x2 <- (a * x1) %% m
x3 <- (a * x2) %% m
x4 <- (a * x3) %% m
x5 <- (a * x4) %% m

c(x1, x2, x3, x4, x5) / m
[1] 0.26932878 0.19965821 0.97436563 0.07742191 0.80664358

sample() - Draw from a Vector in R

sample(x, size, replace = FALSE, prob = NULL)
  • Generate a random sample of length size from the vector x
  • If x is a single positive integer, samples are drawn from 1:x
  • By default, draws are made without replacement
  • To draw with replacement set replace = TRUE
  • By default, elements of x are sampled with equal probability
  • To sample with different probabilities, supply them as prob

sample() - Examples

marbles <- c('red', 'blue', 'green') # a bowl with three marbles
sample(marbles, size = 2) # draw two marbles without replacement
[1] "blue" "red" 
sample(10, size = 5) # five draws without replacement from 1:10
[1] 7 2 5 6 4
# Sample with replacement
sample(c('Core', 'ERM', 'Rules'), 10, replace = TRUE) 
 [1] "ERM"   "Rules" "Rules" "Rules" "ERM"   "Rules" "Rules" "ERM"   "Core" 
[10] "Rules"
# Unequal probabilities: P(FALSE) = 0.7
sample(c(TRUE, FALSE), 20, replace = TRUE, prob = c(0.3, 0.7)) 
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE
[13] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

🌱 Setting the Seed in R

set.seed(YOUR_SEED_GOES_HERE)
  • Remember: seed of an RNG is its initial condition and
  • RNG is deterministic, so same seed gives same draws.
  • Supply a single numeric value (interpreted as an integer)
set.seed(1983)
sample(100, 2)
[1] 67 18
sample(100, 2)
[1] 77 94
set.seed(1983)
sample(100, 2)
[1] 67 18

🌱 Why and when to set the seed?

  • Set the seed once at the beginning of your simulation study.
  • Ensures replicability:
    • You will get the same results if you re-run your code
    • I will get the same results as you if I test your code
  • Pick a number that you like or obtain a real random seed from random.org!

💪 Exercise A - (4 min)

  1. What happens if I don’t supply size to sample()?
  2. Does it matter if I set replace = TRUE or FALSE here?
sample(1:6, size = 1)
  3. Write a function called rbern() that uses sample() to make a sequence of \(n\) iid Bernoulli\((p)\) draws. Its arguments are n, the number of draws, and p, the probability of success. It should return a vector of n zeros and ones. Check your results by making a large number of draws and verifying that the fraction of ones is approximately \(p\).

Mind your Ps and Qs (and Ds and Rs).

  • Base R has functions for many common RVs
  • Each RV has an abbreviation: e.g. norm for normal
  • Each RV has four related functions:
    • d = density if continuous, mass function if discrete
    • p = CDF
    • q = quantile function (inverse CDF)
    • r makes random draws
  • E.g. dnorm(), pnorm(), qnorm(), rnorm(); a quick check follows below
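As a quick illustration (an addition to these slides, using the standard normal defaults), here is how the four functions fit together; the values in the comments are approximate:

# d, p, q, r for the standard normal (mean = 0, sd = 1 are the defaults)
dnorm(0)       # density at zero: 1 / sqrt(2 * pi), about 0.399
pnorm(1.96)    # CDF: P(Z <= 1.96), about 0.975
qnorm(0.975)   # quantile function, the inverse of pnorm: about 1.96
rnorm(3)       # three random draws from N(0, 1)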

Built-in Random Variables in R

R commands       Distribution
d/p/q/rbeta      Beta
d/p/q/rbinom     Binomial
d/p/q/rcauchy    Cauchy
d/p/q/rchisq     Chi-Squared
d/p/q/rexp       Exponential
d/p/q/rf         F
d/p/q/rgamma     Gamma
d/p/q/rgeom      Geometric
d/p/q/rhyper     Hypergeometric
d/p/q/rlogis     Logistic
d/p/q/rlnorm     Log Normal
d/p/q/rnbinom    Negative Binomial
d/p/q/rnorm      Normal
d/p/q/rpois      Poisson
d/p/q/rt         Student’s t
d/p/q/runif      Uniform
d/p/q/rweibull   Weibull

Example: Standard Normal Density

Warning

R parameterizes the normal distribution in terms of its mean and standard deviation.
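To see what this means in practice, here is a small check (an addition, not part of the original slides): a normal with variance 4 has standard deviation 2, so sd = 2 is what you pass to R.

# sd, not variance: a N(0, variance = 4) distribution has sd = 2 in R's notation
draws <- rnorm(1e5, mean = 0, sd = 2)
c(variance = var(draws), std_dev = sd(draws))  # roughly 4 and 2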

library(tidyverse)
mu <- 0
sigma <- 1

normal_density <- tibble(x = seq(from = -4, to = 4, by = 0.01)) |> 
  mutate(density = dnorm(x, mu, sigma))

normal_density |> 
  ggplot(aes(x, density)) +
  geom_line()

Example: Standard Normal Density

Example: Standard Normal CDF

Warning

R parameterizes the normal distribution in terms of its mean and standard deviation.

mu <- 0
sigma <- 1

normal_cdf <- tibble(x = seq(from = -4, to = 4, by = 0.01)) |> 
  mutate(cdf = pnorm(x, mu, sigma))

normal_cdf |> 
  ggplot(aes(x, cdf)) +
  geom_line()

Example: Standard Normal CDF

Example: Standard Normal Draws

set.seed(1234)
normal_sims <- rnorm(500, mu, sigma)
tibble(normal_sims) |> 
  ggplot(aes(x = normal_sims)) +
  geom_histogram(bins = 20)

Example: Binomial(10, 0.2) pmf

n <- 10
p <- 0.2

binom_pmf <- tibble(x = 0:n) |>  
  mutate(p = dbinom(x, n, p))

binom_pmf |> 
  ggplot(aes(x = x, y = p)) +
  geom_point() + 
  geom_segment(aes(xend = x, yend = 0)) + 
  labs(title = 'Binomial(10, 0.2) pmf', y = 'p(x)')

Example: Binomial(10, 0.2) pmf

Example: Binomial(10, 0.2) CDF

n <- 10
p <- 0.2

binom_cdf <- tibble(x = seq(0, n + 1, 0.01)) |>  
  mutate(cdf = pbinom(x, n, p))

binom_cdf |> 
  ggplot(aes(x = x, y = cdf)) +
  geom_line() +
  labs(title = 'Binomial(10, 0.2) CDF', y = 'F(x)')

Example: Binomial(10, 0.2) CDF

Example: Binomial(10, 0.2) Draws

set.seed(1693)
binom_sims <- rbinom(500, n, p)

tibble(binom_sims) |> 
  ggplot(aes(x = binom_sims)) +
  geom_bar()

💪 Exercise B - (10 min)

  1. Consult ?dexp() and then carry out the following:
    1. Plot the density and CDF of an Exponential(1) RV.
    2. Plot a histogram of 500 draws from an Exponential(1) distribution.
  2. Consult ?dpois() and then carry out the following:
    1. Plot the mass function and CDF of a Poisson(5) RV.
    2. Plot a barchart of 500 draws from a Poisson(5) distribution.
  3. Suppose that we observe a sample of \(n=16\) iid normal RVs with unknown mean \(\mu\) and known variance \(\sigma^2 = 1\).
    1. A confidence interval for \(\mu\) takes the form \(\bar{X}\pm \text{ME}\) where \(\text{ME}\) is the margin of error. Use qnorm() to compute \(\text{ME}\) for an 89% interval.
    2. Simulate 16 draws from a \(N(0,\sigma=4)\) distribution and construct the interval from part (a). Does it cover the true value of \(\mu\)?

The Inverse Transformation Method

  • Already know how to make uniform draws.
  • Simulate arbitrary RV with CDF \(F\) and quantile function \(Q\).
  • Inverse Transformation Method
    1. Generate \(U \sim \text{Uniform}(0,1)\).
    2. Set \(Y \equiv Q(U) \implies \mathbb{P}(Y\leq y) = F(y)\)
  • \(F\) continuous & strictly increasing \(\implies Q = F^{-1}\).
  • Otherwise, \(Q\) is the generalized inverse of \(F\).
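As a sketch of the recipe before the worked example (an addition; it leans on R’s built-in qnorm() quantile function): uniform draws pushed through \(Q\) should come out with the target distribution.

# Inverse transformation sketch: Uniform(0,1) draws through the N(0,1) quantile function
set.seed(1)
u <- runif(1e5)      # step 1: iid Uniform(0, 1) draws
y <- qnorm(u)        # step 2: Y = Q(U) should be (approximately) N(0, 1)
c(mean(y), sd(y))    # close to 0 and 1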

Example of Inverse Transformation

\(X \sim \text{Exponential}(1)\)

The CDF of this random variable is \(F(x) = 1 - e^{-x}\). Re-arranging and solving for \(x\) \[ \begin{align*} p &= 1 - e^{-x}\\ x &= -\log(1 - p) \end{align*} \] so \(F^{-1}(p) = -\log(1 - p)\) is the quantile function.

u <- runif(1e5)
exp1_sims <- -log(1 - u)
tibble(x = exp1_sims) |> 
  ggplot(aes(x)) +
  geom_histogram()

Example of Inverse Transformation

Proof: Inverse Transformation

\(U\sim \text{Uniform}(0, 1) \implies Y \equiv F^{-1}(U)\) has CDF \(F\)

\[ \begin{align*} \mathbb{P}(Y \leq y) &= \mathbb{P}\left[ F^{-1}(U) \leq y \right]\\ &= \mathbb{P}\left[ F\left( F^{-1}\left(U \right) \right) \leq F(y) \right] \\ &= \mathbb{P}\left[ U \leq F(y) \right]\\ &= F(y) \end{align*} \]

  1. \(F\) continuous & strictly increasing \(\Rightarrow\) has inverse \(F^{-1}\).
  2. \(F\) is strictly increasing \(\Rightarrow x_1 \leq x_2 \iff F(x_1) \leq F(x_2)\).
  3. \(F\left( F^{-1}\left(x \right)\right) = x\)
  4. \(U \sim \text{Uniform}(0,1) \implies \mathbb{P}(U \leq x) = x\)
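As a quick numerical check of this result (an addition, reusing the exp1_sims draws from the earlier Exponential(1) example): the simulated probability \(\mathbb{P}(Y \leq 1)\) should be close to \(F(1) = 1 - e^{-1}\).

# Empirical vs theoretical CDF at y = 1 for the Exponential(1) simulation
mean(exp1_sims <= 1)    # fraction of simulated draws at or below 1
pexp(1, rate = 1)       # F(1) = 1 - exp(-1), about 0.632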

Quantile-Quantile (QQ) Plots

  • Compare theoretical vs sample quantiles: do these draws really come from that distribution?
    • Set \(x = Q_\text{Theoretical}(p)\) and \(y = Q_\text{Sample}(p)\)
    • Plot \(x\) against \(y\) for a range of probabilities \(p\).
  • Interpretation
    • \(y = x \Rightarrow\) theoretical & sample quantiles agree.
    • \(y = mx + b \Rightarrow\) theoretical & sample quantiles agree after centering and scaling.
  • More details on QQ plots in this blog post.

Example - t Distribution (df = 3)

This distribution has heavy tails relative to a Normal

t3_sims <- rt(500, df = 3)
tibble(x = t3_sims) |> 
  ggplot(aes(sample = x)) + # source of data for *sample* quantiles 
  geom_qq()  # make the QQ plot: theoretical defaults to Normal(0,1)

Example - t Distribution (df = 3)

Sample and theoretical quantiles agree when we use the right distribution!

tibble(x = t3_sims) |> 
  ggplot(aes(sample = x)) +
  geom_qq(distribution = qt, dparams = list(df = 3))

Plot a line through 1st & 3rd Quartiles

tibble(x = t3_sims) |> 
  ggplot(aes(sample = x)) +
  geom_qq(distribution = qt, dparams = list(df = 3)) + 
  geom_qq_line(col = 'red', distribution = qt, dparams = list(df = 3))

Example: Exponential(1) Simulation

tibble(x = exp1_sims) |> 
  ggplot(aes(sample = x)) +
  geom_qq(distribution = qexp, dparams = list(rate = 1)) + 
  geom_qq_line(col = 'red', distribution = qexp, 
               dparams = list(rate = 1))

💪 Exercise C - (10 min)

Let \(X\) be a random variable with the following density function \[ f(x) = \left\{ \begin{array}{ll} x/2, & x \in [0, 2]\\ 0,& \text{otherwise.} \end{array} \right. \] Write code that uses the inverse transformation method to simulate iid draws from this distribution. Test to make sure that your code works correctly.

A Very Simple Monte Carlo Simulation

  • Compute probabilities for the sum of two fair, six-sided dice.
  • Here we know the answer, so we can easily check our code.
  • General Recipe:
    1. Write a function to carry out the experiment once.
    2. Use iteration to call that function a large number of times.
    3. Store and summarize the results; set seed for replicability.

Note

The general recipe can always be applied, but sometimes there’s a faster approach.

1. Write a function to carry out the experiment once. 🎲 🎲

dice_sum <- \() {
  # Roll a pair of fair, six-sided dice and return their sum
  die1 <- sample(1:6, 1)
  die2 <- sample(1:6, 1)
  die1 + die2
}

dice_sum()
[1] 9
dice_sum()
[1] 4
dice_sum()
[1] 5

2. Use iteration to call that function a large number of times.

  • We could use a for() loop, but there’s a better way.
  • Functional: a function that takes another function as an input and returns a vector.
  • Functionals can be used to eliminate loops, making code more transparent; see the sketch below.
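Here is a minimal side-by-side sketch (an addition, using the dice_sum() function defined above) of the loop version and the functional version of the same iteration:

# for() loop version: pre-allocate, then fill element by element
sims_loop <- numeric(100)
for (i in 1:100) {
  sims_loop[i] <- dice_sum()
}

# functional version: map_dbl() calls dice_sum() once per element of 1:100
# and returns a numeric vector (purrr is loaded as part of the tidyverse)
sims_map <- map_dbl(1:100, \(i) dice_sum())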

Functional Programming with purrr

Finishing the dice simulation

  2. Use iteration to call that function a large number of times.
library(tidyverse)
nreps <- 1e4
sims <- map_dbl(1:nreps, \(i) dice_sum())
  3. Summarize the results.
## What is the probability of rolling a 7?
mean(sims == 7)
[1] 0.1675
tibble(x = sims) |> 
  ggplot(aes(x)) + 
  geom_bar()

💪 Exercise D - (15 min)

This problem was initially posed by the famous 17th century gambler Antoine Gombaud, more commonly known as the Chevalier de Méré. Fermat and Pascal discussed its solution in their legendary correspondence that began the study of probability as we know it. Here’s the Chevalier’s question:

Which is more likely: (A) getting at least one six when rolling a single die four times or (B) getting at least one pair of sixes when rolling a pair of dice twenty-four times?

Answer the Chevalier’s question using Monte Carlo Simulation. Assume that all dice are fair and six-sided.

💪 Exercise E - (10 min)

Find an alternative, faster way to carry out the “sum of two fair dice” simulation from above. Hint: think about how you can avoid writing a function to carry out the simulation once and instead do the whole thing in a single step. Time your alternative approach to see how much faster it is.