In class we defined the Binomial\((n,p)\) random variable as the sum of \(n\) independent Bernoulli\((p)\) random variables. In other words, the Binomial\((n,p)\) equals the total number of successes (ones) in \(n\) independent Bernoulli trials, each with probability of success (one) equal to \(p\). The point of this document is to convince you that this definition actually makes sense and really does lead to the formulas from class.

R doesn’t have a built-in function to simulate Bernoulli Random Variables since it treats them as a Binomial\((n=1,p)\) random variable. Let’s make our own:

#This function simulates n independent, identically distributed Bernoulli Random Variables
rbern <- function(n, p){
  
  sims <- sample(0:1, size = n, replace = TRUE, prob = c(1-p, p))
  return(sims)
  
}

The argument prob = c(1-p, p) is new. It tells sample to draw 0 with probability 1-p and 1 with probability p. This is exactly what we want. Let’s test it out:

rbern(30, 0.1)
##  [1] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
rbern(30, 0.5)
##  [1] 0 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 0 0 0
rbern(30, 0.9)
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1

The function rbern returns all the individual Bernoulli draws. What we need to make a connection with the Binomial RV is a function that returns the sum of these draws instead. This is easy: we just call rbern and take its sum:

#This function simulates n independent, identically distributed Bernoulli Random Variables and returns their sum
rbern.sum <- function(n, p){
  
  sims <- rbern(n, p)
  return(sum(sims))
  
}

Let’s test it out:

rbern.sum(50, 0.1)
## [1] 8
rbern.sum(50, 0.5)
## [1] 32
rbern.sum(50, 0.9)
## [1] 45

The Binomial RV

It turns out that our function rbern.sum makes a single random draw from a Binomial\((n,p)\) random variable. How do I know this? We constructed it exactly following the definition of the Binomial from class: draw the some Bernoullis and sum them up. But don’t take my word for it. Let’s verify this with a simulation.

100,000 Binomial\((n = 10, p = 0.5)\) Draws

To do this, we’ll use rbern.sum and replicate

binom.sims <- replicate(10^5, rbern.sum(10, 0.5))
head(binom.sims)
## [1] 5 4 7 3 4 2

Let’s make a plot of the relative frequencies from this simulation experiment:

p.sim <- table(binom.sims)/10^5
p.sim
## binom.sims
##       0       1       2       3       4       5       6       7       8 
## 0.00088 0.00925 0.04361 0.11954 0.20616 0.24529 0.20741 0.11301 0.04463 
##       9      10 
## 0.00943 0.00079
plot(p.sim, type = 'h')

What we’ve done here is approximate a pmf using Monte Carlo simulation. Now let’s compare it to the actual pmf of a Binomial RV.

The Exact pmf for a Bernoulli\((n =10, p=0.5)\)

We know how to calculate this using what we covered in R Tutorial #5

support <- 0:10
p.true <- dbinom(support, size = 10, prob = 0.5)
rbind(support, p.true)
##                 [,1]        [,2]       [,3]      [,4]      [,5]      [,6]
## support 0.0000000000 1.000000000 2.00000000 3.0000000 4.0000000 5.0000000
## p.true  0.0009765625 0.009765625 0.04394531 0.1171875 0.2050781 0.2460938
##              [,7]      [,8]       [,9]       [,10]        [,11]
## support 6.0000000 7.0000000 8.00000000 9.000000000 1.000000e+01
## p.true  0.2050781 0.1171875 0.04394531 0.009765625 9.765625e-04
plot(support, p.true, type = 'h')

The plot of the actual pmf and the associated probabilities match those from the simulation almost perfectly:

p.sim - p.true
## binom.sims
##             0             1             2             3             4 
## -0.0000965625 -0.0005156250 -0.0003353125  0.0023525000  0.0010818750 
##             5             6             7             8             9 
## -0.0008037500  0.0023318750 -0.0041775000  0.0006846875 -0.0003356250 
##            10 
## -0.0001865625

As you can see, the differences are tiny. They’d be even smaller if we used more replications.

I hope this example has given you a bit more intuition about the Binomial RV. I’m not going to assign any exercises here, but it would be a good idea to try some simulation experiments of your own with different values for \(n\) and \(p\).