In class we defined the Binomial$$(n,p)$$ random variable as the sum of $$n$$ independent Bernoulli$$(p)$$ random variables. In other words, the Binomial$$(n,p)$$ equals the total number of successes (ones) in $$n$$ independent Bernoulli trials, each with probability of success (one) equal to $$p$$. The point of this document is to convince you that this definition actually makes sense and really does lead to the formulas from class.

R doesn’t have a built-in function to simulate Bernoulli Random Variables since it treats them as a Binomial$$(n=1,p)$$ random variable. Let’s make our own:

#This function simulates n independent, identically distributed Bernoulli Random Variables
rbern <- function(n, p){

sims <- sample(0:1, size = n, replace = TRUE, prob = c(1-p, p))
return(sims)

}

The argument prob = c(1-p, p) is new. It tells sample to draw 0 with probability 1-p and 1 with probability p. This is exactly what we want. Let’s test it out:

rbern(30, 0.1)
##  [1] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
rbern(30, 0.5)
##  [1] 0 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 0 0 0
rbern(30, 0.9)
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1

The function rbern returns all the individual Bernoulli draws. What we need to make a connection with the Binomial RV is a function that returns the sum of these draws instead. This is easy: we just call rbern and take its sum:

#This function simulates n independent, identically distributed Bernoulli Random Variables and returns their sum
rbern.sum <- function(n, p){

sims <- rbern(n, p)
return(sum(sims))

}

Let’s test it out:

rbern.sum(50, 0.1)
## [1] 8
rbern.sum(50, 0.5)
## [1] 32
rbern.sum(50, 0.9)
## [1] 45

## The Binomial RV

It turns out that our function rbern.sum makes a single random draw from a Binomial$$(n,p)$$ random variable. How do I know this? We constructed it exactly following the definition of the Binomial from class: draw the some Bernoullis and sum them up. But don’t take my word for it. Let’s verify this with a simulation.

### 100,000 Binomial$$(n = 10, p = 0.5)$$ Draws

To do this, we’ll use rbern.sum and replicate

binom.sims <- replicate(10^5, rbern.sum(10, 0.5))
head(binom.sims)
## [1] 5 4 7 3 4 2

Let’s make a plot of the relative frequencies from this simulation experiment:

p.sim <- table(binom.sims)/10^5
p.sim
## binom.sims
##       0       1       2       3       4       5       6       7       8
## 0.00088 0.00925 0.04361 0.11954 0.20616 0.24529 0.20741 0.11301 0.04463
##       9      10
## 0.00943 0.00079
plot(p.sim, type = 'h')

What we’ve done here is approximate a pmf using Monte Carlo simulation. Now let’s compare it to the actual pmf of a Binomial RV.

### The Exact pmf for a Bernoulli$$(n =10, p=0.5)$$

We know how to calculate this using what we covered in R Tutorial #5

support <- 0:10
p.true <- dbinom(support, size = 10, prob = 0.5)
rbind(support, p.true)
##                 [,1]        [,2]       [,3]      [,4]      [,5]      [,6]
## support 0.0000000000 1.000000000 2.00000000 3.0000000 4.0000000 5.0000000
## p.true  0.0009765625 0.009765625 0.04394531 0.1171875 0.2050781 0.2460938
##              [,7]      [,8]       [,9]       [,10]        [,11]
## support 6.0000000 7.0000000 8.00000000 9.000000000 1.000000e+01
## p.true  0.2050781 0.1171875 0.04394531 0.009765625 9.765625e-04
plot(support, p.true, type = 'h')

The plot of the actual pmf and the associated probabilities match those from the simulation almost perfectly:

p.sim - p.true
## binom.sims
##             0             1             2             3             4
## -0.0000965625 -0.0005156250 -0.0003353125  0.0023525000  0.0010818750
##             5             6             7             8             9
## -0.0008037500  0.0023318750 -0.0041775000  0.0006846875 -0.0003356250
##            10
## -0.0001865625

As you can see, the differences are tiny. They’d be even smaller if we used more replications.

I hope this example has given you a bit more intuition about the Binomial RV. I’m not going to assign any exercises here, but it would be a good idea to try some simulation experiments of your own with different values for $$n$$ and $$p$$.