In class we defined the Binomial\((n,p)\) random variable as the sum of \(n\) independent Bernoulli\((p)\) random variables. In other words, the Binomial\((n,p)\) equals the total number of successes (ones) in \(n\) independent Bernoulli trials, each with probability of success (one) equal to \(p\). The point of this document is to convince you that this definition actually makes sense and really does lead to the formulas from class.

R doesn’t have a built-in function to simulate Bernoulli Random Variables since it treats them as a Binomial\((n=1,p)\) random variable. Let’s make our own:

```
#This function simulates n independent, identically distributed Bernoulli Random Variables
rbern <- function(n, p){
sims <- sample(0:1, size = n, replace = TRUE, prob = c(1-p, p))
return(sims)
}
```

The argument `prob = c(1-p, p)`

is new. It tells `sample`

to draw `0`

with probability `1-p`

and `1`

with probability `p`

. This is exactly what we want. Let’s test it out:

`rbern(30, 0.1)`

`## [1] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1`

`rbern(30, 0.5)`

`## [1] 0 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 0 0 0`

`rbern(30, 0.9)`

`## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1`

The function `rbern`

returns all the *individual* Bernoulli draws. What we need to make a connection with the Binomial RV is a function that returns the *sum* of these draws instead. This is easy: we just call `rbern`

and take its sum:

```
#This function simulates n independent, identically distributed Bernoulli Random Variables and returns their sum
rbern.sum <- function(n, p){
sims <- rbern(n, p)
return(sum(sims))
}
```

Let’s test it out:

`rbern.sum(50, 0.1)`

`## [1] 8`

`rbern.sum(50, 0.5)`

`## [1] 32`

`rbern.sum(50, 0.9)`

`## [1] 45`

It turns out that our function `rbern.sum`

makes a *single* random draw from a Binomial\((n,p)\) random variable. How do I know this? We constructed it exactly following the definition of the Binomial from class: draw the some Bernoullis and sum them up. But don’t take my word for it. Let’s verify this with a simulation.

To do this, we’ll use `rbern.sum`

and `replicate`

```
binom.sims <- replicate(10^5, rbern.sum(10, 0.5))
head(binom.sims)
```

`## [1] 5 4 7 3 4 2`

Let’s make a plot of the relative frequencies from this simulation experiment:

```
p.sim <- table(binom.sims)/10^5
p.sim
```

```
## binom.sims
## 0 1 2 3 4 5 6 7 8
## 0.00088 0.00925 0.04361 0.11954 0.20616 0.24529 0.20741 0.11301 0.04463
## 9 10
## 0.00943 0.00079
```

`plot(p.sim, type = 'h')`

What we’ve done here is approximate a pmf using Monte Carlo simulation. Now let’s compare it to the *actual* pmf of a Binomial RV.

We know how to calculate this using what we covered in R Tutorial #5

```
support <- 0:10
p.true <- dbinom(support, size = 10, prob = 0.5)
rbind(support, p.true)
```

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## support 0.0000000000 1.000000000 2.00000000 3.0000000 4.0000000 5.0000000
## p.true 0.0009765625 0.009765625 0.04394531 0.1171875 0.2050781 0.2460938
## [,7] [,8] [,9] [,10] [,11]
## support 6.0000000 7.0000000 8.00000000 9.000000000 1.000000e+01
## p.true 0.2050781 0.1171875 0.04394531 0.009765625 9.765625e-04
```

`plot(support, p.true, type = 'h')`

The plot of the actual pmf and the associated probabilities match those from the simulation almost perfectly:

`p.sim - p.true`

```
## binom.sims
## 0 1 2 3 4
## -0.0000965625 -0.0005156250 -0.0003353125 0.0023525000 0.0010818750
## 5 6 7 8 9
## -0.0008037500 0.0023318750 -0.0041775000 0.0006846875 -0.0003356250
## 10
## -0.0001865625
```

As you can see, the differences are *tiny*. They’d be even smaller if we used more replications.

**I hope this example has given you a bit more intuition about the Binomial RV. I’m not going to assign any exercises here, but it would be a good idea to try some simulation experiments of your own with different values for \(n\) and \(p\).**