In Lecture 13 I introduced three distributions that are derived from the normal: Student-t, \(\chi^2\), and \(F\). This document uses simulations to, I hope, make these relationships a little clearer. The key point is that all of these random variables arise from independent standard normals.
A \(\chi^2(\nu)\) random variable is what you get if you take \(\nu\) independent standard normals, square them, and sum the result. The parameter \(\nu\) is called the “degrees of freedom.”
Let’s try creating some \(\chi^2(1)\) random draws the hard way rather than using rchisq:
# Function to draw a single Chi-squared random variable with degrees of freedom equal to df
my.rchisq <- function(df){
  # Draw df independent standard normals
  normal.sims <- rnorm(df)
  # Square them and sum the result
  chi.sims <- sum(normal.sims^2)
  # Return the draw explicitly, as the later functions do
  return(chi.sims)
}
sims <- replicate(10000, my.rchisq(1))
hist(sims, probability = TRUE)
x <- seq(from = 0, to = max(sims), by = 0.01)
points(x, dchisq(x, df = 1), type = 'l', col = 'red')
It works! The histogram of simulations aligns almost perfectly with a plot of the actual density of a \(\chi^2(1)\) random variable, which we got from dchisq.
Notice that I had to set probability = TRUE so that we got a histogram on the density scale (bars whose areas sum to one) rather than a histogram of counts. Otherwise the scale of the histogram relative to the \(\chi^2\) density would have been wrong.
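To see what probability = TRUE changes, here is a small sketch that builds the histogram object without drawing it and inspects its pieces. The names h, total.count and total.area and the seed are my own choices for illustration. The point is only that the counts add up to the number of draws, while the densities times the bin widths add up to one, which is what puts the bars on the same scale as dchisq.

```r
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible

# A chi-squared(1) draw is just one squared standard normal
sims <- rnorm(10000)^2

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(sims, plot = FALSE)

# The counts add up to the number of draws...
total.count <- sum(h$counts)

# ...while the densities times the bin widths add up to one
total.area <- sum(h$density * diff(h$breaks))

print(total.count)
print(total.area)
```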
We can try this again for a \(\chi^2(5)\):
sims <- replicate(10000, my.rchisq(5))
hist(sims, probability = TRUE)
x <- seq(from = 0, to = max(sims), by = 0.01)
points(x, dchisq(x, df = 5), type = 'l', col = 'red')
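A histogram is a visual check. A rough numerical check is to compare the sample mean and variance with the known values for a \(\chi^2(\nu)\), which are \(\nu\) and \(2\nu\). The sketch below does this for \(\nu = 5\) using a vectorised version of the same construction: a matrix of standard normals, squared and summed across each row. The names n.sims, nu and normal.draws and the seed are assumptions made for illustration, and this is an alternative to my.rchisq rather than a replacement for it.

```r
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible
n.sims <- 10000
nu <- 5         # degrees of freedom

# Each row holds the nu standard normals behind one chi-squared draw
normal.draws <- matrix(rnorm(n.sims * nu), nrow = n.sims, ncol = nu)

# Square every entry and sum across each row
chi.sims <- rowSums(normal.draws^2)

# Theory says the mean is nu and the variance is 2 * nu
sample.mean <- mean(chi.sims)
sample.var <- var(chi.sims)
print(c(mean = sample.mean, variance = sample.var))
```

With 10000 draws the two numbers should land close to 5 and 10, though not exactly on them.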
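The Student-t distribution is built from the Normal and the \(\chi^2\): it is a standard normal divided by the square root of an independent \(\chi^2\) random variable that has first been divided by its degrees of freedom. This means we can reuse my.rchisq from above: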
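# Function to draw a single Student-t random variable with degrees of freedom equal to df
my.rt <- function(df){
  # Numerator: a single standard normal
  numerator <- rnorm(1)
  # Denominator: square root of an independent Chi-squared divided by its degrees of freedom
  denominator <- sqrt( my.rchisq(df) / df )
  t.sim <- numerator/denominator
  return(t.sim)
}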
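Before plotting anything, one way to sanity-check this construction is to compare a few simulated quantiles with the exact ones from qt. The sketch below is a vectorised variant of the same recipe rather than a call to my.rt, so that it runs on its own. The choice of 5 degrees of freedom, the seed, and the names n.sims, nu, probs, sim.quantiles and exact.quantiles are all assumptions for illustration.

```r
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible
n.sims <- 10000
nu <- 5         # degrees of freedom chosen for illustration

# Numerator: one standard normal per simulated t
numerator <- rnorm(n.sims)

# Denominator: an independent chi-squared(nu), built row by row from
# nu squared standard normals, divided by nu and square-rooted
chi.sims <- rowSums(matrix(rnorm(n.sims * nu), nrow = n.sims, ncol = nu)^2)
t.sims <- numerator / sqrt(chi.sims / nu)

# Compare simulated quantiles with the exact ones from qt
probs <- c(0.05, 0.5, 0.95)
sim.quantiles <- quantile(t.sims, probs)
exact.quantiles <- qt(probs, df = nu)
print(rbind(sim.quantiles, exact.quantiles))
```

I use quantiles rather than the mean and variance here because those moments do not exist for every Student-t: with one degree of freedom there is no mean at all.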
Now we’ll try this out for a Student-t with one degree of freedom:
sims <- replicate(10000, my.rt(1))
hist(sims, probability = TRUE)
x <- seq(from = min(sims), to = max(sims), by = 0.01)
points(x, dt(x, df = 1), type = 'l', col = 'red')
Wow! That looks weird! I told you in class that the Student-t distribution is really wild. What I meant was that it generates lots of “outliers”, which is why the histogram looks so strange. We can “zoom in” on the region near zero by trimming out these outliers:
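sims <- subset(sims, -6 <= sims & sims <= 6)
hist(sims, probability = TRUE)
x <- seq(from = min(sims), to = max(sims), by = 0.01)
# The trimmed histogram integrates to one over [-6, 6], so rescale the density by the probability of that interval
inside.prob <- pt(6, df = 1) - pt(-6, df = 1)
points(x, dt(x, df = 1) / inside.prob, type = 'l', col = 'red')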
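How many draws count as “outliers” here? The sketch below estimates the share of Student-t draws with one degree of freedom that fall outside \([-6, 6]\) and compares it with the exact value from pt, which is about 10.5%. It builds the draws directly from standard normals so that it runs on its own. The names n.sims, share.outside.sim, share.outside.exact and share.inside.exact and the seed are assumptions for illustration.

```r
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible
n.sims <- 10000

# Student-t with 1 df: a standard normal over sqrt(chi-squared(1) / 1),
# where a chi-squared(1) is a single squared standard normal
t.sims <- rnorm(n.sims) / sqrt(rnorm(n.sims)^2 / 1)

# Share of simulated draws outside [-6, 6]
share.outside.sim <- mean(abs(t.sims) > 6)

# Exact share from the Student-t CDF, using symmetry around zero
share.outside.exact <- 2 * pt(-6, df = 1)

# Probability that a draw survives the trimming
share.inside.exact <- pt(6, df = 1) - pt(-6, df = 1)

print(c(simulated = share.outside.sim, exact = share.outside.exact))
```

Since trimming throws away roughly a tenth of the draws, a density-scale histogram of the trimmed sample integrates to one over \([-6, 6]\) alone. The matching curve is therefore dt(x, df = 1) divided by share.inside.exact, not dt(x, df = 1) on its own.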
Again we see that it works well. The Student-t gets much less wild as its degrees of freedom increase, so with 30 degrees of freedom we don’t need to do any trimming to make the plot readable:
sims <- replicate(10000, my.rt(30))
hist(sims, probability = TRUE)
x <- seq(from = min(sims), to = max(sims), by = 0.01)
points(x, dt(x, df = 30), type = 'l', col = 'red')
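One way to put a number on “much less wild” is to measure how far the Student-t density sits from the standard normal density. The sketch below computes the largest gap between the two curves on a grid from -4 to 4, for 1 and for 30 degrees of freedom. The grid limits and the names gap.df1 and gap.df30 are my own choices for illustration.

```r
# Grid of points at which to compare the two densities
x <- seq(from = -4, to = 4, by = 0.01)

# Largest vertical gap between the Student-t and standard normal densities
gap.df1 <- max(abs(dt(x, df = 1) - dnorm(x)))
gap.df30 <- max(abs(dt(x, df = 30) - dnorm(x)))

print(c(df1 = gap.df1, df30 = gap.df30))
```

With 30 degrees of freedom the gap is under 0.01 everywhere on this grid, so the red curve in the plot above is very nearly a standard normal.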
Last but not least we have the \(F\) random variable, which is defined as a ratio of two independent \(\chi^2\) random variables, each divided by its own degrees of freedom. We’ll just use my.rchisq twice:
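# Function to draw a single F random variable with the given numerator and denominator degrees of freedom
my.rf <- function(numerator.df, denominator.df){
  # Two independent Chi-squared draws, each divided by its own degrees of freedom
  numerator <- my.rchisq(numerator.df) / numerator.df
  denominator <- my.rchisq(denominator.df) / denominator.df
  f.sim <- numerator / denominator
  return(f.sim)
}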
Finally, we’ll test this out for an \(F(5,40)\) random variable:
sims <- replicate(10000, my.rf(5,40))
hist(sims, probability = TRUE)
x <- seq(from = 0, to = max(sims), by = 0.01)
points(x, df(x, df1 = 5, df2 = 40), type = 'l', col = 'red')
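As with the \(\chi^2\), we can back up the picture with a rough numerical check. An \(F\) random variable with more than two denominator degrees of freedom has mean equal to the denominator degrees of freedom divided by that number minus two, so for an \(F(5,40)\) the mean is \(40/38 \approx 1.05\). The sketch below builds the draws from standard normals in vectorised form so that it runs on its own, then compares the sample mean and the 95th percentile with their exact values. The helper chisq.from.normals, the other variable names, and the seed are hypothetical choices for illustration.

```r
set.seed(1234)  # arbitrary seed, only so the sketch is reproducible
n.sims <- 10000
df1 <- 5        # numerator degrees of freedom
df2 <- 40       # denominator degrees of freedom

# Hypothetical helper: n chi-squared(nu) draws, each a row sum of nu squared standard normals
chisq.from.normals <- function(n, nu){
  rowSums(matrix(rnorm(n * nu), nrow = n, ncol = nu)^2)
}

# Ratio of two independent chi-squareds, each divided by its degrees of freedom
f.sims <- (chisq.from.normals(n.sims, df1) / df1) /
  (chisq.from.normals(n.sims, df2) / df2)

# Compare with the exact mean and the exact 95th percentile from qf
exact.mean <- df2 / (df2 - 2)
sample.mean <- mean(f.sims)
sim.q95 <- unname(quantile(f.sims, 0.95))
exact.q95 <- qf(0.95, df1 = df1, df2 = df2)

print(c(sample.mean = sample.mean, exact.mean = exact.mean))
print(c(sim.q95 = sim.q95, exact.q95 = exact.q95))
```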
I hope this was helpful!