The Multivariate Normal Distribution

Francis J. DiTraglia

University of Oxford

What is this?

If you hear a “prominent” economist using the word “equilibrium,” or “normal distribution,” do not argue with him; just ignore him, or try to put a rat down his shirt.

– Nassim Nicholas Taleb

Today

  1. Learn how to simulate correlated normal draws.
  2. Invent the Cholesky Decomposition yourself.
  3. Build intuition for the MV normal distribution.

Review of Notation for Random Vectors

Mean Vector

The vector that stacks the means of each constituent random variable in the random vector \(\mathbf{X}\). \[ \mathbb{E}(\boldsymbol{X}) = \mathbb{E}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} \equiv \begin{bmatrix} \mathbb{E}(X_1) \\ \mathbb{E}(X_2) \\ \vdots \\ \mathbb{E}(X_p) \end{bmatrix} \]

Variance-Covariance Matrix

  • The matrix whose \((i,j)\) element equals \(\text{Cov}(X_i, X_j)\)
  • Symmetric since \(\text{Cov}(X, Y) = \text{Cov}(Y, X)\)
  • \(i\)th diagonal element is \(\text{Var}(X_i)\) since \(\text{Cov}(X, X) = \text{Var}(X)\).

\[ \text{Var}(\mathbf{X}) = \text{Var}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = \begin{bmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_p) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) & \cdots & \text{Cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots\\ \text{Cov}(X_p, X_1) & \text{Cov}(X_p, X_2) & \cdots& \text{Var}(X_p) \end{bmatrix} \]

Correlation Matrix

  • The matrix whose \((i,j)\) element equals \(\text{Cor}(X_i, X_j)\)
  • Symmetric since \(\text{Cor}(X, Y) = \text{Cor}(Y, X)\)
  • \(i\)th diagonal element is \(1\) since \(\text{Cor}(X, X) = 1\).

\[ \text{Cor}(\mathbf{X}) = \text{Cor}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = \begin{bmatrix} 1 & \text{Cor}(X_1, X_2) & \cdots & \text{Cor}(X_1, X_p) \\ \text{Cor}(X_2, X_1) & 1 & \cdots & \text{Cor}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots\\ \text{Cor}(X_p, X_1) & \text{Cor}(X_p, X_2) & \cdots& 1 \end{bmatrix} \]
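Base R's cov2cor() converts a variance-covariance matrix into the corresponding correlation matrix by dividing each covariance by the product of the relevant standard deviations. A minimal sketch with a made-up variance matrix:

# Hypothetical 2x2 variance matrix: variances 4 and 9, covariance 3
Sigma_example <- matrix(c(4, 3,
                          3, 9), byrow = TRUE, nrow = 2, ncol = 2)
cov2cor(Sigma_example) # off-diagonal: 3 / (sqrt(4) * sqrt(9)) = 0.5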

Generating Correlated Normal Draws

Start with Indep. Standard Normals

Start with independent standard normal RVs: \(Z_1\) and \(Z_2\). \[ \begin{align*} \mathbb{E}\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} &\equiv \begin{bmatrix}\mathbb{E}(Z_1) \\ \mathbb{E}(Z_2) \end{bmatrix} = \begin{bmatrix} 0 \\ 0\end{bmatrix}\\ \\ \text{Var} \begin{bmatrix} Z_1 \\ Z_2\end{bmatrix} &\equiv \begin{bmatrix} \text{Var}(Z_1) & \text{Cov}(Z_1, Z_2) \\ \text{Cov}(Z_2, Z_1) & \text{Var}(Z_2) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\\ \\ \text{Cor} \begin{bmatrix} Z_1 \\ Z_2\end{bmatrix} &\equiv \begin{bmatrix} 1 & \text{Cor}(Z_1, Z_2) \\ \text{Cor}(Z_2, Z_1) & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{align*} \]

Simulating in R

  • Simulate a large number of standard normal draws.
  • Pack them into a matrix with columns z1 and z2.
  • Each row represents a draw of the random vector \((Z_1, Z_2)\)
set.seed(99999)
n <- 1e5
z1 <- rnorm(n)
z2 <- rnorm(n)
z <- cbind(z1, z2)
rm(z1, z2)
head(z)
             z1          z2
[1,] -0.4255127 -1.91351696
[2,] -0.2829203  0.05775194
[3,] -0.8986773 -0.48302150
[4,]  0.7065184 -2.37167063
[5,]  2.0916699  0.35882645
[6,]  1.6356643 -0.54297591

Sample Mean Vector; Var/Cor Matrices

colMeans(z)
          z1           z2 
-0.001512035  0.002202837 
var(z) 
            z1          z2
z1 1.006378316 0.000473838
z2 0.000473838 1.003673852
cov(z)
            z1          z2
z1 1.006378316 0.000473838
z2 0.000473838 1.003673852
cor(z)
             z1           z2
z1 1.0000000000 0.0004714688
z2 0.0004714688 1.0000000000

Visualizing Marginal Distributions

These are the distributions of \(Z_1\) and \(Z_2\) considered separately:

library(tidyverse)
library(patchwork) # for placing plots side-by-side

z1_hist <- as_tibble(z) |> 
  ggplot(aes(x = z1))  + 
  geom_histogram(fill = 'black', alpha = 0.5)

z2_hist <- as_tibble(z) |> 
  ggplot(aes(x = z2)) +
  geom_histogram(fill = 'orange', alpha = 0.5)

z1_hist + z2_hist

Visualizing Marginal Distributions

Alternatively, use kernel density estimation:

library(tidyverse)
library(patchwork) # for placing plots side-by-side

z1_dens <- as_tibble(z) |> 
  ggplot(aes(x = z1))  + 
  geom_density(fill = 'black', alpha = 0.5)

z2_dens <- as_tibble(z) |> 
  ggplot(aes(x = z2)) +
  geom_density(fill = 'orange', alpha = 0.5)

z1_dens + z2_dens

Visualizing the Joint Distribution

Two-dimensional kernel density estimator viewed from above:

as_tibble(z) |> 
  ggplot(aes(x = z1, y = z2)) +
  geom_density2d_filled() +
  coord_fixed() # forces aspect ratio to 1:1 so we see the circles!

Colors are brighter in regions of higher density.

Add Constants to Shift the Means

# Shift means from (0, 0) to (1, -1)
x <- cbind(x1 = z[, 1] + 1,  x2 = z[, 2] - 1)

x_marginals <- ggplot(as_tibble(x)) +
  geom_density(aes(x = x1), fill = 'black', alpha = 0.5) +
  geom_density(aes(x = x2), fill = 'orange', alpha = 0.5) +
  xlab('')

x_joint <- ggplot(as_tibble(x)) +
  geom_density2d_filled(aes(x = x1, y = x2)) +
  coord_fixed()

x_marginals + x_joint

💪 Exercise A - (5 min)

  1. Load the child test score dataset from our earlier lecture and use it to calculate the sample means, variance matrix, and correlation matrix of mom.iq and kid.score.
  2. Continuing from part 1, create marginal and joint kernel density plots of mom.iq and kid.score. Do they appear to be normally distributed?
  3. There’s an R function called cov2cor() but there isn’t one called cor2cov(). Why?
  4. Reading from the color scale, the height of the “peak” in my two-dimensional kernel density plots from above was around 0.16. Why?
  5. The contours of equal density for a pair of uncorrelated standard normal variables are circles. Why?

Scale to Change the Variances

  • Adding constants leaves variances / covariances unchanged.
  • Work with zero mean normals until the last minute; then shift to obtain desired means.
  • To change the variances of \(Z_1\) and \(Z_2\) without creating any covariance between them, multiply each by a constant:
# Change variances from (1, 1) to (4, 25)
x <- cbind(x1 = 2 * z[, 1], x2 = 5 * z[, 2])
cov(x)
           x1          x2
x1 4.02551326  0.00473838
x2 0.00473838 25.09184630

Scale to Change the Variances

Circular contours become elliptical contours:

x_marginals <- ggplot(as_tibble(x)) +
  geom_density(aes(x = x1), fill = 'black', alpha = 0.5) +
  geom_density(aes(x = x2), fill = 'orange', alpha = 0.5) + 
  xlab('')

x_joint <- ggplot(as_tibble(x)) +
  geom_density2d_filled(aes(x = x1, y = x2)) +
  coord_fixed()  

x_marginals + x_joint

Combine to Create Correlation

Construct \(X_1\) and \(X_2\) as linear combinations of \((Z_1, Z_2)\):

x <- cbind(x1 = 2 * z[, 1] +     z[, 2],
           x2 =     z[, 1] + 4 * z[, 2])

cov(x)
         x1        x2
x1 5.031082  6.031717
x2 6.031717 17.068951
cor(x)
          x1        x2
x1 1.0000000 0.6508888
x2 0.6508888 1.0000000

Combine to Create Correlation

Now the ellipses are tilted rather than axis-aligned:

x_marginals <- ggplot(as_tibble(x)) +
  geom_density(aes(x = x1), fill = 'black', alpha = 0.5) +
  geom_density(aes(x = x2), fill = 'orange', alpha = 0.5) + 
  xlab('')

x_joint <- ggplot(as_tibble(x)) +
  geom_density2d_filled(aes(x = x1, y = x2)) +
  coord_fixed()  

x_marginals + x_joint

Combine to Create Correlation

This is simply matrix multiplication…

x <- cbind(x1 = 2 * z[, 1] +     z[, 2],
           x2 =     z[, 1] + 4 * z[, 2])

A <- matrix(c(2, 1,
              1, 4), 2, 2, byrow = TRUE)

# t() is the transpose: rows become cols & vice-versa
x_alt <- t(A %*% t(z)) 

colnames(x_alt) <- c('x1', 'x2')
identical(x, x_alt)
[1] TRUE

Our Definition

  • Let \(\boldsymbol{Z}\) be a vector of \(p\) iid standard normal RVs
  • Let \(\mathbf{A}\) be a \((p\times p)\) matrix of constants
  • Let \(\mathbf{c}\) be a \(p\)-vector of constants.
  • Then \(\boldsymbol{X} = (\mathbf{c} + \mathbf{A} \boldsymbol{Z})\) is a multivariate normal RV.

Note

You will deduce the properties of \(\boldsymbol{X}\) from first principles in the \(p = 2\) case!
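A quick simulation sketch of this definition, re-using the z draws from above with an illustrative choice of \(\mathbf{c}\) and \(\mathbf{A}\):

c_vec <- c(1, -1)
A <- matrix(c(2, 1,
              1, 4), 2, 2, byrow = TRUE)
# A %*% t(z) is 2 x n; adding c_vec recycles it down each column
x <- t(c_vec + A %*% t(z)) # each row of x is one draw of X = c + A Z
colMeans(x) # close to c = (1, -1)
var(x)      # compare with your answer to Exercise B below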

💪 Exercise B - (5 min)

Suppose that \(Z_1, Z_2 \sim \text{ iid N}(0,1)\) and \[ \boldsymbol{X} = \mathbf{A}\boldsymbol{Z}, \quad \boldsymbol{X}= \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} a & b \\ c & d\end{bmatrix}, \quad \boldsymbol{Z} = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}. \] Calculate \(\text{Var}(X_1)\), \(\text{Var}(X_2)\), and \(\text{Cov}(X_1, X_2)\) in terms of the constants \(a, b, c, d\). Using these calculations, show that the variance-covariance matrix of \(\boldsymbol{X}\) equals \(\mathbf{A} \mathbf{A}'\). Use this result to work out the variance-covariance matrix of my example from above with \(a = 2, b = 1, c = 1, d = 4\) and check that it agrees with the simulations.

The Cholesky Decomposition

Going Backwards

Find \((a, b, c, d)\) so \(\text{Var}(X_1) =\text{Var}(X_2) = 1\) and \(\text{Cor}(X_1, X_2) = 0.5\) where \[ \begin{align*} X_1 &= a Z_1 + b Z_2 \\ X_2 &= c Z_1 + d Z_2. \end{align*} \]

  • Setting \(a = 1\) and \(b = 0\) gives \(X_1 = Z_1\) and \(\text{Var}(X_1) = 1\).
  • It follows that \(\text{Cov}(X_1, X_2) = \text{Cov}(Z_1, cZ_1 + dZ_2) = c\).
  • Since \(\text{Var}(X_1) = \text{Var}(X_2) = 1\), we have \(c = \text{Cor}(X_1, X_2) = 0.5\).
  • All that remains is to find \(d\). Since \(\text{Var}(X_2) = c^2 + d^2\), set \(d = \sqrt{1 - 0.5^2}\).
  • Expressed in matrix form, we have shown that: \[ \begin{bmatrix} 1 & 0 \\ 0.5 & \sqrt{1 - 0.5^2}\end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim \text{N}\left(\begin{bmatrix}0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1\end{bmatrix} \right). \]

Check that this works as expected…

A <- matrix(c(1, 0,
              0.5, sqrt(1 - 0.5^2)), byrow = TRUE, nrow = 2, ncol = 2)
A 
     [,1]      [,2]
[1,]  1.0 0.0000000
[2,]  0.5 0.8660254
x <- t(A %*% t(z)) 
colnames(x) <- c('x1', 'x2')

cov(x)
          x1        x2
x1 1.0063783 0.5035995
x2 0.5035995 1.0047603
ggplot(as_tibble(x)) +
  geom_density2d_filled(aes(x1, x2)) +
  coord_fixed()  

💪 Exercise C - (15 min)

  1. Suppose we wanted the correlation between \(X_1\) and \(X_2\) to be \(\rho\), a value that might not equal 0.5. Modify the argument from above accordingly.
  2. In our discussion above, \(X_1\) and \(X_2\) both had variance equal to one, so their correlation equaled their covariance. More generally, we may want to generate \(X_1\) with variance \(\sigma_1^2\) and \(X_2\) with variance \(\sigma_2^2\), where the covariance between them equals \(\sigma_{12}\). Extend your reasoning from the preceding exercise to find an appropriate matrix \(\mathbf{A}\) such that \(\mathbf{A}\mathbf{Z}\) has the desired variance-covariance matrix.
  3. Check your calculations in the preceding part by transforming the simulations z from above into x such that x1 has variance one, x2 has variance four, and the correlation between them equals 0.4. Make a density plot of your result.

What’s the square root of a matrix?

  • You just decomposed a \((2\times 2)\) variance matrix as \(\boldsymbol{\Sigma} = \mathbf{A} \mathbf{A}'\).
  • Computing \(\mathbf{A}\) is analogous to taking the square root of \(\sigma^2\).
  • Real-valued square roots don’t always exist: e.g. \(\sqrt{-4}\)
  • Square roots aren’t always unique: e.g. \((-2)^2 = 2^2 =4\).
  • The same issues arise for matrices in general, but it’s always possible to decompose a variance matrix into \(\mathbf{A} \mathbf{A}'\).

#NotAllSymmetricMatrices

Every variance-covariance matrix is symmetric, but not every symmetric matrix is a variance-covariance matrix:

M <- matrix(c(4, 16, 
              16, 9), byrow = TRUE, nrow = 2, ncol = 2)
M
     [,1] [,2]
[1,]    4   16
[2,]   16    9
cov2cor(M)
         [,1]     [,2]
[1,] 1.000000 2.666667
[2,] 2.666667 1.000000

Correlations shouldn’t come out to be larger than one!

Positive Definite Matrices

  • In the same way that variances must be positive, variance matrices must be positive definite
  • A symmetric matrix \(\mathbf{M}\) is called positive definite (p.d.) if \(\mathbf{v}'\mathbf{M}\mathbf{v} > 0\) for any vector \(\mathbf{v} \neq 0\).
  • This is effectively the matrix equivalent of “positive number”
  • Requiring \(\boldsymbol{\Sigma}\) p.d. ensures \(\text{Var}(\mathbf{v}' \boldsymbol{X}) > 0\) for any \(\mathbf{v} \neq 0\).
  • I.e. linear combinations of \(\boldsymbol{X}\) have positive variance.
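To connect this definition back to \(\mathbf{M}\) from the previous slide, here is a minimal numerical check: exhibiting a single vector \(\mathbf{v}\) with \(\mathbf{v}'\mathbf{M}\mathbf{v} \leq 0\) is enough to show that \(\mathbf{M}\) is not positive definite. (The vector below is just one illustrative choice.)

M <- matrix(c(4, 16,
              16, 9), byrow = TRUE, nrow = 2, ncol = 2)
v <- c(1, -1)
t(v) %*% M %*% v # returns -19: negative, so M is not p.d.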

Making it Unique

  • Can always decompose p.d. \(\boldsymbol{\Sigma}\) as \(\mathbf{A}\mathbf{A}'\), but not unique: \[ \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} = \mathbf{A}\mathbf{A}'= \mathbf{B}\mathbf{B}' \] \[ \mathbf{A} = \begin{bmatrix} 1 & 0 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{\sqrt{2} - \sqrt{6}}{4} & \frac{\sqrt{2} + \sqrt{6}}{4} \end{bmatrix} \]
  • But doesn’t \(\mathbf{A}\) seem simpler and cleaner?
  • Lower Triangular with positive elements on the diagonal.
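We can verify numerically that both matrices reproduce the same variance matrix, up to floating-point rounding:

A <- matrix(c(1, 0,
              1/2, sqrt(3)/2), byrow = TRUE, nrow = 2, ncol = 2)
B <- matrix(c(1/sqrt(2), 1/sqrt(2),
              (sqrt(2) - sqrt(6))/4, (sqrt(2) + sqrt(6))/4),
            byrow = TRUE, nrow = 2, ncol = 2)
A %*% t(A) # [1, 0.5; 0.5, 1]
B %*% t(B) # the same, up to rounding error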

The Cholesky Decomposition

  • Any positive definite matrix \(\boldsymbol{\Sigma}\) can be expressed as \(\boldsymbol{\Sigma} = \mathbf{L} \mathbf{L}'\) where \(\mathbf{L}\) is lower triangular with positive diagonal.
  • This is the Cholesky Decomposition
  • You just re-discovered it yourself!
  • chol() returns \(\mathbf{R} = \mathbf{L}'\), i.e. \(\boldsymbol{\Sigma} = \mathbf{R}'\mathbf{R}\).

André-Louis Cholesky (1875-1918)

Example - chol()

Sigma <- matrix(c(1, 0.5,
                  0.5, 1), byrow = TRUE, nrow = 2, ncol = 2)

A <- matrix(c(1, 0,
              1 / 2, sqrt(3) / 2), byrow = TRUE, nrow = 2, ncol = 2)

identical(t(A), chol(Sigma))
[1] TRUE
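One way to put this output to work — a sketch re-using the matrix z of independent standard normal draws from above: since chol() returns \(\mathbf{R}\) with \(\boldsymbol{\Sigma} = \mathbf{R}'\mathbf{R}\), post-multiplying z by \(\mathbf{R}\) gives rows with variance matrix \(\mathbf{R}'\mathbf{R} = \boldsymbol{\Sigma}\).

x <- z %*% chol(Sigma) # each row now has variance matrix Sigma
colnames(x) <- c('x1', 'x2')
var(x) # close to Sigma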

💪 Exercise D - (\(\infty\) min)

  1. To check if a \((3 \times 3)\) matrix M is p.d., proceed as follows. First check that M[1,1] is positive. Next use det() to check that the determinant of M[1:2,1:2] is positive. Finally check that det(M) is positive. If M passes all the tests, it’s p.d. The same procedure works for any symmetric matrix: check that each leading principal minor (the determinant of each upper-left square submatrix) is positive. Only one of these matrices is p.d. Which one? \[ \mathbf{A} =\begin{bmatrix} 1 & 2 & 3\\ 2 & 2 & 1\\ 3 & 1 & 3 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 3 \end{bmatrix} \]
  2. Let \(\boldsymbol{\Sigma}\) be the p.d. matrix from part 1. Use chol() to make 100,000 draws from a MV normal distribution with this variance matrix. Check your work with var().
  3. Install the mvtnorm package and consult ?rmvnorm. Then repeat the preceding exercise “the easy way,” without using chol(). Check your work.
  4. Let \(Y = \alpha X_1 + \beta X_2\), \(\mathbf{v} = (\alpha, \beta)'\), and \(\boldsymbol{\Sigma} = \text{Var}(X_1, X_2)\). Show that \(\text{Var}(Y) = \mathbf{v}' \boldsymbol{\Sigma} \mathbf{v}\).