In this problem you will simulate from the following simple instrumental variables model \[
\begin{aligned}
Y_i &= \beta X_i + U_i \\
X_i &= \pi Z_i + V_i
\end{aligned},
\quad
\begin{bmatrix}
U_i \\ V_i
\end{bmatrix} \sim \text{iid Normal}\left(
\begin{bmatrix}
0 \\ 0
\end{bmatrix},
\begin{bmatrix}
1 & \rho \\
\rho & 1
\end{bmatrix}
\right)
\] for \(i = 1, 2, ..., n\) where \((U_i, V_i)\) are independent of \(Z_i\). The purpose of this simulation is to illustrate what can go wrong with the IV estimator of \(\beta\) and its associated standard error when \(Z_i\) is a weak instrument, i.e. \(\pi\) is small in absolute value. In the simulations below you will fix the true value of \(\beta\) equal to zero. (Any value would give the same qualitative results, and this is the simplest possibility.) You will also hold the values of the instrument \(Z_1, Z_2, ..., Z_n\)fixed across simulation replications. Note that, since the errors in this model are independent of \(Z_i\), there is no need for heteroskedasticity-robust standard errors: use the plain-vanilla ones instead. Use \(n = 100\) throughout.
Exercises
Generate \(n = 100\) standard normal draws z_fixed. Center the result so that mean(z_fixed) is zero to the numerical precision of your machine, and scale them so that sum(z_fixed^2) equals 100. You will use these draws as the instrumental variable in all simulation replications below. Do not re-draw \(Z_i\) at any point below!1Hint: What do you need to divide \(z_i\) by so that \(\sum_i z_i^2 = 100\)?
Write a function called draw_sim_data() to simulate from the above model with \(n = 100\), \(\beta=0\) and instruments z_fixed. Your function should take two arguments: pi_param is the first-stage coefficient, and rho is the correlation between \(U_i\) and \(V_i\). It should return a data frame (or tibble) with three named columns: x, y, and z, where x is the endogenous regressor, y is the outcome, and z is z_fixed. Hint: Use rmvnorm() from the mvtnorm package to draw your correlated errors \(U\) and \(V\). There is no need to do a Choleski decomposition here.
Write a function called get_iv_stats() that takes a single argument dat, a data frame (or tibble) matching the output of draw_sim_data() from above. Your function should return a vector with named elements est and se: the instrumental variables estimator for \(\beta\) and corresponding standard error computed from dat$y, dat$x, and dat$z. Compute these by hand, i.e. without using ivreg. Hint: You will need to compute an estimate for the IV intercept here, even though you don’t report it. All variables are centered around zero. However due to natural sampling variation in your \(n=100\) draws the intercept is important for model fit. Your results won’t match ivreg without.
Using the functions you wrote in the two preceding parts, simulate 100 observations from the model described above with \(\pi = 1\) and \(\rho = 0.5\) and calculate the instrumental variables estimator and associated standard error for \(\beta\). Use ivreg to check your results.
The concentration parameter\(\mu^2\) quantifies the strength of an instrumental variable. Larger values indicate a stronger instrument. In the model described above \(\mu^2 = \pi^2 \mathbf{Z}'\mathbf{Z}\). Given the way we have scaled z_fixed, this simplifies to \(\mu^2 = 100 \pi^2\). Write a function called replicate_iv_sim() that replicates the IV simulation experiment described above at fixed parameter values. It should take three arguments: n_reps is the number of simulation replications, conc is the concentration parameter \(\mu^2\) and rho is the correlation between \(U_i\) and \(V_i\) described above. It should return a data frame (or tibble) with columns est and se corresponding to the output of get_iv_stats(), and columns conc and rho that list the parameter values used. (The latter two columns will be helpful in the next part for use with ggplot2.)
Use pmap() from purrr to call replicate_iv_sim() over a grid of parameter values: \(\rho \in\{ 0.5, 0.9, 0.99\}\) and \(\mu^2 \in \{0, 0.25, 10, 100\}\) with n_reps = 10000. For faster computation, you may optionally use future_pmap() from the furrr package for parallel processing. Organize your results in a tibble called simulations. This will be quite a large tibble indeed: 120,000 rows! Use simulations to make a table showing how the median bias of the IV estimator changes with the parameter values. In other words: for each combination of values \(\rho\) and \(\mu^2\), calculate the difference between the median of the sampling distribution of the IV estimator and the true value of the parameter \(\beta\). Since the true parameter value is zero, the median of the sampling distribution equals the median bias. Comment on your results.
Using the tibble simulations from the preceding part, make kernel density plots similar to those in Figure 1 (a) and (b) of Stock, Wright, & Yogo (2002; JBES) “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” I suggest using ggplot2 and faceting by rho so that the different values of the concentration parameter appear on the same plot. I also strongly suggest filtering to exclude extremely large or small values before making your kernel density plots. Otherwise you won’t be able to see anything! (You might start with the range \([-5,5]\) and tweak from there.) Create similar plots for the t-statistics (est/se) to see how the sampling distribution of the test statistic changes with instrument strength. Comment on your findings.
Solutions
Exercise 1 - Solution
library(tidyverse)library(ivreg)library(broom)library(mvtnorm)n <-100set.seed(110210)z_fixed <-rnorm(n)z_fixed <- z_fixed -mean(z_fixed) # exact mean zeroz_fixed <- z_fixed /sqrt(sum(z_fixed^2) /100) # scale so sum of squares = n# Testtibble("mean"=mean(z_fixed), "sum of squares"=sum(z_fixed^2))
# A tibble: 1 × 2
mean `sum of squares`
<dbl> <dbl>
1 -1.02e-17 100
Demean: subtract the sample mean so \(\sum z_i = 0\) exactly
Scale: dividing by \(\sqrt{\frac{1}{n}\sum z_i^2}\) scales so \(\sum z_i^2 = 100\), since:
draw_sim_data <-function(pi_param, rho) { z <- z_fixed n <-100 beta <-0# Correlation matrix for (U_i, V_i) cor_mat <-matrix(c(1, rho, rho, 1 ), 2, 2, byrow =TRUE)# Generate correlated errors errors <-rmvnorm(n, sigma = cor_mat) u <- errors[, 1] v <- errors[, 2]# Generate X and Y from the structural model x <- pi_param * z + v y <- beta * x + utibble(x = x, y = y, z = z)}
Note that \(\pi\) controls the strength of instrument \(Z\). \(\rho\) controls the severity of the endogeneity issue:
get_iv_stats <-function(dat) {# Extract variables from tibble x <- dat |>pull(x) y <- dat |>pull(y) z <- dat |>pull(z)# Calculate IV estimator components s_zx <-cov(z, x) beta_iv <-cov(z, y) / s_zx alpha_iv <-mean(y) - beta_iv *mean(x)# Calculate residuals for standard error u_hat <- y - alpha_iv - beta_iv * x# Standard error calculation# Finite-sample correction so SE agrees with ivreg n <-length(y) se <-sd(z) *sd(u_hat) / (abs(s_zx) *sqrt(n -2))c("est"= beta_iv, "se"= se)}
To match the results of ivreg(), it is necessary to add a finite-sample correction to the standard error. The variance of \(\hat\beta_{IV}\) (treating \(\mathbf{Z}\) as fixed) is: \[
\text{Var}(\hat\beta_{IV}) = \frac{\sigma_u^2 \cdot \mathbf{Z}'\mathbf{Z}}{(\mathbf{Z}'\mathbf{X})^2}
\] We estimate \(\sigma_u^2\) from the IV residuals, using \(n - 2\) degrees of freedom because two parameters (\(\alpha_{IV}\) and \(\beta_{IV}\)) were estimated. This is the same \(n - k\) correction as in OLS:
\[
\hat\sigma_u^2 = \frac{\sum_i \hat{u}_i^2}{n-2} = \frac{(n-1)s^2_{\hat u}}{n-2}
\] Since \(\bar{z} = 0\) we have \(\mathbf{Z}'\mathbf{Z} = (n-1)s_z^2\) and \(\mathbf{Z}'\mathbf{X} = (n-1)s_{zx}\), so:
\[
\widehat{SE}(\hat\beta_{IV}) = \sqrt{\frac{\hat\sigma_u^2 \cdot \mathbf{Z}'\mathbf{Z}}{(\mathbf{Z}'\mathbf{X})^2}} = \frac{s_z \cdot s_{\hat u}}{|s_{zx}|\sqrt{n-2}}
\] which is exactly what the code computes. Without this \(n-2\) degrees-of-freedom correction (i.e., using \(n-1\) instead), the SE would be slightly underestimated in small samples.
Exercise 4 - Solution
# Test the functionstest_data <-draw_sim_data(pi_param =1, rho =0.5)# Compare hand-calculated results with ivregcat("Results from ivreg():\n")
Results from ivreg():
tidy(ivreg(y ~ x | z, data = test_data))
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -0.195 0.106 -1.84 0.0693
2 x -0.0207 0.0992 -0.209 0.835
What’s the problem with weak instruments? Even though instruments are valid, they don’t fully eliminate omitted variable bias (OVB) in finite samples. For intuition, consider an instrument \(Z\) uncorrelated with \(X\). Variance decomposition in the first stage is arbitrary. Regression on the instrument fails to eliminate variance in \(X\) that is confounded by \(U\). This propagates to the 2SLS estimator, so that it fully inherits the OLS estimator’s bias.
What about invalid instruments?\(Z\) is irrelevant if \(\pi = 0\); it is invalid (exclusion restriction violation) if \(\mathbb{E}[Z_i U_i] \neq 0\). For invalid instruments \(\hat{\beta}_{2SLS}\) is inconsistent (bias does not disappear in large samples), and this inconsistency is magnified when the instrument is also weak. Note that \(\mathbb{E}[Z_i U_i] = 0\) by construction in our setup, so \(Z\) is always valid in or example.
Asymptotics for \(\hat \beta_{OLS}\) and \(\hat \beta_{2SLS}\)
\[
\underset{n \to \infty}{\text{plim}}\ \hat{\beta}_{2SLS} = \beta + \frac{\underset{n \to \infty}{\text{plim}}\left(\frac{1}{n}\sum \hat{x}_i u_i\right)}{\underset{n \to \infty}{\text{plim}}\left(\frac{1}{n}\sum \hat{x}_i x_i\right)} = \beta + \frac{\mathbb{E}[z_i u_i]}{\mathbb{E}[z_i x_i]}
\] For the simplification, note that \(\hat{x}_i = z_i \hat{\pi}\), and that \(\hat{\pi}\) is a scalar that cancels.
\(\mathbb{E}[z_i u_i] \neq 0\) for invalid instruments, which contributes to the inconsistency. \(\mathbb{E}[z_i x_i] \approx 0\) for weak instruments, which blows up the bias term.
library(furrr)# Set up parameter grid using expand_grid (tidyr)pars <-expand_grid(rho =c(0.5, 0.9, 0.99), conc =c(0, 0.25, 10, 100))# Option 1: Serial version using pmap# simulations_serial <- pmap(pars, replicate_iv_sim, n_reps = 1e4) |> list_rbind()# Option 2: Parallel version using future_pmapplan(multisession, workers =8)my_options <-furrr_options(seed =TRUE)simulations <-future_pmap(pars, replicate_iv_sim,n_reps =1e4,.options = my_options) |>list_rbind()# Create bias tablesimulations |>group_by(conc, rho) |>summarize(bias =median(est), .groups ="drop") |>arrange(desc(rho)) |> knitr::kable(digits =c(0, 2, 3))
conc
rho
bias
0
0.99
0.994
0
0.99
0.652
10
0.99
0.003
100
0.99
0.000
0
0.90
0.904
0
0.90
0.689
10
0.90
0.006
100
0.90
-0.001
0
0.50
0.507
0
0.50
0.377
10
0.50
0.007
100
0.50
0.001
The table reveals two clear patterns:
Median bias is close to zero for the strongest instrument (\(\mu^2 = 100\)) regardless of \(\rho\): when \(\pi = 1\) the denominator \(\mathbb{E}[z_i x_i]\) is large, so the bias term in \[\underset{n \to \infty}{\text{plim}}\ \hat{\beta}_{2SLS} = \beta + \frac{\mathbb{E}[z_i u_i]}{\mathbb{E}[z_i x_i]}\] is negligible even in finite samples.
As \(\mu^2\) falls toward zero, median bias rises toward \(\rho\), and at \(\mu^2 = 0\) it is essentially equal to \(\rho\): with an irrelevant instrument (\(\pi = 0\)) the IV estimator inherits the full OLS endogeneity bias.
Intuitively, for irrelevant instruments with \(\pi = 0\) the denominator \(\mathbb{E}[z_i x_i] = 0\), so any finite-sample noise in \(\frac{1}{n}\sum z_i u_i\) inflates the ratio without bound, pushing the estimator toward the OLS limit. Within each level of \(\mu^2\), bias is also increasing in \(\rho\): a more severe endogeneity problem is harder to correct with a weak instrument.
Exercise 7 - Solution
plot_iv_estimator <- simulations |>mutate(rho_label =paste0("rho==", rho)) |># labeller for faceted plotfilter(abs(est) <3) |>ggplot(aes(x = est, color =factor(conc))) +geom_density() +geom_vline(xintercept =0, linetype =2) +scale_color_viridis_d(direction =-1) +facet_wrap(~rho_label, labeller = label_parsed) +labs(color =expression(paste(mu^2, " (strength of instrument)")),title ="IV estimator",subtitle =expression(paste("Simulation results faceted by ", rho," (strength of endogeneity)" )),caption ="Source: 120,000 Monte Carlo Simulations" ) +xlab("IV estimator")plot_iv_test_statistic <- simulations |>mutate(tstat = est / se,rho_label =paste0("rho==", rho) ) |># labeller for faceted plotfilter(abs(tstat) <5) |>ggplot(aes(x = tstat, color =factor(conc))) +geom_density() +geom_vline(xintercept =0, linetype =2) +scale_color_viridis_d(direction =-1) +facet_wrap(~rho_label, labeller = label_parsed) +labs(color =expression(paste(mu^2, " (strength of instrument)")),title ="IV t-statistic",subtitle =expression(paste("Simulation results faceted by ", rho," (strength of endogeneity)" )),caption ="Source: 120,000 Monte Carlo Simulations" ) +xlab("IV t-statistic")# Display plotsplot_iv_estimator
plot_iv_test_statistic
IV estimator: The density plots show the distribution of the IV estimator over 120,000 simulation runs. With a strong instrument (\(\mu^2 = 100\)), the sampling distribution of \(\hat\beta_{IV}\) is tightly concentrated around the true value \(\beta = 0\) and approximately normal: the large-sample theory is working as intended. As \(\mu^2\) decreases, the density spreads dramatically and develops heavy tails. At \(\mu^2 \in \{0, 0.25\}\) the distribution is essentially Cauchy (a ratio of two near-zero random variables) with no finite variance, which is why extreme values had to be filtered before plotting. The three facets show that higher \(\rho\) worsens the spread: greater endogeneity means the IV estimator is more easily contaminated when the denominator \(\mathbb{E}[z_i x_i] = \pi\mathbb{E}[z_i^2]\) is small.
Both the bias visible in the table and the spread visible in the density plots trace back to the same source: the denominator \(\mathbb{E}[z_i x_i] = \pi\mathbb{E}[z_i^2]\) shrinks as \(\pi \to 0\), amplifying any finite-sample noise in \(\frac{1}{n}\sum z_i u_i\) into large deviations from the truth.
t-statistic. With a strong instrument, the t-statistic is well approximated by \(N(0,1)\) under the null \(\beta = 0\), and the density is centred on zero. With a weak instrument, the distribution shifts away from zero in the direction of the bias and the tails fatten substantially: the null \(\beta = 0\) would be rejected far more often than the nominal test size.
This is size distortion, i.e. the distribution of the test statistic becomes skewed in the direction of bias. This means that we reject the chosen null more often than we should. Since the IV estimator is itself biased away from zero, the ratio \(\hat\beta_{IV} / \widehat{SE}\) is systematically shifted, and the critical values from the standard normal are no longer valid. Standard confidence intervals and hypothesis tests from a weak-IV regression are unreliable even when the instrument is technically valid.
Expected output
Your final plots should look something like this:
Footnotes
We are using the same z_fixed and normalising them to control instrument strength.↩︎