Problem Set: Weak Instrument Simulation
In this problem you will simulate from the following simple instrumental variables model \[ \begin{aligned} Y_i &= \beta X_i + U_i \\ X_i &= \pi Z_i + V_i \end{aligned}, \quad \begin{bmatrix} U_i \\ V_i \end{bmatrix} \sim \text{iid Normal}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right) \] for \(i = 1, 2, ..., n\) where \((U_i, V_i)\) are independent of \(Z_i\). The purpose of this simulation is to illustrate what can go wrong with the IV estimator of \(\beta\) and its associated standard error when \(Z_i\) is a weak instrument, i.e. \(\pi\) is small in absolute value. In the simulations below you will fix the true value of \(\beta\) equal to zero. (Any value would give the same qualitative results, and this is the simplest possibility.) You will also hold the values of the instrument \(Z_1, Z_2, ..., Z_n\) fixed across simulation replications. Note that, since the errors in this model are independent of \(Z_i\), there is no need for heteroskedasticity-robust standard errors: use the plain-vanilla ones instead. Use \(n = 100\) throughout.
Generate \(n = 100\) standard normal draws
z_fixed
. Center the result so thatmean(z_fixed)
is zero to the numerical precision of your machine, and scale them so thatsum(z_fixed^2)
equals 100. You will use these draws as the instrumental variable in all simulation replications below. Do not re-draw \(Z_i\) at any point below!Write a function called
draw_sim_data()
to simulate from the above model with \(n = 100\), \(\beta=0\) and instrumentsz_fixed
. Your function should take two arguments:pi
is the first-stage coefficient, andrho
is the correlation between \(U_i\) and \(V_i\). It should return a data frame (or tibble) with three named columns:x
,y
, andz
, wherex
is the endogenous regressor,y
is the outcome, andz
isz_fixed
.Write a function called
get_iv_stats()
that takes a single argumentdat
, a data frame (or tibble) matching the output ofdraw_sim_data()
from above. Your function should return a vector with named elementsest
andse
: the instrumental variables estimator for \(\beta\) and corresponding standard error computed fromdat$y
,dat$x
, anddat$z
. Compute these by hand, i.e. without usingivreg
.Using the functions you wrote in the two preceding parts, simulate 100 observations from the model described above with \(\pi = 1\) and \(\rho = 0.5\) and calculate the instrumental variables estimator and associated standard error for \(\beta\). Use
ivreg
to check your results.The concentration parameter \(\mu^2\) quantifies the strength of an instrumental variable. Larger values indicate a stronger instrument. In the model described above \(\mu^2 = \pi^2 \mathbf{Z}'\mathbf{Z}\). Given the way we have scaled
z_fixed
, this simplifies to \(\mu^2 = 100 \pi^2\). Write a function calledreplicate_iv_sim()
that replicates the IV simulation experiment described above at fixed parameter values. It should take three arguments:n_reps
is the number of simulation replications,conc
is the concentration parameter \(\mu^2\) andrho
is the correlation between \(U_i\) and \(V_i\) described above. It should return a data frame (or tibble) with columnsest
andse
corresponding to the output ofget_iv_stats()
, and columnsconc
andrho
that list the parameter values used. (The latter two columns will be helpful in the next part for use withggplot2
.)Use
pmap()
frompurrr
to callreplicate_iv_sim()
over a grid of parameter values: \(\rho \in\{ 0.5, 0.9, 0.99\}\) and \(\mu^2 \in \{0, 0.25, 10, 100\}\) withnreps = 10000
. For faster computation, you may optionally usefuture_pmap()
from thefurrr
package for parallel processing. Organize your results in a tibble calledsimulations
. This will be quite a large tibble indeed: 120,000 rows! Usesimulations
to make a table showing how the median bias of the IV estimator changes with the parameter values. In other words: for each combination of values \(\rho\) and \(\mu^2\) calculate the difference between the median of the sampling distribution of the IV estimator and the true value of the parameter \(\beta\). Since the true parameter value is zero, the median of the sampling distribution equals the median bias. Comment on your results.Using the tibble
simulations
from the preceding part, make kernel density plots similar to those in Figure 1 (a) and (b) of Stock, Wright, & Yogo (2002; JBES) “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments.” I suggest usingggplot2
and faceting byrho
so that the different values of the concentration parameter appear on the same plot. I also strongly suggest filtering to exclude extremely large or small values before making your kernel density plots. Otherwise you won’t be able to see anything! (You might start with the range \([-5,5]\) and tweak from there.) You may also want to create similar plots for the t-statistics (est/se) to see how the sampling distribution of the test statistic changes with instrument strength. Comment on your findings.