The “Classic” Linear Homogeneous Coefficients Model
University of Oxford
The more traditional way of writing this model: \[ Y = \alpha + \beta X + U, \quad \text{Cov}(Z, X) \neq 0, \quad \text{Cov}(Z,U) = 0 \]
\[ \begin{align} \beta_{OLS} &\equiv \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\\ \\ &= \frac{\text{Cov}(X, \alpha + \beta X + U)}{\text{Var}(X)} \\ \\ &= \beta + \frac{\text{Cov}(X,U)}{\text{Var}(X)}\\ \end{align} \]
\[ \begin{align} \beta_{IV} &\equiv \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)}\\ \\ &= \frac{\text{Cov}(Z, \alpha + \beta X + U)}{\text{Cov}(Z,X)} \\ \\ &= \beta + \frac{\text{Cov}(Z,U)}{\text{Cov}(Z,X)}\\ \end{align} \]
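A quick numerical sketch of these two estimands, using an arbitrary simulated DGP (the coefficients below are chosen purely for illustration and are not from the exercises):

# Simulate a DGP in which Cov(X, U) != 0 but Cov(Z, U) = 0
set.seed(1)
n <- 1e5
z <- rnorm(n)
u <- rnorm(n)
x <- 0.5 + 0.8 * z + 0.6 * u + rnorm(n)  # x depends on u, so x is endogenous
y <- -0.3 + 1 * x + u                    # true beta = 1

cov(x, y) / var(x)     # sample analogue of beta_OLS: beta + Cov(X, U) / Var(X), above 1
cov(z, y) / cov(z, x)  # sample analogue of beta_IV: close to the true beta = 1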
Observe that:
\[ \beta_{IV} \equiv \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)} = \frac{\text{Cov}(Z,Y)/\text{Var}(Z)}{\text{Cov}(Z,X)/\text{Var}(Z)} \equiv \frac{\gamma_1}{\pi_1} = \frac{\text{Reduced Form}}{\text{First Stage}} \]
Reduced Form: Regress \(Y\) on \(Z\)
\[ \gamma_0 \equiv \mathbb{E}(Y) - \gamma_1 \mathbb{E}(Z); \quad \gamma_1 \equiv \frac{\text{Cov}(Z,Y)}{\text{Var}(Z)}; \quad \epsilon \equiv Y - \gamma_0 - \gamma_1 Z \]
\[ \implies Y = \gamma_0 + \gamma_1 Z + \epsilon; \quad \mathbb{E}(\epsilon) = \text{Cov}(Z, \epsilon) = 0 \]
First Stage: Regress \(X\) on \(Z\)
\[ \pi_0 \equiv \mathbb{E}(X) - \pi_1 \mathbb{E}(Z); \quad \pi_1 \equiv \frac{\text{Cov}(Z,X)}{\text{Var}(Z)}; \quad V \equiv X - \pi_0 - \pi_1 Z \]
\[ \implies X = \pi_0 + \pi_1 Z + V; \quad \mathbb{E}(V) = \text{Cov}(Z, V) = 0 \]
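Reusing the simulated z, x, and y from the sketch above, the same IV value can be computed as the ratio of the reduced-form slope to the first-stage slope:

gamma_1 <- coef(lm(y ~ z))['z']  # reduced form: regress y on z
pi_1    <- coef(lm(x ~ z))['z']  # first stage: regress x on z
unname(gamma_1 / pi_1)           # identical to cov(z, y) / cov(z, x)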
Exercise A
Set the seed to 1234 and then generate 5000 iid standard normal draws to serve as your instrument z. Next, use rmvnorm() to generate 5000 iid draws from a bivariate standard normal distribution with correlation \(\rho = 0.5\), independently of z. These draws will serve as the errors (u, v).
Generate x and y according to the IV model with \(\pi_0 = 0.5\), \(\pi_1 = 0.8\), \(\alpha = -0.3\), and \(\beta = 1\).
What do you expect to obtain, approximately, if you regress y on x? Run this regression to check.
What do you expect to obtain, approximately, if you regress y on z? Run this regression to check.
Regress y on both x and z. What is your estimate of the slope coefficient on x? How does it compare to the OLS and IV estimates? What gives?
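One way to set up the simulated data for this exercise, as a sketch (rmvnorm() comes from the mvtnorm package; which column of the bivariate draws plays the role of u versus v is an arbitrary choice, and the regressions themselves are left to you):

library(mvtnorm)  # provides rmvnorm()

set.seed(1234)
n <- 5000
z <- rnorm(n)  # instrument: iid standard normal draws

# errors (u, v): bivariate standard normal with correlation 0.5, independent of z
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
errors <- rmvnorm(n, mean = c(0, 0), sigma = Sigma)
u <- errors[, 1]
v <- errors[, 2]

# IV model with the coefficients from the exercise
x <- 0.5 + 0.8 * z + v  # first stage: pi0 = 0.5, pi1 = 0.8
y <- -0.3 + 1 * x + u   # causal model: alpha = -0.3, beta = 1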
\[ \begin{align*} Y &= \beta_0 + \beta_1 X + \beta_2 W + U\\ X &= \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + \pi_3 W + V\\ \end{align*} \]
AER::ivreg()
The ivreg() function from the AER package carries out TSLS estimation and calculates the correct standard errors. (Today: assume homoskedasticity.) The tidy(), augment(), and glance() functions from broom work with ivreg(). See here.
ivreg() syntax: ivreg([FORMULA_HERE], data = [DATAFRAME_HERE])
The formula takes the form [CAUSAL_MODEL_FORMULA] | [FIRST_STAGE_FORMULA]. Only the causal model part contains a ~; everything after | is the first-stage formula, listing the instruments together with any exogenous regressors, e.g.
y ~ x + w | z1 + z2 + w
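For instance, with a hypothetical data frame mydata containing columns y, x, w, z1, and z2, a minimal sketch of the call and its tidied output:

library(AER)    # provides ivreg()
library(broom)  # provides tidy() and glance()

# mydata is a hypothetical data frame with columns y, x, w, z1, z2
tsls_fit <- ivreg(y ~ x + w | z1 + z2 + w, data = mydata)
tidy(tsls_fit)    # coefficient estimates and standard errors
glance(tsls_fit)  # model-level summary statistics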
Exercise B
Set the seed to 1234 and then generate 10000 draws of \((Z_1, Z_2, W)\) from a trivariate standard normal distribution in which each pair of RVs has correlation \(0.3\). Then generate \((U, V)\) independently of \((Z_1, Z_2, W)\) as in Exercise A above.
Generate x and y according to the IV model from above with coefficients \((\pi_0, \pi_1, \pi_2, \pi_3) = (0.5, 0.2, -0.15, 0.25)\) for the first-stage and \((\beta_0, \beta_1, \beta_2) = (-0.3, 1, -0.7)\) for the causal model.
Carry out TSLS "by hand" using lm(). Compare your estimated coefficients and standard errors to those from AER::ivreg().
Now omit w from your first-stage regression, including it only in your second-stage regression. What happens? Why? Do you get the same answer as ivreg()? Explain.
What happens if you omit w from both your first-stage and causal model formulas in ivreg()? Are there any situations in which this would work? Explain.
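A sketch of how Exercise B might be set up, with TSLS done "by hand" in two lm() steps and then with ivreg() (the comparison and interpretation are left to the exercise):

library(mvtnorm)  # provides rmvnorm()
library(AER)      # provides ivreg()

set.seed(1234)
n <- 10000

# (z1, z2, w): trivariate standard normal, each pairwise correlation 0.3
S_zw <- matrix(0.3, nrow = 3, ncol = 3)
diag(S_zw) <- 1
zw <- rmvnorm(n, sigma = S_zw)
z1 <- zw[, 1]; z2 <- zw[, 2]; w <- zw[, 3]

# (u, v): bivariate standard normal with correlation 0.5, independent of (z1, z2, w)
S_uv <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
uv <- rmvnorm(n, sigma = S_uv)
u <- uv[, 1]; v <- uv[, 2]

# first stage and causal model with the coefficients from the exercise
x <- 0.5 + 0.2 * z1 - 0.15 * z2 + 0.25 * w + v
y <- -0.3 + 1 * x - 0.7 * w + u
dat <- data.frame(y, x, w, z1, z2)

# TSLS "by hand": regress x on the instruments and w, then y on the fitted values and w
# (the second-stage standard errors from lm() are not the correct TSLS standard errors)
dat$x_hat <- fitted(lm(x ~ z1 + z2 + w, data = dat))
by_hand <- lm(y ~ x_hat + w, data = dat)

# packaged TSLS with the correct standard errors
tsls <- ivreg(y ~ x + w | z1 + z2 + w, data = dat)

coef(by_hand)
coef(tsls)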
library(tidyverse)
data_url <- 'https://ditraglia.com/data/Ginsburgh-van-Ours-2003.csv'
qe <- read_csv(data_url)
qe
# A tibble: 132 × 13
`birth year` year ranking order age belgium ussr usa female critics
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1928 1952 1 3 24 0 0 1 0 41
2 1923 1952 2 8 29 0 0 0 0 28
3 1931 1952 3 9 21 0 0 0 1 29
4 1929 1952 4 12 23 1 0 0 0 12
5 1929 1952 5 6 23 0 0 0 0 0
6 1926 1952 6 2 26 0 0 1 0 4
7 1926 1952 7 10 26 0 0 1 0 6
8 1923 1952 8 11 29 0 0 0 0 26
9 1928 1952 9 5 24 0 0 0 0 0
10 1934 1952 10 4 18 0 0 0 0 31
# ℹ 122 more rows
# ℹ 3 more variables: bll <dbl>, gccd <dbl>, presen <dbl>
Instruments / Regressors:
birth year = year of birth
year = year of competition
ranking = ranking by the jury (1-12)
order = order of appearance (1-12)
age = age at time of performance
belgium = dummy for Belgian pianist
USSR = dummy for USSR pianist
USA = dummy for USA pianist
female = dummy for female pianist

Outcomes (Success Indicators):
critics = ratings by critics
BLL = # records in BLL catalogue
GCCD = # records in GCC/D catalogues
presen = # of catalogues present (1-3)

In this exercise you will need to work with the columns critics, order, and ranking from the qe tibble.
Add a variable first to qe that takes on the value TRUE if order equals one. You will need this variable in the following parts.
Use first to instrument for ranking. Discuss your findings.
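A sketch of the setup, assuming (as the column list above suggests) that critics is the outcome of interest; the discussion is left to you:

library(AER)    # provides ivreg()
library(broom)  # provides tidy()

# dummy that is TRUE when the pianist performed first (order of appearance equals one)
qe <- qe %>% mutate(first = order == 1)

# OLS for comparison, then TSLS using first as the instrument for ranking
ols_fit <- lm(critics ~ ranking, data = qe)
iv_fit <- ivreg(critics ~ ranking | first, data = qe)

tidy(ols_fit)
tidy(iv_fit)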