Heteroskedasticity and Clustering
University of Oxford
You can always do a linear projection. – Jim Hamilton
If \(Y_i = \alpha + \beta X_i + U_i\), then
\[ \begin{align} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum_{i=1}^n (X_i - \bar{X})Y_i \\ &= \sum_{i=1}^n (X_i - \bar{X}) (\alpha + \beta X_i + U_i) \\ &= \alpha \sum_{i=1}^n (X_i - \bar{X}) + \beta \sum_{i=1}^n (X_i - \bar{X})X_i + \sum_{i=1}^n (X_i - \bar{X}) U_i\\ &= \beta \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (X_i - \bar{X}) U_i \end{align} \]
\[ \begin{align*} \widehat{\beta} &= \frac{\sum_{i=1}^n(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}\\ \\ &= \frac{\beta \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (X_i - \bar{X}) U_i }{\sum_{i=1}^n (X_i - \bar{X})^2} \\ \\ &= \beta + \frac{\sum_{i=1}^n U_i (X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2} \end{align*} \]
\[ \begin{align} \text{Var}(\widehat{\beta}|X_1, \dots X_n) &= \text{Var}\left[\left.\beta + \frac{\sum_{i=1}^n U_i (X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}\right| \mathbf{X}\right] \\ \\ &=\frac{\text{Var}\left[\left.\sum_{i=1}^n U_i (X_i - \bar{X})\right| \mathbf{X}\right]}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} \end{align} \]
The question is how we simplify the numerator…
Conditional on all the \(X_i\) observations, all of the errors \(U_i\) are uncorrelated and have the same variance: \(\text{Var}[\mathbf{U}|\mathbf{X}] = \sigma^2 \mathbf{I}\)
\[ \begin{align} \text{Var}\left[\left.\sum_{i=1}^n U_i (X_i - \bar{X})\right| \mathbf{X}\right] &= \sum_{i=1}^n \text{Var}\left[\left. U_i (X_i - \bar{X})\right| \mathbf{X}\right] \\ &= \sum_{i=1}^n (X_i - \bar{X})^2 \text{Var}[U_i|\mathbf{X}] \\ &= \sigma^2 \sum_{i=1}^n (X_i - \bar{X})^2 \end{align} \]
\(\text{Var}[\mathbf{U}|\mathbf{X}] = \sigma^2 \mathbf{I}\) implies that
\[ \text{Var}(\widehat{\beta}|\mathbf{X}) = \frac{\sigma^2 \sum_{i=1}^n (X_i - \bar{X})^2}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} =\frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}. \] Then simply estimate \(\sigma^2\) using OLS residuals.
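As a sanity check, this classical formula can be verified against `lm()` in a few lines of base R. A minimal sketch with simulated data (the variable names and data-generating process here are illustrative, not from the slides):

```r
set.seed(1)
n <- 100
x <- runif(n)
y <- 0.5 + x + rnorm(n)  # homoskedastic errors

fit <- lm(y ~ x)
u_hat <- resid(fit)
s2 <- sum(u_hat^2) / (n - 2)                     # estimate sigma^2 from OLS residuals
se_classical <- sqrt(s2 / sum((x - mean(x))^2))  # classical SE for the slope

# identical to the SE reported by summary(lm())
all.equal(coef(summary(fit))['x', 'Std. Error'], se_classical)  # TRUE
```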
Conditional on all the \(X_i\) observations, all of the errors \(U_i\) are uncorrelated but \(\text{Var}(U_i|\mathbf{X}) = \text{Var}(U_i|X_i) = \sigma^2(X_i)\)
\[ \begin{align} \text{Var}\left[\left.\sum_{i=1}^n U_i (X_i - \bar{X})\right| \mathbf{X}\right] &= \sum_{i=1}^n \text{Var}\left[\left. U_i (X_i - \bar{X})\right| \mathbf{X}\right] \\ &= \sum_{i=1}^n (X_i - \bar{X})^2 \text{Var}[U_i|\mathbf{X}] \\ &= \sum_{i=1}^n \sigma^2(X_i) (X_i - \bar{X})^2 \end{align} \]
\[ \text{Var}(\mathbf{U}|\mathbf{X}) = \text{diag}\{ \sigma^2(X_i)\}, \; \text{Var}(\widehat{\beta}|\mathbf{X}) =\frac{\sum_{i=1}^n \sigma^2(X_i) (X_i - \bar{X})^2}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} \]
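Plugging squared OLS residuals \(\hat{U}_i^2\) in for \(\sigma^2(X_i)\) gives the HC0 estimator. A minimal base-R sketch with simulated, illustrative data; `lm_robust()` with `se_type = 'HC0'` should reproduce the same number:

```r
set.seed(1)
n <- 100
x <- runif(n)
y <- 0.2 + 0.9 * x + rnorm(n, sd = sqrt(2 * x))  # heteroskedastic errors

fit <- lm(y ~ x)
u_hat <- resid(fit)
x_tilde <- x - mean(x)

# HC0 plug-in: squared residuals stand in for sigma^2(X_i)
se_hc0 <- sqrt(sum(u_hat^2 * x_tilde^2) / sum(x_tilde^2)^2)
se_hc0  # should match lm_robust(y ~ x, se_type = 'HC0')
```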
`lm_robust()`

- The `lm_robust()` function from the `estimatr` package provides heteroskedasticity-consistent SEs.
- Works just like `lm()` except:
  - `se_type = 'classical'` gives the same SEs as `lm()`
  - `se_type = 'stata'` matches Stata's `reg, robust`
  - Options: `HC0`, `HC1` (Stata), `HC2` (default), `HC3`
- Plays nicely with `broom::tidy()` and `modelsummary()`

`lm_robust()` - Example

| | Classical | Robust |
|---|---|---|
| (Intercept) | 0.34 | 0.34 |
| | (0.22) | (0.17) |
| x | 0.78 | 0.78 |
| | (0.38) | (0.40) |
| Num.Obs. | 100 | 100 |
| R2 | 0.041 | 0.041 |
| RMSE | 1.03 | 1.03 |
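A comparison table like the one above can be assembled with `modelsummary()`. A sketch assuming the `estimatr` and `modelsummary` packages are installed; the seed and simulated data here are illustrative, not the slide's actual simulation:

```r
library(estimatr)
library(modelsummary)

# illustrative data with heteroskedastic errors
set.seed(1)
n <- 100
x <- runif(n)
dat <- data.frame(x, y = 0.2 + 0.9 * x + rnorm(n, sd = sqrt(2 * x)))

# one fit per column; se_type controls which SEs appear in parentheses
modelsummary(list(Classical = lm_robust(y ~ x, dat, se_type = 'classical'),
                  Robust    = lm_robust(y ~ x, dat, se_type = 'stata')),
             gof_omit = 'R2 Adj.|AIC|BIC|F|Log.Lik.')
```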
F-tests with `lm_robust`: rather than running an `lm`-based F-test from `linearHypothesis()`, which relies on classical SEs, pass `linearHypothesis()` a `lm_robust` fit with the desired SEs instead.

Classical SEs:

```
# A tibble: 1 × 8
  term  null.value estimate std.error statistic p.value df.residual    df
  <chr>      <dbl>    <dbl>     <dbl>     <dbl>   <dbl>       <dbl> <dbl>
1 x              0    0.777     0.379      4.21  0.0402          98     1
```

Robust SEs:

```
# A tibble: 1 × 8
  term  null.value estimate std.error statistic p.value df.residual    df
  <chr>      <dbl>    <dbl>     <dbl>     <dbl>   <dbl>       <dbl> <dbl>
1 x              0    0.777     0.403      3.72  0.0538          98     1
```
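A sketch of how output like this can be produced, assuming `car`'s `linearHypothesis()` accepts `lm_robust` fits (which `estimatr` supports) and that `broom` can tidy the result; the data are simulated purely for illustration:

```r
library(estimatr)
library(car)
library(broom)

set.seed(1)
n <- 100
x <- runif(n)
y <- 0.2 + 0.9 * x + rnorm(n, sd = sqrt(2 * x))

# linearHypothesis() uses the vcov stored in the fitted model, so the
# se_type chosen at fitting time determines which F-test you get
fit_classical <- lm_robust(y ~ x, se_type = 'classical')
fit_robust    <- lm_robust(y ~ x, se_type = 'stata')

tidy(linearHypothesis(fit_classical, 'x = 0'))
tidy(linearHypothesis(fit_robust, 'x = 0'))
```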
Exercise

1. Set the seed to `4321` and generate `n = 100` uniform draws for `x`.
2. Set `y` equal to `0.2 + 0.9 * x + error`, where `error` is a vector of independent, mean-zero normal errors with standard deviation `sqrt(2 * x)`.
3. Using `x` and `y`, replicate my regression and F-test results from above.
4. Compute the heteroskedasticity-robust standard error by hand from `x` and `y`. Check that your results match those of `lm_robust()`.
Mechanically: \[ \text{Var}\left[\left.\sum_{i=1}^n U_i(X_i - \bar{X})\right|\mathbf{X}\right] \neq \sum_{i=1}^n \text{Var}[U_i|\mathbf{X}](X_i - \bar{X})^2 \] Intuitively: when errors are correlated, the variance of the sum picks up covariance terms, so correlated observations carry less independent information than the same number of uncorrelated ones.
Now suppose each observation belongs to one of \(G\) groups (clusters), indexed by \(g\), with \(n_g\) observations in group \(g\) and \(N\) observations in total: \[ Y_{ig} = \alpha + \beta X_{ig} + U_{ig} \]
Note: in a (balanced) panel dataset the groups are individuals and \(n_g = T\).
\[ \begin{align} \text{Var}(\widehat{\beta}|\mathbf{X}) &=\frac{\text{Var}\left[\left.\sum_{j=1}^N U_j (X_j - \bar{X})\right| \mathbf{X}\right]}{\left[\sum_{j=1}^N (X_j - \bar{X})^2\right]^2} \\ \\ &=\frac{\text{Var}\left[\left.\sum_{g=1}^G \sum_{i=1}^{n_g} U_{ig} (X_{ig} - \bar{X})\right| \mathbf{X}\right]}{\left[\sum_{g=1}^G \sum_{i=1}^{n_g} (X_{ig} - \bar{X})^2\right]^2} \end{align} \]
As before, the key is the numerator; the denominator is easy.
Writing \(\tilde{X}_{ig} \equiv X_{ig} - \bar{X}\) and stacking the observations for group \(g\) into vectors \(\tilde{\mathbf{X}}_g\) and \(\mathbf{U}_g\): \[ \begin{align} \text{Var}(\widehat{\beta}|\mathbf{X}) &=\frac{\text{Var}\left[\left.\sum_{g=1}^G \sum_{i=1}^{n_g} U_{ig} \tilde{X}_{ig} \right| \mathbf{X}\right]}{\left[\sum_{g=1}^G \sum_{i=1}^{n_g} \tilde{X}_{ig}^2\right]^2}=\frac{\text{Var}\left[\left.\sum_{g=1}^G \tilde{\mathbf{X}}'_g \mathbf{U}_g \right| \mathbf{X}\right]}{\left[\sum_{g=1}^G \tilde{\mathbf{X}}_g'\tilde{\mathbf{X}}_g\right]^2} \end{align} \]
If clusters are drawn at random from some population, then there is no correlation between them: \[ \begin{align} \text{Var}\left[\left.\sum_{g=1}^G \tilde{\mathbf{X}}'_g \mathbf{U}_g \right| \mathbf{X}\right] &= \sum_{g=1}^G \text{Var}\left[\left. \tilde{\mathbf{X}}'_g \mathbf{U}_g\right|\mathbf{X} \right] \\ &= \sum_{g=1}^G \tilde{\mathbf{X}}_g' \, \mathbb{E}[\mathbf{U}_g \mathbf{U}_g'|\mathbf{X}_g]\, \tilde{\mathbf{X}}_g\\ &= \sum_{g=1}^G \tilde{\mathbf{X}}_g' \Sigma_g \tilde{\mathbf{X}}_g. \end{align} \]
\[ \begin{align} \text{Var}(\widehat{\beta}|\mathbf{X}) = \frac{\sum_{g=1}^G \tilde{\mathbf{X}}'_g \Sigma_g \tilde{\mathbf{X}}_g }{\left[\sum_{g=1}^G \tilde{\mathbf{X}}_g'\tilde{\mathbf{X}}_g\right]^2} \approx \frac{\sum_{g=1}^G \tilde{\mathbf{X}}'_g \hat{\mathbf{U}}_g \hat{\mathbf{U}}_g' \tilde{\mathbf{X}}_g}{\left[\sum_{g=1}^G \tilde{\mathbf{X}}_g'\tilde{\mathbf{X}}_g\right]^2} \end{align} \]
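The plug-in estimator on the right can be computed by hand in base R. A sketch with simulated grouped data (names illustrative); assuming `estimatr`'s `se_type = 'CR0'` applies no finite-sample correction, `lm_robust(y ~ x, clusters = g, se_type = 'CR0')` should give the same number:

```r
set.seed(1)
G <- 50; n_g <- 10
g <- rep(1:G, each = n_g)
x <- rnorm(G * n_g)
u <- rep(rnorm(G), each = n_g) + rnorm(G * n_g)  # shared group shock => clustered errors
y <- 1 + 0.5 * x + u

fit <- lm(y ~ x)
u_hat <- resid(fit)
x_tilde <- x - mean(x)

# numerator: sum over groups of (X_tilde_g' U_hat_g)^2
meat <- sum(tapply(x_tilde * u_hat, g, sum)^2)
se_cr0 <- sqrt(meat / sum(x_tilde^2)^2)
se_cr0
```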
Simplest interesting example:
\[ \begin{align} \Sigma_g &= \mathbb{E}[\mathbf{U}_g \mathbf{U}_g'|\mathbf{X}_g] = \mathbb{E}[\mathbf{U}_g\mathbf{U}_g'|X_g] = \mathbb{E}[\mathbf{U}_g\mathbf{U}_g']\\ &= \sigma^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \\ \vdots & & \ddots & \vdots\\ \rho & \rho & \cdots & 1 \end{bmatrix} = \sigma^2[(1 - \rho) \mathbf{I}_n + \rho \,\boldsymbol{\iota}_n\boldsymbol{\iota}_n'] \end{align} \] where \(\rho \equiv \text{Var}(\eta_g) / \sigma^2\) and \(\sigma^2 \equiv \text{Var}(U_{ig})\).
Suppose further that:

- every group has the same size: \(n_g = n\) for all \(g\);
- the regressor is constant within groups: \(X_{ig} = X_g\) for all \(i\).
After some algebra: \[ \text{Var}(\widehat{\beta}|\mathbf{X}) = [1 + (n - 1)\rho]\times \left[ \frac{\sigma^2}{\sum_{j=1}^N (X_j - \bar{X})^2}\right] \] Clustering scales “usual” variance of \(\widehat{\beta}\) by \([1 + (n-1)\rho]\)!
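To see how quickly this bites, a short numeric sketch: with groups of size \(n = 30\) and an intra-cluster correlation of only \(\rho = 0.05\), the variance of \(\widehat{\beta}\) is inflated by \(1 + 29 \times 0.05 = 2.45\), so the correct SE is about 1.57 times the naive one.

```r
# Moulton variance-inflation factor
moulton <- function(n, rho) 1 + (n - 1) * rho

moulton(30, 0.05)        # 2.45: variance more than doubles
sqrt(moulton(30, 0.05))  # ~1.57: factor by which naive SEs understate the truth
```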
Clustered SEs with `lm_robust()`

- Set `clusters` to the name of the grouping variable.
- Set `se_type` to `CR0`, `CR2` (default) or `stata`.
- Otherwise works just like `lm()`: plays nicely with `tidy()`, `modelsummary()`, `linearHypothesis()`.
```r
library(haven)
library(dplyr)
library(estimatr)
library(modelsummary)

ddk2011 <- read_dta('https://ditraglia.com/data/DDK2011.dta')

# convert outcome to z-score
ddk2011 <- ddk2011 |>
  mutate(testscore = scale(totalscore))

reg_classical <- lm(testscore ~ tracking, ddk2011)

# Defaults to HC2
reg_heterosked <- lm_robust(testscore ~ tracking, ddk2011)

# Defaults to CR2
reg_cluster <- lm_robust(testscore ~ tracking, ddk2011,
                         clusters = schoolid)

modelsummary(list(Classical = reg_classical,
                  Heterosked = reg_heterosked,
                  Cluster = reg_cluster),
             gof_omit = 'R2 Adj,|AIC|BIC|F|Log.Lik.')
```
| | Classical | Heterosked | Cluster |
|---|---|---|---|
| (Intercept) | −0.071 | −0.071 | −0.071 |
| | (0.019) | (0.019) | (0.055) |
| tracking | 0.138 | 0.138 | 0.138 |
| | (0.026) | (0.026) | (0.078) |
| Num.Obs. | 5795 | 5795 | 5795 |
| R2 | 0.005 | 0.005 | 0.005 |
| R2 Adj. | 0.005 | 0.005 | 0.005 |
| RMSE | 1.00 | 1.00 | 1.00 |
| Std.Errors | | | by: schoolid |
`iv_robust()`

- Need robust or clustered SEs for IV regression? `iv_robust()` from `estimatr` has you covered!
- Same formula syntax as `ivreg()` from `AER`: `y ~ x1 + x2 | z1 + z2` declares `x1` and `x2` as endogenous regressors, `z1` and `z2` as exogenous IVs.
- Set `clusters` and `se_type` as in `lm_robust()`.
- Plays nicely with `tidy()`, `modelsummary()`, etc.
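A minimal sketch of the `iv_robust()` interface with simulated data (all variable names and the data-generating process are illustrative):

```r
library(estimatr)

set.seed(1)
n <- 500
g <- rep(1:50, each = 10)  # cluster identifier
z <- rnorm(n)              # instrument
v <- rnorm(n)              # common shock creating endogeneity
x <- z + v                 # regressor correlated with the error through v
y <- 1 + 2 * x + v + rnorm(n)
dat <- data.frame(y, x, z, g)

# 2SLS with SEs clustered on g; formula: outcome ~ regressors | instruments
fit_iv <- iv_robust(y ~ x | z, data = dat, clusters = g)
summary(fit_iv)
```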