Panel Data Basics - Solutions

Exercise A - (20 min)

Download fake-panel-data.csv from https://ditraglia.com/data. This dataset was simulated according to the one-way error components model described above. It contains six columns: person is a unique person identifier (name), year is a year index (1-5), x and y are the regressor and outcome variable, and epsilon and eta are the error terms. (In real data you wouldn’t have the errors, but this is a simulation!)

  1. Use lm() to regress y on x with “classical” standard errors. Repeat with standard errors clustered by person using lm_robust(). Discuss your results.
  2. Plot y against x along with the regression line from part 1.
  3. Repeat part 2, but use a different color for the points that correspond to each person in the dataset and plot a separate regression line for each person.
  4. What does the plot you made in part 3 suggest? Use the columns epsilon and eta to check your conjecture.
  5. Finally, use lm_robust() to regress y on x and a dummy variable for each person, clustering the standard errors by person. Discuss your results.

Solution

Part 1

library(tidyverse)
library(estimatr)
library(modelsummary)
fake_panel <- read_csv('https://ditraglia.com/data/fake-panel-data.csv')

reg_classical <- lm(y ~ x, data = fake_panel)
reg_cluster <- lm_robust(y ~ x, data = fake_panel, clusters = person)

modelsummary(list(Classical = reg_classical, 
                  Clustered = reg_cluster), 
             gof_omit = 'AIC|BIC|F|RMSE|R2|Log.Lik.')
             Classical   Clustered
(Intercept)  1.134       1.134
             (0.196)     (0.398)
x            1.345       1.345
             (0.378)     (0.595)
Num.Obs.     50          50
Std.Errors               by: person
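
The point estimates in the two columns are identical; only the standard errors change, and here the clustered ones are larger (0.398 versus 0.196 for the intercept, 0.595 versus 0.378 for x). As a rough sketch of where a clustered standard error comes from, the block below computes the simplest "CR0" sandwich estimator by hand in base R. Everything in it is an assumption made for illustration: the data are a small hypothetical simulation (not fake-panel-data.csv), the parameter values are arbitrary, and lm_robust() by default applies a small-sample adjustment (CR2, as far as I know), so its numbers will not match a CR0 calculation exactly.

```r
# Hypothetical simulated panel: 10 people observed for 5 years each.
# This is NOT the course dataset; all parameter values are made up.
set.seed(1234)
n_persons <- 10
n_years <- 5
person <- rep(seq_len(n_persons), each = n_years)
eta <- rep(rnorm(n_persons), each = n_years)  # person-specific error, fixed over time
x <- rnorm(n_persons * n_years)
epsilon <- rnorm(n_persons * n_years)         # idiosyncratic error
y <- 1 + 1 * x + eta + epsilon

fit <- lm(y ~ x)
X <- model.matrix(fit)
u <- residuals(fit)

# Classical standard errors, as reported by summary(fit)
se_classical <- sqrt(diag(vcov(fit)))

# CR0 cluster-robust sandwich: (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1
bread <- solve(crossprod(X))
score_by_person <- rowsum(X * u, group = person)  # row g holds X_g' u_g
meat <- crossprod(score_by_person)
se_cluster <- sqrt(diag(bread %*% meat %*% bread))

print(rbind(classical = se_classical, clustered_CR0 = se_cluster))
```

The key step is summing the scores within each person before squaring them: this lets the errors of the same person be correlated across years, which the classical formula rules out.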

Part 2

fake_panel |>
  ggplot(aes(x, y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
`geom_smooth()` using formula = 'y ~ x'

Part 3

fake_panel |>
  ggplot(aes(x, y, color = person)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
`geom_smooth()` using formula = 'y ~ x'
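
Because color = person is set in the top-level aes(), geom_smooth() fits one line per person. The same per-person fits can be obtained numerically by splitting the data and running lm() on each piece. The sketch below does this on a tiny hypothetical dataset (three made-up people, arbitrary coefficients) rather than on fake_panel, so that it runs on its own; the names sim and per_person are illustrative only.

```r
# Hypothetical data: 3 people, 5 observations each, common slope of 2
# and a different intercept shift for each person.
set.seed(42)
sim <- data.frame(
  person = rep(c("a", "b", "c"), each = 5),
  x = rnorm(15)
)
sim$y <- 1 + 2 * sim$x + rep(c(-1, 0, 1), each = 5) + rnorm(15, sd = 0.1)

# One regression per person: split the rows by person, then fit lm() to each piece
per_person <- sapply(split(sim, sim$person),
                     function(d) coef(lm(y ~ x, data = d)))

# Rows are people, columns are the intercept and slope of that person's line
print(t(per_person))
```

Applied to the exercise data, the analogous call would split fake_panel by its person column; comparing the resulting intercepts and slopes with the pooled fit from part 1 gives a numerical counterpart to the plot.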