# Exercise A - (20 min)

Download fake-panel-data.csv from https://ditraglia.com/data. This dataset was simulated according to the one-way error components model described above. It contains six columns: person is a unique person identifier (name), year is a year index (1-5), x and y are the regressor and outcome variable, and epsilon and eta are the error terms. (In real data you wouldn’t have the errors, but this is a simulation!)
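To make the structure of the dataset concrete, here is a minimal sketch of how a panel with these six columns could be simulated. It assumes the one-way error components model takes the form y = alpha + beta * x + eta + epsilon, where eta is constant within a person and epsilon varies by person and year. The name `sim_panel`, the seed, the parameter values, the balanced design with 10 people, and the independent draws for x are all illustrative assumptions, not the settings used to create the actual file.

```r
# Hypothetical simulation of a one-way error components panel.
# Every parameter value below is an illustrative assumption.
set.seed(1234)

n_persons <- 10  # assumed: 10 people observed for 5 years each
n_years   <- 5
alpha     <- 1   # assumed intercept
beta      <- 1   # assumed slope

sim_panel <- data.frame(
  person = rep(paste0("person_", seq_len(n_persons)), each = n_years),
  year   = rep(seq_len(n_years), times = n_persons)
)

# eta: one draw per person, repeated across that person's years
eta_by_person <- rnorm(n_persons)
sim_panel$eta <- rep(eta_by_person, each = n_years)

# epsilon: a fresh draw for every person-year observation
sim_panel$epsilon <- rnorm(n_persons * n_years)

# x is drawn independently here; the real file may differ
sim_panel$x <- rnorm(n_persons * n_years)
sim_panel$y <- alpha + beta * sim_panel$x + sim_panel$eta + sim_panel$epsilon

head(sim_panel)
```

Because eta is shared by all observations of the same person, the composite error eta + epsilon is correlated within person, which is the feature that clustering by person is meant to handle.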

1. Use lm() to regress y on x with “classical” standard errors. Repeat with standard errors clustered by person using lm_robust(). Discuss your results.
2. Plot y against x along with the regression line from part 1.
3. Repeat part 2, but use a different color for the points that correspond to each person in the dataset and plot a separate regression line for each person.
4. What does the plot you made in part 3 suggest? Use the columns epsilon and eta to check your conjecture.
5. Finally, use lm_robust() to regress y on x and a dummy variable for each person, clustering the standard errors by person. Discuss your results.

## Solution

### Part 1

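    library(tidyverse)
    library(estimatr)
    library(modelsummary)

    # Load the data (assumes the file was downloaded to the working directory)
    fake_panel <- read_csv('fake-panel-data.csv')

    # Pooled OLS with classical standard errors
    reg_classical <- lm(y ~ x, fake_panel)
    # Same regression with standard errors clustered by person
    reg_cluster <- lm_robust(y ~ x, fake_panel, clusters = person)

    modelsummary(list(Classical = reg_classical,
                      Clustered = reg_cluster),
                 gof_omit = 'AIC|BIC|F|RMSE|R2|Log.Lik.')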

|             | Classical | Clustered  |
|-------------|-----------|------------|
| (Intercept) | 1.134     | 1.134      |
|             | (0.196)   | (0.398)    |
| x           | 1.345     | 1.345      |
|             | (0.378)   | (0.595)    |
| Num.Obs.    | 50        | 50         |
| Std.Errors  |           | by: person |
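One way to read the table above is to compute how much larger the clustered standard errors are than the classical ones. The short sketch below copies the four reported standard errors by hand; the vector names are hypothetical and chosen only for this illustration.

```r
# Standard errors copied from the regression table above
se_classical <- c(intercept = 0.196, x = 0.378)
se_clustered <- c(intercept = 0.398, x = 0.595)

# Ratio of clustered to classical standard errors, rounded to 2 decimals
se_ratio <- round(se_clustered / se_classical, 2)
se_ratio
# intercept: 2.03, x: 1.57
```

The point estimates are identical across the two columns, since clustering changes only the standard errors. Here the clustered standard errors are roughly 1.6 to 2 times the classical ones.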

### Part 2

    fake_panel |>
      ggplot(aes(x, y)) +
      geom_point() +
      geom_smooth(method = 'lm', se = FALSE)
    #> geom_smooth() using formula = 'y ~ x'

### Part 3

    fake_panel |>
      ggplot(aes(x, y, color = person)) +
      geom_point() +
      geom_smooth(method = 'lm', se = FALSE)
    #> geom_smooth() using formula = 'y ~ x'
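The per-person lines in the Part 3 plot can also be inspected numerically by fitting one regression per person. The sketch below illustrates the idea on a small made-up dataset so that it runs on its own; `toy_panel`, its three people, and its parameter values are hypothetical. With the real data, `fake_panel` could presumably be substituted for `toy_panel`.

```r
# Hypothetical data: 3 people, 5 observations each; values are made up
set.seed(42)
toy_panel <- data.frame(
  person = rep(c("a", "b", "c"), each = 5),
  x      = rnorm(15)
)
toy_panel$y <- 1 + 0.5 * toy_panel$x + rnorm(15)

# Fit a separate regression of y on x within each person
fits <- lapply(split(toy_panel, toy_panel$person),
               function(d) lm(y ~ x, data = d))

# Collect the intercept and slope for each person: one row per person
person_coefs <- t(sapply(fits, coef))
person_coefs
```

Comparing the rows of `person_coefs` shows whether the fitted lines differ mainly in their intercepts or in their slopes.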