Exercise - NSW Experiment
This problem is based on Dehejia & Wahba (2002; ReStat) “Propensity Score Matching Methods for Nonexperimental Causal Studies.” To answer it you will need two datasets: `nsw_dw.dta` and `cps_controls.dta`. Both of these are available online from https://users.nber.org/~rdehejia/data/. In answering the following questions you may find it helpful to consult the associated paper. While we will not replicate the whole paper, you will be able to compare some of your results to theirs.

The file `nsw_dw.dta` contains experimental data from the National Supported Work (NSW) Demonstration, a program that randomly assigned eligible workers to either receive on-the-job training (treatment) or not (control). The dataset contains observations for 445 men, of whom 185 were assigned to the treatment group and 260 were assigned to the control group:

| Name | Description |
|---|---|
| `treat` | dummy variable: 1 denotes treated, 0 control |
| `age` | self-explanatory |
| `education` | years of education |
| `black` | dummy variable: 1 denotes black |
| `hispanic` | dummy variable: 1 denotes hispanic |
| `married` | dummy variable: 1 denotes married, 0 unmarried |
| `nodegree` | dummy variable: 1 denotes no high school degree |
| `re74` | real earnings in 1974 (pre-treatment) |
| `re75` | real earnings in 1975 (pre-treatment) |
| `re78` | real earnings in 1978 (post-treatment) |

The file `cps_controls.dta` contains observations of the same variables for 15,992 men who were not subjects in the NSW experiment. This information is drawn from the Current Population Survey (CPS). Because none of these men were in the experiment, none of them received the treatment. Hence `treat` equals zero for all of them.

Below we will compare treatment effect estimates from the NSW experimental data with alternatives constructed by applying regression adjustment and propensity score weighting to a “composite” sample that includes treated individuals from the NSW and untreated individuals from the CPS. Here’s the idea. The NSW was a randomized controlled trial, so we can easily compute an unbiased estimate of the ATE. There’s no need for selection-on-observables assumptions, valid instruments, or any other clever identification strategies. But in many real-world situations observational data are all that we have to work with. Can we somehow use the NSW data to see how well various observational methods would have performed compared to the experimental ideal?

Here’s a possible way to accomplish this. The problem of causal inference is one of constructing a valid control group. How would our causal effect estimates change if we replaced the real NSW control group with a “fake” control group constructed from the CPS using statistical modeling? The challenge is that NSW participants were not a random sample from the US population, whereas CPS respondents were.

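To fix ideas, the two comparisons described above can be written out as follows. This is only a sketch and the notation is an assumption of ours, not taken from the paper: $Y$ is 1978 earnings, $D$ is the treatment indicator, $Y(0)$ and $Y(1)$ are potential outcomes without and with training, and bars denote sample means.

$$
\begin{aligned}
\widehat{\Delta}_{\text{experimental}} &= \bar{Y}_{\text{NSW treated}} - \bar{Y}_{\text{NSW control}} \\
\widehat{\Delta}_{\text{composite}} &= \bar{Y}_{\text{NSW treated}} - \bar{Y}_{\text{CPS}}
\end{aligned}
$$

In population terms, any raw difference of means between a treated and an untreated group satisfies

$$
E[Y \mid D = 1] - E[Y \mid D = 0] = E[Y(1) - Y(0) \mid D = 1] + \big( E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0] \big).
$$

Random assignment makes the term in parentheses zero for the experimental comparison. Nothing guarantees this when the untreated group comes from the CPS.
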
- Data Cleaning:
  - Load the experimental data from `nsw_dw.dta` and store it in a tibble called `experimental`.
  - Rename `re74` to `earnings74` and do the same for `re75` and `re78`.
  - Convert the dummy variables `black` and `hispanic` into a single character variable `race` that takes on the values `white`, `black`, or `hispanic`. Hint: use `case_when()` from `dplyr`.
  - Convert `treat`, `nodegree` and `married` from dummy variables to character variables. Choose meaningful names so that each becomes self-explanatory. (E.g. a binary `sex` dummy becomes a character variable that is either `male` or `female`.)
  - Earnings of zero in a particular year indicate that a person was unemployed. Use this fact to create two variables: `employment74` and `employment75`. Each of these should be a character variable that takes on the value `employed` or `unemployed` to indicate a person’s employment status in 1974 and 1975.
  - Drop any variables that have become redundant in light of the steps you carried out above. You can also drop `data_id` since it takes on the same value for every observation in this dataset.
- Experimental Results:
  - Use `datasummary_skim()` to make two tables of summary statistics for `experimental`: one for the numerical variables and another for the categorical ones.
  - Use `datasummary_balance()` to make a table that compares average values of the variables in `experimental` across treatment and control groups. Comment on the results.
  - Construct an approximate 95% confidence interval for the average treatment effect of the NSW program based on the data from `experimental`. Interpret your findings.
- Construct the Composite Sample:
  - Load the CPS data from `cps_controls.dta` and store it in a tibble called `cps_controls`.
  - Clean `cps_controls` using the same steps that you applied to `experimental` above. (Consider writing a function so you don’t have to do the same thing twice!)
  - Use `bind_rows()` from `dplyr` to create a tibble called `composite` that includes all of the individuals from `cps_controls` and only the treated individuals from `experimental`.
  - Use `datasummary_balance()` to compare the two groups in `composite`: the treatment group from the NSW and the “controls” from the CPS.
  - Comment on your findings. What, if anything, does the difference of mean outcomes between the two groups in `composite` tell us about the treatment effect of interest?
- Regression Adjustment:
  - Regress 1978 earnings on the other variables in `composite` and display the results.
  - Explain in detail how, and under what assumptions, the regression from the preceding part can be used to estimate the treatment effect of interest. If we assume that these assumptions hold, what is our estimate? How does it compare to the experimental results from above?
- Propensity Score Weighting:
  - Run a logistic regression to estimate the propensity score using the data from `composite`. Because `glm()` requires a numeric outcome variable, I suggest creating a tibble called `logit_data` that makes the necessary adjustments beforehand. Think carefully about which variables to include. You don’t necessarily have to match the precise specification that the authors use in their paper (although you can if you like: see note A from Table 2) but there is one variable in `composite` that definitely should not be included in your model. Which one is it and why?
  - Make two histograms of your estimated propensity scores from the preceding part: one for the treated individuals and one for the untreated. What do your results suggest? (Feel free to make additional plots or compute additional summary statistics to support your argument.)
  - Calculate the propensity score weighting estimator. (You should obtain a crazy result!)
  - Repeat the preceding part except this time drop any observations with a propensity score less than 0.1 or greater than 0.9 before calculating the propensity score weighting estimator. (You should get a non-crazy result!)
  - Find an explanation for the difference in your results between the two preceding parts.
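
If the mechanics of the last section are unfamiliar, the following is a minimal sketch of propensity score weighting on simulated data. It is not a solution to the exercise. Everything in it is an assumption made for illustration: the data are generated at random rather than read from `nsw_dw.dta` or `cps_controls.dta`, the variables `x`, `d`, and `y` are made up, the helper `ipw_ate()` is hypothetical, and the estimator shown is the Horvitz-Thompson form for the ATE, which may differ from the variant in your course notes. It uses only base R so that it runs without extra packages.

```r
# Toy illustration of propensity score weighting with trimming.
# All data are simulated; nothing here comes from the NSW or CPS files.
set.seed(1234)
n <- 5000

# A single confounder x drives both treatment take-up and the outcome
x <- rnorm(n)
p_true <- plogis(-1 + 1.5 * x)           # true propensity score
d <- rbinom(n, size = 1, prob = p_true)  # numeric 0/1 treatment indicator
y <- 1 + 2 * d + 3 * x + rnorm(n)        # the true treatment effect is 2

# Step 1: estimate the propensity score by logistic regression.
# The outcome y is deliberately left out of this model.
logit_fit <- glm(d ~ x, family = binomial(link = "logit"))
pscore <- fitted(logit_fit)

# Step 2: hypothetical helper implementing the Horvitz-Thompson
# weighting estimator of the ATE
ipw_ate <- function(y, d, p) {
  mean(d * y / p - (1 - d) * y / (1 - p))
}

# Step 3: flag observations whose estimated score lies in [0.1, 0.9]
keep <- pscore >= 0.1 & pscore <= 0.9

naive    <- mean(y[d == 1]) - mean(y[d == 0])         # raw difference of means
ipw_all  <- ipw_ate(y, d, pscore)                     # all observations
ipw_trim <- ipw_ate(y[keep], d[keep], pscore[keep])   # trimmed sample

print(c(naive = naive, ipw_all = ipw_all, ipw_trim = ipw_trim))
```

In this simulation the raw difference of means should land far from 2 because `x` is a confounder, while the weighted estimates should be much closer. The toy design has reasonable overlap between the two groups, so trimming matters little here. In the composite sample the overlap is likely to be far worse, which is worth keeping in mind when you compare your trimmed and untrimmed results.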