Problem Set - NSW Experiment

This problem is based on Dehejia & Wahba (2002; ReStat) “Propensity Score Matching Methods for Nonexperimental Causal Studies.” To answer it you will need two datasets: nsw_dw.dta and cps_controls.dta. Both of these are available online from https://users.nber.org/~rdehejia/data/. In answering the following questions you may find it helpful to consult the associated paper. While we will not replicate the whole paper, you will be able to compare some of your results to theirs.

NSW Data

The file nsw_dw.dta contains experimental data from the “National Supported Work (NSW) Demonstration,” a program that randomly assigned eligible workers either to receive on-the-job training (treatment) or not (control). The dataset contains observations for 445 men, of whom 185 were assigned to the treatment group and 260 to the control group:

Name Description
treat dummy variable: 1 denotes treated, 0 control
age age in years
education years of education
black dummy variable: 1 denotes black
hispanic dummy variable: 1 denotes hispanic
married dummy variable: 1 denotes married, 0 unmarried
nodegree dummy variable: 1 denotes no high school degree
re74 real earnings in 1974 (pre-treatment)
re75 real earnings in 1975 (pre-treatment)
re78 real earnings in 1978 (post-treatment)

CPS Data

The file cps_controls.dta contains observations of the same variables for 15,992 men who were not subjects in the NSW experiment. This information is drawn from the Current Population Survey (CPS). Because none of these men were in the experiment, none of them received the treatment. Hence treat equals zero for all of them.

Problem set goals

You will investigate how regression adjustment and propensity score weighting perform compared to the RCT gold standard for the NSW data. We will compare treatment effect estimates from the NSW experimental data with alternatives constructed by applying regression adjustment and propensity score weighting to a “composite” sample that combines treated individuals from the NSW with untreated individuals from the CPS.

Here’s the key idea: the NSW was a randomized controlled trial, so we can easily compute an unbiased estimate of the ATE. There’s no need for selection-on-observables assumptions, valid instruments, or any other clever identification strategies. But in many real-world situations observational data are all that we have to work with. Can we somehow use the NSW data to see how well various observational methods would have performed compared to the experimental ideal? Here’s a possible way to accomplish this. The problem of causal inference is one of constructing a valid control group. How would our causal effect estimates change if we replaced the real NSW control group with a “fake” control group constructed from the CPS using statistical modelling? The challenge is that NSW participants were not a random sample from the US population, whereas CPS respondents were.

Exercises

Exercise 1: Data Cleaning for the NSW data

  1. Load the experimental data from nsw_dw.dta and store it in a tibble called experimental.
  2. Rename re74 to earnings74 and do the same for re75 and re78.
  3. Convert the dummy variables black and hispanic into a single character variable race that takes on the values white, black, or hispanic. Hint: use case_when() from dplyr.
  4. Convert treat, nodegree, and married from dummy variables to character variables. Choose meaningful values so that each becomes self-explanatory. (E.g. a binary sex dummy would become a character variable that is either male or female.)
  5. Earnings of zero in a particular year indicate that a person was unemployed. Use this fact to create two variables: employment74 and employment75. Each of these should be a character variable that takes on the value employed or unemployed to indicate a person’s employment status in 1974 and 1975.
  6. Drop any variables that have become redundant in light of the steps you carried out above. You can also drop data_id since it takes on the same value for every observation in this dataset.

Hint: You will later clean the CPS data in the same way. After you’ve successfully implemented all the data cleaning steps above, rewrite your code by defining the function cleanup() that cleans the raw data. You can reuse this function later.
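Once the interactive steps work, your cleanup() function might look something like the following sketch. It assumes the raw column names listed above (including data_id); treat it as a starting point rather than a model answer, since several of the choices (e.g. the labels for nodegree) are up to you.

```r
library(dplyr)

# Sketch of a cleanup() function for the raw NSW/CPS data.
# Assumes the raw columns listed above: treat, age, education, black,
# hispanic, married, nodegree, re74, re75, re78, and data_id.
cleanup <- function(raw) {
  raw |>
    rename(earnings74 = re74, earnings75 = re75, earnings78 = re78) |>
    mutate(
      race = case_when(
        black == 1 ~ "black",
        hispanic == 1 ~ "hispanic",
        TRUE ~ "white"
      ),
      treat = if_else(treat == 1, "treated", "control"),
      married = if_else(married == 1, "married", "unmarried"),
      nodegree = if_else(nodegree == 1, "no degree", "degree"),
      employment74 = if_else(earnings74 == 0, "unemployed", "employed"),
      employment75 = if_else(earnings75 == 0, "unemployed", "employed")
    ) |>
    select(-black, -hispanic, -data_id)
}

# Tiny illustrative input (made-up rows, not real NSW observations)
toy <- tibble(
  data_id = "NSW", treat = c(1, 0), age = c(25, 30), education = c(11, 12),
  black = c(1, 0), hispanic = c(0, 0), married = c(0, 1), nodegree = c(1, 0),
  re74 = c(0, 12000), re75 = c(0, 13000), re78 = c(9000, 14000)
)
clean <- cleanup(toy)
```

For the real data you would call cleanup() on the tibble returned by haven::read_dta().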

Exercise 2: Experimental Results

  1. Use datasummary_skim() to make two tables of summary statistics for experimental: one for the numerical variables and another for the categorical ones.
  2. Use datasummary_balance() to make a table that compares average values of the variables in experimental across treatment and control groups. Comment on the results.
  3. Construct an approximate 95% confidence interval for the average treatment effect of the NSW program based on the data from experimental. Interpret your findings.
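Because treatment was randomized, a difference in means is an unbiased ATE estimate, and a large-sample normal interval suffices for part 3. A sketch on simulated stand-in data (the earnings parameters below are made up, not the NSW values):

```r
# Approximate 95% CI for the ATE in an RCT: difference in means
# plus/minus 1.96 standard errors. Simulated placeholder earnings.
set.seed(1)
y_treated <- rnorm(185, mean = 6000, sd = 7000)  # made-up parameters
y_control <- rnorm(260, mean = 4500, sd = 5500)

ate_hat <- mean(y_treated) - mean(y_control)
se_hat  <- sqrt(var(y_treated) / 185 + var(y_control) / 260)
ci <- ate_hat + c(-1.96, 1.96) * se_hat
```

Running t.test(y_treated, y_control) gives essentially the same interval (with a t rather than normal critical value).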

Exercise 3: Construct the Composite Sample from the NSW and CPS Data

  1. Load the CPS data from cps_controls.dta and store it in a tibble called cps_controls.
  2. Clean cps_controls using the same steps that you applied to experimental above. Hint: This is your chance to reuse the cleanup() function from exercise 1!
  3. Use bind_rows() from dplyr to create a tibble called composite that includes all of the individuals from cps_controls and only the treated individuals from experimental. Hint: First, figure out how to filter out all the treated individuals from experimental. Then combine these observations with cps_controls using bind_rows().
  4. Use datasummary_balance() to compare the two groups in composite: the treatment group from the NSW and the “controls” from the CPS.
  5. Comment on your findings. What, if anything, does the difference of mean outcomes between the two groups in composite tell us about the treatment effect of interest?
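The filter-then-stack step in part 3 can be sketched as follows, using tiny placeholder tibbles in place of the cleaned datasets (only a treat and an earnings column here; your real tibbles have all the cleaned variables):

```r
library(dplyr)

# Placeholder stand-ins for the cleaned datasets
experimental <- tibble(treat = c("treated", "control"),
                       earnings78 = c(9000, 8000))
cps_controls <- tibble(treat = "control", earnings78 = 7000)

# Keep only the NSW treated observations, then stack with the CPS "controls"
composite <- bind_rows(
  filter(experimental, treat == "treated"),
  cps_controls
)
```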

Exercise 4: Regression Adjustment

Hint: In answering this question, you may find it helpful to consult my blog post on regression adjustment.

  1. Regress 1978 earnings on the other variables in composite. Display and interpret the results. How do they compare to the experimental results from above?
  2. Under what assumptions does the coefficient on the treatment indicator from your regression in part 1 estimate the ATE? Explain briefly.
  3. Suppose that the assumptions required for regression adjustment hold (see the lecture recording for details). Suppose further that \(\mathbb{E}[Y_d|X] = \alpha_d + X'\beta_d\) for \(d = 0,1\), where \(X\) excludes an intercept. Explain how you could use regression adjustment to estimate the ATE under these assumptions. Would your approach differ from the one you used in part 1? If so, how? Hint: use the assumptions about \(\mathbb{E}[Y_d|X]\) to derive an expression for \(\mathbb{E}[Y|D,X]\) and relate this to the ATE. To be clear: you do need to explain what regression you would run, but you do not actually have to run it.
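One common way to implement regression adjustment with separate coefficients by treatment status is: fit the outcome regression separately in the treated and untreated samples, impute both potential outcomes for every observation, and average the difference. A sketch on simulated data with a known homogeneous effect of 1.5 (you are not asked to run this on composite):

```r
# Regression adjustment via two separate regressions, on simulated data
# where the true treatment effect is 1.5 by construction.
set.seed(1)
n <- 500
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))        # treatment depends on x: selection on observables
y <- 2 + 1.5 * d + x + rnorm(n)
dat <- data.frame(y, d, x)

fit1 <- lm(y ~ x, data = subset(dat, d == 1))   # model for E[Y | X] among treated
fit0 <- lm(y ~ x, data = subset(dat, d == 0))   # model for E[Y | X] among untreated

# Average the imputed individual-level differences over the full sample
ate_hat <- mean(predict(fit1, newdata = dat) - predict(fit0, newdata = dat))
```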

Exercise 5: Propensity Score Weighting

  1. Run a logistic regression to estimate the propensity score for each worker with the composite data using glm(). Because glm() requires a numeric outcome variable, I suggest creating a tibble called logit_data that makes the necessary adjustments beforehand. You can use this tibble to fit your logit model, and finally to estimate workers’ propensity scores using augment(). Think carefully about which variables to include in your logit model. You don’t have to match the precise specification that Dehejia & Wahba (2002; ReStat) use in their paper (although you can if you like: see note A from Table 2) but there is one variable in composite that definitely should not be included. Which one is it and why?
  2. Make two histograms of your estimated propensity scores: one for the treated individuals and one for the untreated. What do your results suggest? (Feel free to make additional plots or compute additional summary statistics to support your argument.)
  3. Calculate the propensity score weighting estimator. (You should obtain a crazy result!)
  4. Repeat the preceding part except this time drop any observations with a propensity score less than 0.1 or greater than 0.9 before calculating the propensity score weighting estimator. (You should get a non-crazy result!)
  5. Find an explanation for the difference in your results between parts 3 and 4.
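The trimmed weighting estimator from part 4 might be sketched as follows, on simulated data rather than the composite sample. The normalized (Hájek) form of the weighting estimator is used here; the true effect is 2 by construction.

```r
# Propensity score weighting with trimming, on simulated data
# where the true treatment effect is 2 by construction.
set.seed(1)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(-0.5 + x))   # treatment probability depends on x
y <- 1 + 2 * d + x + rnorm(n)

# Estimate propensity scores by logistic regression
phat <- fitted(glm(d ~ x, family = binomial))

# Trim observations with extreme estimated propensity scores
keep <- phat > 0.1 & phat < 0.9

# Normalized (Hajek) inverse-probability-weighting estimator
w1 <- (d / phat)[keep]
w0 <- ((1 - d) / (1 - phat))[keep]
ate_hat <- sum(w1 * y[keep]) / sum(w1) - sum(w0 * y[keep]) / sum(w0)
```

Without the keep filter, a handful of observations with phat near 0 or 1 can receive enormous weights, which is one route to the “crazy result” in part 3.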