Exercise - NSW Experiment

This problem is based on Dehejia & Wahba (2002, ReStat), “Propensity Score-Matching Methods for Nonexperimental Causal Studies.” To answer it you will need two datasets: nsw_dw.dta and cps_controls.dta. Both are available online from https://users.nber.org/~rdehejia/data/. In answering the following questions you may find it helpful to consult the associated paper. While we will not replicate the whole paper, you will be able to compare some of your results to theirs. The file nsw_dw.dta contains experimental data from the National Supported Work (NSW) Demonstration, a program that randomly assigned eligible workers to either receive on-the-job training (treatment) or not (control). The dataset contains observations for 445 men, of whom 185 were assigned to the treatment group and 260 to the control group:

Name        Description
treat       dummy variable: 1 denotes treated, 0 control
age         age in years
education   years of education
black       dummy variable: 1 denotes black
hispanic    dummy variable: 1 denotes hispanic
married     dummy variable: 1 denotes married, 0 unmarried
nodegree    dummy variable: 1 denotes no high school degree
re74        real earnings in 1974 (pre-treatment)
re75        real earnings in 1975 (pre-treatment)
re78        real earnings in 1978 (post-treatment)

The file cps_controls.dta contains observations of the same variables for 15,992 men who were not subjects in the NSW experiment. This information is drawn from the Current Population Survey (CPS). Because none of these men were in the experiment, none of them received the treatment. Hence treat equals zero for all of them.

Below we will compare treatment effect estimates based on the experimental NSW data with alternatives constructed by applying regression adjustment and propensity score weighting to a “composite” sample that includes treated individuals from the NSW and untreated individuals from the CPS. Here’s the idea. The NSW was a randomized controlled trial, so we can easily compute an unbiased estimate of the ATE. There’s no need for selection-on-observables assumptions, valid instruments, or any other clever identification strategies. But in many real-world situations observational data are all that we have to work with. Can we somehow use the NSW data to see how well various observational methods would have performed compared to the experimental ideal?

Here’s one way to accomplish this. The problem of causal inference is one of constructing a valid control group. How would our causal effect estimates change if we replaced the real NSW control group with a “fake” control group constructed from the CPS using statistical modeling? The challenge is that NSW participants were not a random sample from the US population, whereas CPS respondents were.

  1. Data Cleaning:
    1. Load the experimental data from nsw_dw.dta and store it in a tibble called experimental.
    2. Rename re74 to earnings74 and do the same for re75 and re78.
    3. Convert the dummy variables black and hispanic into a single character variable race that takes on the values white, black, or hispanic. Hint: use case_when() from dplyr.
    4. Convert treat, married, and nodegree from dummy variables to character variables. Choose meaningful names and values so that each becomes self-explanatory. (E.g. a binary sex dummy becomes a character variable that is either male or female.)
    5. Earnings of zero in a particular year indicate that a person was unemployed. Use this fact to create two variables: employment74 and employment75. Each of these should be a character variable that takes on the value employed or unemployed to indicate a person’s employment status in 1974 and 1975.
    6. Drop any variables that have become redundant in light of the steps you carried out above. You can also drop data_id, since it takes on the same value for every observation in this dataset. (A code sketch illustrating these cleaning steps appears after the question list.)
  2. Experimental Results:
    1. Use datasummary_skim() to make two tables of summary statistics for experimental: one for the numerical variables and another for the categorical ones.
    2. Use datasummary_balance() to make a table that compares average values of the variables in experimental across treatment and control groups. Comment on the results.
    3. Construct an approximate 95% confidence interval for the average treatment effect of the NSW program based on the data from experimental. Interpret your findings. (See the corresponding sketch at the end of the exercise.)
  3. Construct the Composite Sample:
    1. Load the CPS data from cps_controls.dta and store it in a tibble called cps_controls.
    2. Clean cps_controls using the same steps that you applied to experimental above. (Consider writing a function so you don’t have to do the same thing twice!)
    3. Use bind_rows() from dplyr to create a tibble called composite that includes all of the individuals from cps_controls and only the treated individuals from experimental.
    4. Use datasummary_balance() to compare the two groups in composite: the treatment group from the NSW and the “controls” from the CPS. (See the sketch at the end of the exercise.)
    5. Comment on your findings. What, if anything, does the difference of mean outcomes between the two groups in composite tell us about the treatment effect of interest?
  4. Regression Adjustment:
    1. Regress 1978 earnings on the other variables in composite and display the results. (One possible specification is sketched at the end of the exercise.)
    2. Explain in detail how, and under what assumptions, the regression from the preceding part can be used to estimate the treatment effect of interest. If we assume that these assumptions hold, what is our estimate? How does it compare to the experimental results from above?
  5. Propensity Score Weighting:
    1. Run a logistic regression to estimate the propensity score using the data from composite. Because glm() requires a numeric outcome variable, I suggest creating a tibble called logit_data that makes the necessary adjustments beforehand. Think carefully about which variables to include. You don’t necessarily have to match the precise specification that the authors use in their paper (although you can if you like: see note A from Table 2) but there is one variable in composite that definitely should not be included in your model. Which one is it and why?
    2. Make two histograms of your estimated propensity scores from the preceding part: one for the treated individuals and one for the untreated. What do your results suggest? (Feel free to make additional plots or compute additional summary statistics to support your argument.)
    3. Calculate the propensity score weighting estimator. (You should obtain a crazy result!)
    4. Repeat the preceding part, except this time drop any observations with a propensity score less than 0.1 or greater than 0.9 before calculating the propensity score weighting estimator. (You should get a non-crazy result!) A sketch covering these propensity score steps appears at the end of the exercise.
    5. Find an explanation for the difference between your results in the two preceding parts.
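
Below are code sketches for the steps referenced above. They are illustrative rather than definitive: they assume the haven, dplyr, modelsummary, and ggplot2 packages, that the .dta files sit in your working directory, and particular choices of variable names and labels that you are free to change.

First, a minimal cleaning sketch for part 1, written as a function so the same steps can be reused for the CPS data in part 3. The labels (“treated”/“control” and so on) and the function name clean_nsw are just one choice among many.

```r
library(haven)   # read_dta()
library(dplyr)

clean_nsw <- function(path) {
  read_dta(path) |>
    # steps 2-3: rename the earnings variables and build a single race variable
    rename(earnings74 = re74, earnings75 = re75, earnings78 = re78) |>
    mutate(
      race = case_when(black == 1 ~ "black",
                       hispanic == 1 ~ "hispanic",
                       TRUE ~ "white"),
      # step 4: recode the remaining dummies as self-explanatory character variables
      treat = if_else(treat == 1, "treated", "control"),
      married = if_else(married == 1, "married", "unmarried"),
      degree = if_else(nodegree == 1, "no high school degree", "high school degree"),
      # step 5: zero earnings in a pre-treatment year means unemployed that year
      employment74 = if_else(earnings74 == 0, "unemployed", "employed"),
      employment75 = if_else(earnings75 == 0, "unemployed", "employed")
    ) |>
    # step 6: drop the now-redundant dummies and the constant data_id column
    select(-black, -hispanic, -nodegree, -data_id)
}

experimental <- clean_nsw("nsw_dw.dta")
```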
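
For part 2, one way to produce the summary tables, the balance table, and an approximate confidence interval, assuming the modelsummary package and the cleaned experimental tibble from the sketch above:

```r
library(modelsummary)

# Summary statistics: numeric and categorical variables separately
datasummary_skim(experimental, type = "numeric")
datasummary_skim(experimental, type = "categorical")

# Compare average covariate and outcome values across treatment and control
datasummary_balance(~ treat, data = experimental)

# Approximate 95% CI for the ATE via a two-sample t-test on 1978 earnings.
# Note the sign convention: with alphabetical factor levels the reported
# difference is control minus treated.
t.test(earnings78 ~ treat, data = experimental)
```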
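
For part 3, one way to build the composite sample, assuming cps_controls is cleaned with the same function as above:

```r
cps_controls <- clean_nsw("cps_controls.dta")

# Composite sample: NSW treated individuals plus the CPS "controls"
composite <- bind_rows(
  filter(experimental, treat == "treated"),
  cps_controls
)

datasummary_balance(~ treat, data = composite)
```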
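
For part 4, a regression adjustment sketch. The covariate list below is one possible specification, not the only defensible one:

```r
# Regress 1978 earnings on treatment status and the remaining covariates
reg_adjust <- lm(
  earnings78 ~ treat + age + education + race + married + degree +
    earnings74 + earnings75 + employment74 + employment75,
  data = composite
)
summary(reg_adjust)
```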
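
Finally, a sketch of the propensity score steps in part 5. The names logit_data, treat01, pscore, and ipw are illustrative, and the covariate specification is again just one possibility:

```r
library(ggplot2)

# glm() with family = binomial() needs a 0/1 outcome, so recode treat first
logit_data <- composite |>
  mutate(treat01 = if_else(treat == "treated", 1, 0))

logit_fit <- glm(
  treat01 ~ age + education + race + married + degree +
    earnings74 + earnings75 + employment74 + employment75,
  family = binomial(link = "logit"), data = logit_data
)

logit_data <- logit_data |>
  mutate(pscore = predict(logit_fit, type = "response"))

# Histograms of the estimated propensity scores, treated vs. untreated
ggplot(logit_data, aes(x = pscore)) +
  geom_histogram(bins = 40) +
  facet_wrap(~ treat, scales = "free_y")

# Propensity score weighting estimate of the ATE:
#   mean(D * Y / p) - mean((1 - D) * Y / (1 - p))
ipw <- function(d) {
  with(d, mean(treat01 * earnings78 / pscore) -
          mean((1 - treat01) * earnings78 / (1 - pscore)))
}

ipw(logit_data)                                        # full composite sample
ipw(filter(logit_data, pscore >= 0.1, pscore <= 0.9))  # trimmed sample
```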