Problem Set: Minimum Legal Drinking Age

The Data

Because access to the underlying individual mortality data is restricted, we will work with group averages here. The data you will need for the exercises below are available from https://ditraglia.com/data/mlda.dta. The dataset contains multiple columns, but you’ll only need two of them. The first is agecell. This variable records age at roughly monthly resolution, stored in years as a whole number plus a decimal fraction. For example, agecell == 19.06849 means roughly 19 years and a month. (It’s a bit inelegant, I agree!) The second is all, which gives the all-cause mortality rate per 100,000 individuals.
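To make the encoding of agecell concrete, here is a minimal R sketch that splits a value into whole years and leftover months. The helper name decode_agecell is hypothetical, and the conversion assumes the decimal part is simply a fraction of a year.

```r
# Hypothetical helper: split a decimal age into whole years and
# the leftover fraction of a year, expressed in months.
decode_agecell <- function(agecell) {
  years  <- floor(agecell)
  months <- (agecell - years) * 12
  data.frame(years = years, months = months)
}

# The example value from the text, plus two round values for comparison.
decoded <- decode_agecell(c(19.06849, 20.5, 21))
print(decoded)
```

For 19.06849 the decimal part works out to a little under one month, consistent with the "roughly 19 years and a month" reading above.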

Both variables were constructed from individual-level data on mortality. These individual data were grouped into fifty “bins” based on age in months. The average age in each bin was stored in agecell and the mortality rate in that bin was stored in all. (I provide this explanation only in case you’re curious: you won’t need to worry about it below!)
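For the curious, the binning step can be illustrated with simulated data. The sketch below is not the procedure used to build the real dataset: the individual records, the age range of 19 to 23, the equal-width bins and the death probability are all assumptions made purely for illustration.

```r
set.seed(1)

# Simulated individual-level data (NOT the real mortality records):
# an exact age in years and a 0/1 indicator for death.
n <- 100000
individuals <- data.frame(
  age  = runif(n, min = 19, max = 23),
  died = rbinom(n, size = 1, prob = 0.001)
)

# Assign each individual to one of fifty equal-width age bins.
n_bins <- 50
breaks <- seq(19, 23, length.out = n_bins + 1)
individuals$bin <- cut(individuals$age, breaks = breaks, include.lowest = TRUE)

# Collapse to bin level: average age and deaths per 100,000 in each bin.
binned <- data.frame(
  agecell = as.numeric(tapply(individuals$age, individuals$bin, mean)),
  all     = as.numeric(tapply(individuals$died, individuals$bin, mean)) * 1e5
)

head(binned)
```

Each row of binned plays the role of one row of the real dataset: an average age and a mortality rate per 100,000 for the individuals in that bin.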

Exercises

  1. Load the data (it’s a .dta file, so you’ll want to use read_dta() from the haven package). Typeset the linear RD model and define the Conditional Average Treatment Effect (CATE).
  2. Use a linear RD model to estimate the causal effect of legal access to alcohol on death rates. Plot your results and carry out appropriate statistical inference. Interpret the CATE and discuss your findings. Hint: Consult the “Regression discontinuity” lecture slides if you need a reminder of the econometrics or the implementation in R.
  3. Repeat the preceding part using a quadratic rather than linear specification. Compare and contrast your findings.
  4. RD analysis is fundamentally local in nature: the mortality rates of individuals far from the cut-off should not inform us about the causal effect for 21-year-olds. Repeat parts 2 and 3 after restricting your sample to ages between 20 and 22, inclusive, to check the sensitivity of your results. Discuss your findings.
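
As a notational reminder for the typesetting in part 1, one common way to write a linear RD specification with the cut-off at age 21 is sketched below. The symbols are assumptions chosen for illustration and may differ from the lecture slides: Y_i is the mortality rate in bin i (the variable all), a_i is the average age in bin i (the variable agecell), and D_i indicates bins at or above the cut-off. Allowing a separate slope on each side of the cut-off is likewise one choice among several.

```latex
% Treatment indicator: legal access to alcohol at age 21 or above
D_i = \mathbf{1}\{a_i \geq 21\}

% Linear RD model with age centered at the cut-off
Y_i = \alpha + \beta D_i + \gamma (a_i - 21) + \delta D_i (a_i - 21) + \varepsilon_i

% CATE at the cut-off, written with potential outcomes Y_i(1), Y_i(0)
\tau = \mathbb{E}\left[ Y_i(1) - Y_i(0) \mid a_i = 21 \right]
     = \lim_{a \downarrow 21} \mathbb{E}[Y_i \mid a_i = a]
     - \lim_{a \uparrow 21} \mathbb{E}[Y_i \mid a_i = a]
```

Because age is centered at 21, the two limits in this specification are \alpha + \beta and \alpha, so the coefficient \beta is the jump at the cut-off.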