```{r, include = FALSE}
knitr::opts_chunk$set(comment=NA, fig.width=5, fig.height=3.5, fig.align = 'center')
```
# Part I: Handling Non-linearity
A problem with the linear RD model that we used in our last lab is that it can be "fooled" by a non-linear trend.
As we discussed last time, RD relies on the fact that any discontinuity in the relationship between $X$ and $Y$ at the cutoff $c$ indicates a causal effect of $D$ on $Y$.
This is because we assume that both $\mathbb{E}[Y_0|X]$ and $\mathbb{E}[Y_1|X]$ are continuous when $X$ is close to $c$.
A rapid change in $Y$ near the cutoff is *not* the same thing as a discontinuity and provides no evidence of a causal effect.
But a linear RD model can have a hard time distinguishing such a non-linear relationship from a genuine discontinuity.
You will explore this problem along with a simple solution to it in the following exercise.
# Exercise 1
This exercise relies on the following simulation code:
```{r}
set.seed(1234)
n <- 100
x <- runif(n)
y <- pnorm(x, 0.5, 0.1) + rnorm(n, sd = 0.1)
D <- 1 * (x >= 0.5)
```
(a) What is the RD cutoff in the simulation design?
(b) What are $E[Y_0|X]$ and $E[Y_1|X]$ in the simulation design?
(c) What is the true value of the RD causal effect in the simulation design?
(d) Use the data from the simulation experiment to fit a linear RD model. Summarize your results. Do you find evidence of a causal effect of $D$ on $Y$? Calculate a 95\% confidence interval for this effect.
(e) Make a plot of your results from (d) along with $E[Y_0|X]$ and $E[Y_1|X]$. Comment on your findings.
(f) Building on your derivations in Lab 17, figure out how to fit a *quadratic* RD model in R; rather than fitting two different linear relationships, fit two different *quadratic* relationships.
(g) How do your results from (f) change if you use a quadratic rather than linear RD specification?
# Solutions
*Write your code and solutions here.*
# Part II: Empirical Exercise
In this part you will apply what you have learned above to data from the MLDA example from MM.
Before beginning the following exercises, first download the MLDA dataset `AEJfigs.dta` from the [Mastering 'Metrics website](http://masteringmetrics.com/resources) under "Chapter 4" and use an appropriate package to convert this file and load it in R.
The only two variables you will need for your analysis are `agecell` which gives age in years (with a decimal point since the ages are binned) and `all` which gives mortality rates per 100,000 individuals.
# Exercise
(a) Use a linear RD model to estimate the causal effect of legal access to alcohol on death rates. Plot your results and carry out appropriate present appropriate statistical inference. Discuss your findings.
(b) Repeat (a) using a *quadratic* rather than linear specification. Compare and contrast your findings.
(c) RD analysis is fundamentally *local* in nature: the mortality rates of individuals far from the cutoff should not inform us about the causal effect for 21 year olds. Check the sensitivity of your results from parts (a) and (b) by restricting your sample to ages between 20 and 22, inclusive. Discuss your findings.
# Solutions
*Write your code and solutions here.*