# Exercise - The Minimum Wage and Unemployment

This problem uses a dataset called `minwage.dta`, drawn from a famous study of the effects of minimum wages by Card & Krueger (1994: AER). You can download a copy from the data directory of my website at `https://ditraglia.com/data/minwage.dta`. This dataset contains information collected from fast food restaurants in New Jersey and eastern Pennsylvania during two interview waves: the first in March of 1992 and the second in November-December of the same year. Between these two interview waves, on April 1st to be precise, the New Jersey minimum wage increased by just under 19%, from $4.25 to $5.05 per hour. The minimum wage in Pennsylvania was unchanged during this period: $4.25 per hour. In the exercises that follow, you’ll apply a difference-in-differences approach to this dataset to explore the effects of raising the minimum wage.

Here is a description of the variables from `minwage.dta` that you will need to complete the problem. When you see a pair of variables in the table below, e.g. `fte` / `fte2`, both measure the same thing but the one with the `2` is based on the *second* survey wave, while the one without the `2` is based on the *first* survey wave. Each row corresponds to a restaurant:

| Name | Description |
|---|---|
| `state` | Dummy variable = 1 for NJ, = 0 for PA |
| `wage_st` / `wage_st2` | Starting wage in dollars/hour at the restaurant |
| `fte` / `fte2` | Full-time equiv. employment = #(Full-time employees) + #(Part-time employees)/2. Excludes managers. |
| `chain` | Categorical variable taking values in \(\{1, 2, 3, 4\}\) to indicate the four chains in the dataset: Burger King, KFC, Roy Rogers, and Wendy’s |
| `co_owned` | Dummy variable = 1 if restaurant is company-owned, = 0 if franchised |
| `sample` | Dummy variable = 1 if wage and employment data are available for both survey waves at this restaurant |

1. Preliminaries:
   - (a) Download the data and load it in R using an appropriate package.
   - (b) Restrict the sample to only those restaurants with `sample` equal to 1 to ensure that we are making an apples-to-apples comparison throughout the remainder of this exercise.
   - (c) Rename the column `state` to `treat`.
   - (d) Create a *new* column called `state` that equals `PA` if `treat` is `0` and `NJ` if `treat` is `1`.
   - (e) Create a column called `low_wage` that takes the value `1` if `wage_st` is less than `5` and `0` otherwise.
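
The preliminary steps can be sketched as follows. This is a minimal illustration, assuming the `dplyr` package (and `haven` for reading Stata files); the data frame `toy` and all of its values are made up so that the snippet runs without a download, and `toy_clean` is a hypothetical name.

```r
library(dplyr)

# With the real data, step (a) could look like this (assumes the haven package):
# minwage <- haven::read_dta("https://ditraglia.com/data/minwage.dta")

# Made-up stand-in with the same column names; illustration only
toy <- tibble(
  state   = c(1, 1, 0, 0),
  wage_st = c(4.25, 5.10, 4.50, 5.00),
  sample  = c(1, 1, 1, 0)
)

toy_clean <- toy %>%
  filter(sample == 1) %>%                            # (b) keep restaurants observed in both waves
  rename(treat = state) %>%                          # (c) state becomes treat
  mutate(state = if_else(treat == 1, "NJ", "PA"),    # (d) new labelled state column
         low_wage = as.integer(wage_st < 5))         # (e) dummy for first-wave wage below 5
```

With the real data, the same pipeline would start from the object returned by `read_dta()` instead of `toy`.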

2. Baseline Diff-in-Diff Estimate: starting wages
   - (a) Calculate the average starting wage in each survey wave separately for each state.
   - (b) Calculate the within-state time-differences based on (a).
   - (c) Calculate the between-state difference-in-differences based on (b).
   - (d) Interpret your findings from (c). What do they tell us about the causal effect of increasing the minimum wage? What assumptions are required for this interpretation to be valid?
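
Written as a formula, the three calculation steps combine into a single expression. The notation here is introduced for illustration only and does not appear in the dataset documentation: \(\bar{Y}_{s,t}\) denotes the sample average of the outcome among restaurants in state \(s\) during survey wave \(t \in \{1, 2\}\).

\[\widehat{\text{DD}} = \left(\bar{Y}_{\text{NJ},2} - \bar{Y}_{\text{NJ},1}\right) - \left(\bar{Y}_{\text{PA},2} - \bar{Y}_{\text{PA},1}\right)\]

The four averages come from the first step, each term in parentheses is a within-state time difference, and the outer subtraction is the between-state comparison.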

3. Baseline Diff-in-Diff Estimate: full-time equivalent employment
   - Repeat question 2 but using full-time equivalent employment as the outcome variable rather than starting wages.
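
The mechanics of these calculations are the same for any outcome. Here is a minimal sketch on a made-up data frame, assuming the `dplyr` package; the object names `toy`, `by_state`, and `dd` are hypothetical and the numbers are invented, not taken from `minwage.dta`.

```r
library(dplyr)

# Made-up numbers for illustration only
toy <- tibble(
  state = c("NJ", "NJ", "PA", "PA"),
  fte   = c(20, 18, 24, 22),   # first survey wave
  fte2  = c(21, 19, 21, 21)    # second survey wave
)

# Average outcome by state and wave, then the within-state time difference
by_state <- toy %>%
  group_by(state) %>%
  summarize(wave1 = mean(fte), wave2 = mean(fte2)) %>%
  mutate(time_diff = wave2 - wave1)

# Difference-in-differences: NJ change minus PA change
dd <- with(by_state, time_diff[state == "NJ"] - time_diff[state == "PA"])
```

If the real columns contain missing values, `mean(..., na.rm = TRUE)` may be needed.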

4. Reshape `minwage` for Diff-in-Diff regression estimation. You should end up with a tibble called `both_waves` with a row for each restaurant-time period combination. In addition to the columns `state`, `treat`, `wage_st`, `fte`, `chain`, and `co_owned`, it should include a restaurant id variable and a dummy variable called `post` that indicates whether the observation is from before or after the minimum wage increase in NJ.

5. Diff-in-Diff Regression Estimates:
   - (a) Consider the following regression model using the variables `treat` and `post` constructed above:
     \[Y_{i,s,t} = \beta_0 + \beta_1 (\texttt{treat}_{i,s}) + \beta_2 (\texttt{post}_t) + \beta_3 (\texttt{treat}_{i,s} \times \texttt{post}_t) + \epsilon_{i,s,t}\]
     where \(i\) indexes *restaurants*, \(s\) indexes *states*, and \(t\) indexes *time periods*, i.e. the two survey waves. Explain the meaning of each of the four regression coefficients. Which one gives the difference-in-differences effect?
   - (b) Estimate the regression from part (a) based on `both_waves` using `wage_st` as the outcome variable. Summarize your results, including appropriate statistical inference. If you choose to cluster, at what level do you cluster and why? How do your results compare to those that you calculated in question 2 above?
   - (c) Estimate the regression from part (a) based on `both_waves` using `fte` as the outcome variable. Summarize your results, including appropriate statistical inference. How do they compare to those that you calculated in question 3 above?
   - (d) An advantage of the regression-based formulation of difference-in-differences is that it allows us to control for other variables that might affect wages and employment. Repeat parts (b) and (c) adding `co_owned` and dummy variables for each of the four restaurant chains to your regression.
   - (e) How do your results from part (d) compare with those of parts (b) and (c)?
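
The reshaping step and the regression from part (a) can be sketched on a made-up data frame. This is a minimal illustration assuming the `dplyr` package; `toy`, `both_waves_toy`, the `id` column, and every number are hypothetical. Stacking the two waves with `bind_rows()` is just one way to reshape; `tidyr::pivot_longer()` would be another.

```r
library(dplyr)

# Made-up wide data, one row per restaurant; illustration only
toy <- tibble(
  id    = 1:4,
  treat = c(1, 1, 0, 0),
  fte   = c(20, 18, 24, 22),   # first survey wave
  fte2  = c(21, 19, 21, 21)    # second survey wave
)

# Stack the two waves: one row per restaurant-period, with a post dummy
both_waves_toy <- bind_rows(
  toy %>% transmute(id, treat, post = 0, fte = fte),
  toy %>% transmute(id, treat, post = 1, fte = fte2)
)

# In an R formula, treat * post expands to treat + post + treat:post
fit <- lm(fte ~ treat * post, data = both_waves_toy)
coef(fit)
```

In a two-group, two-period design like this toy example, the fitted model reproduces the four group means exactly, which makes it easy to check the regression output against hand-computed averages. If you decide to cluster standard errors, one option (an assumption, not something the exercise prescribes) is `sandwich::vcovCL()`, which accepts a `cluster` argument.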

6. Probing the Diff-in-Diff Assumptions:
   - (a) What assumptions are required for the diff-in-diff approach to provide a valid causal estimate of the effects of New Jersey raising its minimum wage?
   - (b) An alternative to the comparison of NJ and PA restaurants is a *within* NJ comparison. The key insight here is that only restaurants with starting wages below $5 per hour in the first wave will be affected by the change in minimum wages. Use the variable `low_wage` to run this alternative to the regression from 5(a) using only observations from NJ. Discuss your findings.
   - (c) What assumptions are needed for the DD estimate from (b) to be reliable? How plausible are these assumptions compared to those from (a)?
   - (d) Repeat part (b) but restrict attention to restaurants in `PA`, where there was no change in minimum wages. Discuss your findings. What do these results suggest about the plausibility of the diff-in-diff assumptions in part (b)?
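
One way to write down the within-state alternative is to replace the treatment dummy in the model from 5(a) with the low-wage indicator. The symbols below are introduced for illustration and are not part of the original exercise: \(L_i\) is the value of `low_wage` for restaurant \(i\), and \(\gamma_0, \dots, \gamma_3\) and \(\eta_{i,t}\) play the roles that \(\beta_0, \dots, \beta_3\) and \(\epsilon_{i,s,t}\) play in 5(a).

\[Y_{i,t} = \gamma_0 + \gamma_1 L_i + \gamma_2 (\texttt{post}_t) + \gamma_3 (L_i \times \texttt{post}_t) + \eta_{i,t}\]

The state index \(s\) drops out because the sample is restricted to a single state. The same specification applies when the sample is restricted to PA instead of NJ.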