Problem Set - The Minimum Wage and Unemployment

Replicating Card & Krueger (1994)

This problem uses a dataset called minwage.dta, drawn from a famous study of the effects of minimum wages by Card & Krueger (1994: AER). You can download a copy from the data directory of my website at https://ditraglia.com/data/minwage.dta. The dataset contains information collected from fast food restaurants in New Jersey (NJ) and eastern Pennsylvania (PA) during two interview waves: the first in March of 1992 and the second in November-December of the same year. Between these two interview waves, on April 1st to be precise, the New Jersey minimum wage increased by just under 19%, from $4.25 to $5.05 per hour. The minimum wage in Pennsylvania was unchanged during this period: $4.25 per hour. In the exercises that follow, you’ll apply a difference-in-differences approach to this dataset to explore the effects of raising the minimum wage.
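
As a quick check of the "just under 19%" figure, using only the two wage levels quoted above:

\[ \frac{5.05 - 4.25}{4.25} = \frac{0.80}{4.25} \approx 0.188 \]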

Data

Here is a description of the variables from minwage.dta that you will need to complete the problem. Each row of the dataset corresponds to a restaurant. When you see a pair of variables in the table below, e.g. `fte` / `fte2`, both measure the same thing: the one without the 2 is based on the first survey wave, while the one with the 2 is based on the second survey wave.

Name	Description
`state`	Dummy variable = 1 for NJ, = 0 for PA
`wage_st` / `wage_st2`	Starting wage in dollars/hour at the restaurant
`fte` / `fte2`	Full-time equivalent employment = #(full-time employees) + #(part-time employees)/2. Excludes managers.
`chain`	Categorical variable taking values in $\{1, 2, 3, 4\}$ to indicate the four chains in the dataset: Burger King, KFC, Roy Rogers, and Wendy’s
`co_owned`	Dummy variable = 1 if restaurant is company-owned, = 0 if franchised
`sample`	Dummy variable = 1 if wage and employment data are available for both survey waves at this restaurant

Questions

Question 1: Preliminaries

(a) Read the introduction and conclusion of the paper. What is the research question? What do the authors find? Briefly, how do the authors interpret their findings? Hint: To answer the last question, read Section VIII.
(b) Download the data and load it in R using an appropriate package.
(c) Restrict the sample to only those restaurants with `sample` equal to 1 to ensure that we are making an apples-to-apples comparison throughout the remainder of this exercise.
(d) Rename the column `state` to `treat`.
(e) Create a new column called `state` that equals `PA` if `treat` is 0 and `NJ` if `treat` is 1.
(f) Create a column called `low_wage` that takes the value 1 if `wage_st` is less than 5 and 0 otherwise.
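
The data-preparation steps in this question could be sketched roughly as follows. This is only one possible approach, assuming the `dplyr` and `tibble` packages; the small `minwage` tibble below is a made-up stand-in that reuses the column names from the table above, not the real data. With the real file, `haven::read_dta()` is one option for loading Stata data, shown here as a comment.

```r
library(dplyr)
library(tibble)

# With the real data, one option (assumption: the haven package is installed):
# minwage <- haven::read_dta("https://ditraglia.com/data/minwage.dta")

# Toy stand-in with made-up values (NOT the real survey), same column names
minwage <- tribble(
  ~state, ~wage_st, ~wage_st2, ~fte, ~fte2, ~chain, ~co_owned, ~sample,
  1,      4.25,     5.05,      20,   21,    1,      0,         1,
  1,      5.25,     5.25,      18,   17,    2,      1,         1,
  0,      4.50,     4.50,      25,   22,    3,      0,         1,
  0,      5.00,     5.00,      30,   28,    4,      1,         1,
  0,      4.25,     NA,        15,   NA,    1,      0,         0
)

minwage <- minwage |>
  filter(sample == 1) |>                         # keep restaurants observed in both waves
  rename(treat = state) |>                       # 1 = NJ, 0 = PA
  mutate(
    state    = if_else(treat == 1, "NJ", "PA"),  # readable state label
    low_wage = if_else(wage_st < 5, 1, 0)        # 1 if first-wave starting wage is below $5
  )

minwage
```

Note that a starting wage of exactly $5.00 gives `low_wage` equal to 0, because the comparison is strict.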

Question 2: Baseline Diff-in-Diff Estimate for full-time equivalent employment

(a) Calculate the average full-time equivalent employment in each survey wave separately for each state.
(b) Calculate the within-state time-differences based on (a). Why does employment decline in Pennsylvania?
(c) Calculate the between-state difference-in-differences based on (b).
(d) Interpret your findings from (c). What do they tell us about the causal effect of increasing the minimum wage? Define the causal effect. Name the assumptions required for this interpretation to be valid.
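
The calculations in this question amount to computing $(\bar{Y}_{NJ,2} - \bar{Y}_{NJ,1}) - (\bar{Y}_{PA,2} - \bar{Y}_{PA,1})$, where $\bar{Y}_{s,t}$ is mean employment in state $s$ and wave $t$ (this notation is introduced here for illustration). A minimal sketch of the mechanics, assuming `dplyr` and using a made-up tibble called `toy` rather than the real data:

```r
library(dplyr)
library(tibble)

# Toy data with made-up values (NOT the real survey): two restaurants per state
toy <- tribble(
  ~state, ~fte, ~fte2,
  "NJ",   20,   21,
  "NJ",   18,   19,
  "PA",   25,   22,
  "PA",   30,   29
)

# Mean employment by state and wave, then the within-state change over time
by_state <- toy |>
  group_by(state) |>
  summarize(mean_fte = mean(fte), mean_fte2 = mean(fte2)) |>
  mutate(change = mean_fte2 - mean_fte)

# Difference-in-differences: NJ change minus PA change
did <- with(by_state, change[state == "NJ"] - change[state == "PA"])
did
```

In the toy data the NJ mean rises by 1 and the PA mean falls by 2, so `did` is 3. These numbers are invented and say nothing about what the real data show.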

Question 3: Reshaping data

Reshape `minwage` for Diff-in-Diff regression estimation. You should end up with a tibble called `both_waves` with a row for each restaurant-time period combination. In addition to the columns `state`, `treat`, `fte`, `chain`, `co_owned`, and `low_wage`, it should include a restaurant id variable and a dummy variable called `post` that equals 1 if the observation is from the second survey wave, after the minimum wage increase in NJ, and 0 if it is from the first.
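
One way to picture the target layout is to stack the two waves on top of each other. The sketch below assumes `dplyr` and uses a made-up wide tibble called `wide`; the id is simply the row number, which is an assumption made for illustration.

```r
library(dplyr)
library(tibble)

# Toy wide data with made-up values (NOT the real survey): one row per restaurant
wide <- tribble(
  ~treat, ~state, ~fte, ~fte2, ~chain, ~co_owned, ~low_wage,
  1,      "NJ",   20,   21,    1,      0,         1,
  1,      "NJ",   18,   19,    2,      1,         0,
  0,      "PA",   25,   22,    3,      0,         1,
  0,      "PA",   30,   29,    4,      1,         0
) |>
  mutate(id = row_number())   # restaurant identifier (assumed: row order)

# Stack first-wave rows (post = 0) on second-wave rows (post = 1)
both_waves <- bind_rows(
  wide |> mutate(post = 0) |> select(-fte2),
  wide |> mutate(post = 1, fte = fte2) |> select(-fte2)
) |>
  arrange(id, post)

both_waves
```

`tidyr::pivot_longer()` is a more general alternative for this kind of wide-to-long reshape; stacking is used here only because it makes the role of `post` explicit.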

Question 4: Diff-in-Diff Regression Estimates

(a) Consider the following regression model using the variables `treat` and `post` constructed above: \[ Y_{i,s,t} = \beta_0 + \beta_1 (\texttt{treat}_{i,s}) + \beta_2 (\texttt{post}_t) + \beta_3 (\texttt{treat}_{i,s} \times \texttt{post}_t) + \epsilon_{i,s,t} \] where $i$ indexes restaurants, $s$ indexes states, and $t$ indexes time periods, i.e. the two survey waves. Explain the meaning of each of the four regression coefficients. Which one gives the regression difference-in-differences effect?
(b) Estimate the regression from part (a) based on `both_waves` using `fte` as the outcome variable. If you choose to cluster, at what level do you cluster and why? Summarize your results, including appropriate statistical inference. How do they compare to those that you calculated in question 2 above?
(c) An advantage of the regression-based formulation of difference-in-differences is that it allows us to control for other variables that might affect employment. Repeat part (b) adding `co_owned` and dummy variables for each of the four restaurant chains to your regression. Display both regressions in a nicely formatted table.
(d) How do your results from (c) compare with those from (b)?
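
The mechanics of fitting the interaction model can be sketched in base R as follows. The data frame below is made up for illustration (it is not the real survey), and the sketch covers only the fitting step, not inference or table formatting.

```r
# Toy long-format data with made-up values (NOT the real survey)
both_waves <- data.frame(
  id    = rep(1:4, times = 2),
  treat = rep(c(1, 1, 0, 0), times = 2),
  post  = rep(c(0, 1), each = 4),
  fte   = c(20, 18, 25, 30,   # first wave
            21, 19, 22, 29)   # second wave
)

# In an R formula, treat * post expands to treat + post + treat:post
fit <- lm(fte ~ treat * post, data = both_waves)
coef(fit)

# Cell means by treat (rows) and post (columns), for comparison
means <- with(both_waves, tapply(fte, list(treat, post), mean))
did_means <- (means["1", "1"] - means["1", "0"]) -
  (means["0", "1"] - means["0", "0"])
did_means
```

Because the model with both dummies and their interaction is saturated, the coefficient labelled `treat:post` coincides with the difference in mean changes computed from the cell means. For cluster-robust standard errors, one option is `sandwich::vcovCL()` with a `cluster` argument naming your chosen clustering variable; which level to cluster at is left to the question. Extra controls can be added to the formula, for example `factor(chain)` for chain dummies.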

Question 5: Probing the Diff-in-Diff Assumptions

(a) What assumptions are required for the diff-in-diff approach to provide a valid causal estimate of the effects of New Jersey raising its minimum wage? Briefly, how do the authors argue for the validity of these assumptions?
(b) An alternative to the comparison of NJ and PA restaurants is a within-NJ comparison. The key insight here is that only restaurants with starting wages below $5 per hour in the first wave will be affected by the change in minimum wages. Use the variable `low_wage` in place of `treat` to run this alternative to the regression from 4(a) using only observations from NJ. Discuss your findings.
(c) What assumptions are needed for the diff-in-diff estimate from (b) to be reliable? How plausible are these assumptions compared to those from (a)?
(d) Repeat part (b) but restrict attention to restaurants in PA, where there was no change in minimum wages. Discuss your findings. What do these results suggest about the plausibility of the diff-in-diff assumptions required for part (b)?
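
The within-state comparison has the same structure as the regression in Question 4, with `low_wage` playing the role of `treat` and the sample restricted to one state. A minimal base R sketch on made-up data (not the real survey):

```r
# Toy long-format data with made-up values (NOT the real survey)
both_waves <- data.frame(
  state    = rep(c("NJ", "NJ", "NJ", "NJ", "PA", "PA"), times = 2),
  low_wage = rep(c(1, 1, 0, 0, 1, 0), times = 2),
  post     = rep(c(0, 1), each = 6),
  fte      = c(20, 18, 24, 26, 25, 30,   # first wave
               22, 20, 23, 25, 22, 29)   # second wave
)

# Keep only NJ restaurants, then compare low-wage to other restaurants over time
nj <- subset(both_waves, state == "NJ")
fit_nj <- lm(fte ~ low_wage * post, data = nj)
coef(fit_nj)
```

Changing the filter to `state == "PA"` gives the corresponding comparison for the state where the minimum wage did not change.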