Exercise - The Minimum Wage and Unemployment
This problem uses a dataset called `minwage.dta`, drawn from a famous study of the effects of minimum wages by Card & Krueger (1994: AER). You can download a copy from the data directory of my website at https://ditraglia.com/data/minwage.dta. The `minwage.dta` dataset contains information collected from fast food restaurants in New Jersey and eastern Pennsylvania during two interview waves: the first in March of 1992 and the second in November-December of the same year. Between these two interview waves (on April 1st, to be precise) the New Jersey minimum wage increased by just under 19%, from $4.25 to $5.05 per hour. The minimum wage in Pennsylvania remained at $4.25 per hour throughout this period. In the exercises that follow, you’ll apply a difference-in-differences approach to this dataset to explore the effects of raising the minimum wage.
Here is a description of the variables from `minwage.dta` that you will need to complete the problem. When you see a pair of variables in the table below, e.g. `fte` / `fte2`, both measure the same thing, but the one with the `2` is based on the second survey wave, while the one without the `2` is based on the first survey wave. Each row corresponds to a restaurant:
| Name | Description |
|---|---|
| `state` | Dummy variable = 1 for NJ, = 0 for PA |
| `wage_st` / `wage_st2` | Starting wage in dollars/hour at the restaurant |
| `fte` / `fte2` | Full-time equiv. employment = #(Full-time employees) + #(Part-time employees)/2. Excludes managers. |
| `chain` | Categorical variable taking values in \(\{1, 2, 3, 4\}\) to indicate the four chains in the dataset: Burger King, KFC, Roy Rogers, and Wendy’s |
| `co_owned` | Dummy variable = 1 if restaurant is company-owned, = 0 if franchised |
| `sample` | Dummy variable = 1 if wage and employment data are available for both survey waves at this restaurant |
- Preliminaries:
  - Download the data and load it in R using an appropriate package.
  - Restrict the sample to only those restaurants with `sample` equal to 1 to ensure that we are making an apples-to-apples comparison throughout the remainder of this exercise.
  - Rename the column `state` to `treat`.
  - Create a new column called `state` that equals `PA` if `treat` is `0` and `NJ` if `treat` is `1`.
  - Create a column called `low_wage` that takes the value `1` if `wage_st` is less than `5`.
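The data-preparation steps above can be sketched in base R on a toy data frame. This is only an illustration under stated assumptions: the data frame `toy` and all of its values are made up, and it stands in for the real data, which you would first load with a package that reads Stata files.

```r
# Toy stand-in for the real data; every value below is made up.
toy <- data.frame(
  state   = c(1, 1, 0, 0, 1),
  wage_st = c(4.25, 5.10, 4.75, 5.00, 4.50),
  sample  = c(1, 1, 1, 0, 1)
)

# Keep only restaurants with data in both survey waves.
toy <- toy[toy$sample == 1, ]

# Rename the dummy to `treat`, then rebuild `state` as a label.
names(toy)[names(toy) == "state"] <- "treat"
toy$state <- ifelse(toy$treat == 1, "NJ", "PA")

# Indicator for a first-wave starting wage strictly below $5.
toy$low_wage <- as.integer(toy$wage_st < 5)

print(toy)
```

The same steps map directly onto `filter()`, `rename()`, and `mutate()` if you prefer to work with dplyr.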
- Baseline Diff-in-Diff Estimate: starting wages
  - Calculate the average wage in each survey wave separately for each state.
  - Calculate the within-state time-differences based on (a).
  - Calculate the between-state difference-in-differences based on (b).
  - Interpret your findings from (c). What do they tell us about the causal effect of increasing the minimum wage? What assumptions are required for this interpretation to be valid?
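For reference, the calculation in this question can be written compactly. The notation here is an assumption introduced for illustration, not part of the dataset: \(\bar{Y}_{s,t}\) denotes the average outcome among restaurants in state \(s\) during survey wave \(t \in \{1, 2\}\).

\[
\widehat{\text{DD}} = \left(\bar{Y}_{\text{NJ},2} - \bar{Y}_{\text{NJ},1}\right) - \left(\bar{Y}_{\text{PA},2} - \bar{Y}_{\text{PA},1}\right)
\]

Each term in parentheses is a within-state time difference, and the difference between the two is the difference-in-differences estimate.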
- Baseline Diff-in-Diff Estimate: full-time equivalent employment
  - Repeat question 2 but using full-time equivalent employment as the outcome variable rather than starting wages.
- Reshape `minwage` for Diff-in-Diff regression estimation. You should end up with a tibble called `both_waves` with a row for each restaurant-time period combination. In addition to the columns `state`, `treat`, `wage_st`, `fte`, `chain`, `co_owned`, it should include a restaurant id variable and a dummy variable called `post` that indicates whether the observation is from before or after the minimum wage increase in NJ.
- Diff-in-Diff Regression Estimates:
  - Consider the following regression model using the variables `treat` and `post` constructed above:
    \[Y_{i,s,t} = \beta_0 + \beta_1 (\texttt{treat}_{i,s}) + \beta_2 (\texttt{post}_t) + \beta_3 (\texttt{treat}_{i,s} \times \texttt{post}_t) + \epsilon_{i,s,t}\]
    where \(i\) indexes restaurants, \(s\) indexes states, and \(t\) indexes time periods, i.e. the two survey waves. Explain the meaning of each of the four regression coefficients. Which one gives the difference-in-differences effect?
  - Estimate the regression from part (a) based on `both_waves` using `wage_st` as the outcome variable. Summarize your results, including appropriate statistical inference. If you choose to cluster, at what level do you cluster and why? How do your results compare to those that you calculated in question 2 above?
  - Estimate the regression from part (a) based on `both_waves` using `fte` as the outcome variable. Summarize your results, including appropriate statistical inference. How do they compare to those that you calculated in question 3 above?
  - An advantage of the regression-based formulation of differences-in-differences is that it allows us to control for other variables that might affect wages and employment. Repeat parts (b) and (c) adding `co_owned` and dummy variables for each of the four restaurant chains to your regression.
  - How do your results from part (d) compare with those of parts (b) and (c)?
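To see how the reshaping step and the interaction regression fit together, here is a minimal base-R sketch on made-up data. The data frames `wide` and `long` and all their values are hypothetical and are not drawn from `minwage.dta`. The sketch stacks two waves into a long format, fits the model with `lm()`, and compares the interaction coefficient with the same quantity computed from the four group means.

```r
# Made-up wide data: one row per restaurant, with employment in wave 1
# (fte) and wave 2 (fte2). These numbers are purely illustrative.
wide <- data.frame(
  id    = 1:6,
  treat = c(1, 1, 1, 0, 0, 0),
  fte   = c(20, 22, 18, 24, 26, 22),
  fte2  = c(21, 24, 19, 22, 25, 21)
)

# Stack the two waves: post = 0 for the first wave, 1 for the second.
long <- rbind(
  data.frame(id = wide$id, treat = wide$treat, post = 0, fte = wide$fte),
  data.frame(id = wide$id, treat = wide$treat, post = 1, fte = wide$fte2)
)

# In an R formula, treat * post expands to treat + post + treat:post.
fit <- lm(fte ~ treat * post, data = long)

# The same quantity computed directly from the four group means.
m <- tapply(long$fte, list(long$treat, long$post), mean)
dd_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(coef(fit)["treat:post"])
print(dd_means)
```

With no additional controls the two printed values coincide, because the model is saturated in `treat` and `post`. Note that the default standard errors reported by `summary(fit)` treat the two observations on each restaurant as independent, so they are not a substitute for the inference questions posed above.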
- Probing the Diff-in-Diff Assumptions:
  - What assumptions are required for the diff-in-diff approach to provide a valid causal estimate of the effects of New Jersey raising its minimum wage?
  - An alternative to the comparison of NJ and PA restaurants is a within-NJ comparison. The key insight here is that only restaurants with starting wages below $5 per hour in the first wave will be affected by the change in minimum wages. Use the variable `low_wage` to run this alternative to the regression from 5(a) using only observations from NJ. Discuss your findings.
  - What assumptions are needed for the DD estimate from (b) to be reliable? How plausible are they compared to the assumptions from (a)?
  - Repeat part (b) but restrict attention to restaurants in `PA`, where there was no change in minimum wages. Discuss your findings. What do these results suggest about the plausibility of the diff-in-diff assumptions in part (b)?