Problem Set: USDOT Airfare Dataset

In this exercise you will use panel data methods to examine data from the Domestic Airline Consumer Airfare Report, courtesy of the U.S. Department of Transportation (USDOT). For convenience, I have posted the required data on my personal website: https://ditraglia.com/data/usdot.csv. This file contains information on average airfares on different routes throughout the U.S. between 1997 and 2000. A route is a pair of cities between which you can fly non-stop: e.g. Philadelphia-Chicago. The columns in usdot.csv are as follows:

Name              Description
year              Year of a given observation (1997, 1998, 1999, or 2000)
route_id          Unique numeric identifier for each route (pair of cities)
distance          Distance between the pair of cities (miles)
passengers_daily  Average number of passengers per day who flew this route
airfare           Average one-way airfare on this route (nominal U.S. $)
market_share      Market share of largest carrier on this route (decimal)
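
As a possible starting point, the sketch below parses a file with the columns listed above using only Python's standard library and adds the log variables used in Exercise 1. It assumes a comma-delimited file with a header row; the two sample rows are made up for illustration and are not taken from the actual data, and the helper name `read_usdot` is hypothetical.

```python
import csv
import io
import math

# Made-up rows in the documented column layout (NOT real USDOT values).
SAMPLE = """year,route_id,distance,passengers_daily,airfare,market_share
1997,1,500,120,150.0,0.60
1998,1,500,130,155.0,0.58
"""


def read_usdot(handle):
    """Parse rows from an open text file and add log-transformed variables."""
    rows = []
    for rec in csv.DictReader(handle):
        row = {
            "year": int(rec["year"]),
            "route_id": int(rec["route_id"]),
            "distance": float(rec["distance"]),
            "passengers_daily": float(rec["passengers_daily"]),
            "airfare": float(rec["airfare"]),
            "market_share": float(rec["market_share"]),
        }
        # Transformations used as outcome and regressors in Exercise 1
        row["log_airfare"] = math.log(row["airfare"])
        row["log_distance"] = math.log(row["distance"])
        row["log_distance_sq"] = row["log_distance"] ** 2
        rows.append(row)
    return rows


rows = read_usdot(io.StringIO(SAMPLE))
print(len(rows), rows[0]["route_id"], round(rows[0]["log_distance"], 3))
```

To use the real data, pass an open handle to a downloaded copy of usdot.csv in place of `io.StringIO(SAMPLE)`. If pandas is available, `pandas.read_csv` applied to the URL above is the more usual route.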

Exercises

Exercise 1: Simple OLS model

Consider an OLS regression of log(airfare) on market_share, log(distance), [log(distance)]^2, and a full set of year dummies.

  1. Why might it make sense to include year dummies in this regression? Do the estimated coefficients for the time dummies make sense in light of this explanation?
  2. Interpret the estimated coefficient on market_share in this regression, along with the associated 95% confidence interval based on “plain-vanilla,” i.e. non-robust, standard errors.
  3. Repeat part 2, accounting for potential heteroskedasticity and clustering. What is the appropriate level at which to cluster in this example? How do your results change?
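
Part 3 asks for standard errors that are robust to heteroskedasticity and clustering. As a rough illustration of the mechanics, the numpy sketch below computes OLS coefficients together with a cluster-robust sandwich variance estimate on simulated data. The function name `ols_cluster`, the simulated variables, and the particular finite-sample correction are assumptions made for illustration. The grouping variable is deliberately generic, so choosing the clustering level for the airfare data is still up to you.

```python
import numpy as np


def ols_cluster(X, y, groups):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((k, k))
    labels = np.unique(groups)
    for g in labels:
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    G = len(labels)
    # One common finite-sample correction (an assumption; software defaults differ)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Simulated panel: 50 groups observed 4 times, with a common shock within group
rng = np.random.default_rng(0)
n_groups, n_periods = 50, 4
groups = np.repeat(np.arange(n_groups), n_periods)
x = rng.normal(size=n_groups * n_periods)
group_shock = np.repeat(rng.normal(size=n_groups), n_periods)
y = 1.0 + 0.5 * x + group_shock + rng.normal(size=n_groups * n_periods)
X = np.column_stack([np.ones_like(x), x])

beta, se = ols_cluster(X, y, groups)
print(beta, se)
```

In practice a regression package would do this for you; writing the estimator out by hand mainly helps to see which observations are allowed to have correlated errors.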

Exercise 2: Airfare elasticities

  1. Recall that elasticities measure the sensitivity of one variable to another. The elasticity of \(Y\) with respect to \(X\) is given by \(\frac{d Y/Y}{dX/X}\). Compute the derivative of the logarithm, \(\frac{d\log(Y)}{dY}\), rearrange, and express the formula for elasticity in terms of logarithms. Hint: Your formula should have only logarithms in it!
  2. What is the elasticity of airfare with respect to distance? Use your formula from part 1 to derive an expression for the elasticity in your log-log model. Hint: Your formula should give the elasticity as a function of the model coefficients and distance.
  3. Use this expression to add a variable called elasticity to your data set. This variable should contain the estimated airfare elasticity for each route. Visualise this variable in a figure.
  4. Comment on your findings. Do the estimated elasticities make economic sense?

Exercise 3: Route fixed effects

Suppose that we decided to add route fixed effects to the regression from above.

  1. What inference issue do fixed effects solve? Why might adding route fixed effects make sense here?
  2. Typeset the estimation equation. If we add route fixed effects, will we be able to estimate the elasticity of airfare with respect to distance? Why or why not? Hint: This depends on whether we can include time-invariant regressors in our model. Can we do that in a fixed effects model?
  3. Carry out the fixed effects regression, clustering standard errors appropriately. How do your results compare to those from above? Suggest a possible explanation for any differences that you find.
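
Fixed effects estimators are typically computed via the within transformation, which subtracts group-specific means from each variable before running OLS. The following is a minimal sketch of that transformation in plain Python; the function name `within_transform` and the toy numbers are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict


def within_transform(values, groups):
    """Subtract each group's mean from that group's observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Toy example: two groups, two observations each (made-up values)
groups = [1, 1, 2, 2]
x = [0.5, 0.7, 0.2, 0.4]
x_dm = within_transform(x, groups)
print([round(v, 2) for v in x_dm])
```

Applying the same transformation to the outcome and to every regressor, and then running OLS on the transformed data, reproduces the fixed effects slope estimates, although the standard errors need a degrees-of-freedom adjustment.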

Exercise 4: Correlated random effects

Suppose that we instead decided to estimate a random effects specification, but with a slight twist: in addition to the regressors from Exercise 1, we add the time average of market_share within each route. This is called a correlated random effects model and is usually attributed to Mundlak. (See Baltagi (2006) for details.) To be clear, in this specification both market_share and its time average within route will appear in the regression. This specification assumes that the individual effects are a linear function of the average of market_share across time.
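
The extra regressor described above can be built by averaging market_share over years within each route and attaching that average to every row of the route. Below is a minimal sketch, assuming the data are held as a list of dictionaries keyed by column name; the helper name `add_time_average`, the `_avg` suffix, and the sample values are all hypothetical.

```python
from collections import defaultdict


def add_time_average(rows, var, group="route_id"):
    """Return copies of rows with the within-group mean of `var` stored as `<var>_avg`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r[group]] += r[var]
        counts[r[group]] += 1
    return [{**r, f"{var}_avg": sums[r[group]] / counts[r[group]]} for r in rows]


# Made-up rows for two routes observed in two years (not real data)
rows = [
    {"route_id": 1, "year": 1997, "market_share": 0.60},
    {"route_id": 1, "year": 1998, "market_share": 0.50},
    {"route_id": 2, "year": 1997, "market_share": 0.90},
    {"route_id": 2, "year": 1998, "market_share": 0.70},
]
augmented = add_time_average(rows, "market_share")
for r in augmented:
    print(r["route_id"], r["year"], round(r["market_share_avg"], 2))
```

The new column is constant within each route and varies only across routes, which is what lets it stand in for the route-level effect in the specification.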

  1. Think back to Core Metrics: what is the benefit of random effects over fixed effects? What assumption do we need for random effects? How does the Mundlak specification relax this assumption?
  2. Typeset the estimation equation. Can we estimate the elasticity of airfares with respect to distance in this specification? Why or why not?
  3. Estimate the random effects model described above. For simplicity, use the “default” random effects standard errors.
  4. Discuss your findings. How do the point estimates from this specification compare to those from above? Why? Hint: Consult Baltagi (2006) if you’re stuck, or go back to your Core Metrics notes on theta-differencing.