Exercise - USDOT Airfare Dataset

In this exercise you will use panel data methods to examine data from the Domestic Airline Consumer Airfare Report, courtesy of the U.S. Department of Transportation (USDOT). For convenience, I have posted the required data on my personal website: https://ditraglia.com/data/usdot.csv. This file contains information on average airfares on different routes throughout the U.S. between 1997 and 2000. A route is a pair of cities between which you can fly non-stop, e.g. Philadelphia-Chicago. The columns in usdot.csv are as follows:

Name	Description
`year`	Year of a given observation (1997, 1998, 1999, or 2000)
`route_id`	Unique numeric identifier for each route (pair of cities)
`distance`	Distance between the pair of cities (miles)
`passengers_daily`	Average number of passengers per day who flew this route
`airfare`	Average one-way airfare on this route (nominal U.S. $)
`market_share`	Market share of largest carrier on this route (decimal)
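
As a starting point, the file could be read and the logged variables used below constructed along the following lines. This is only a minimal sketch using the Python standard library. The helper names `parse_usdot` and `load_usdot` and the derived keys `log_airfare`, `log_distance`, and `log_distance_sq` are illustrative assumptions, not part of the dataset. The sketch also assumes that the CSV header matches the column names in the table above and that there are no missing values.

```python
import csv
import io
import math
import urllib.request

DATA_URL = "https://ditraglia.com/data/usdot.csv"


def parse_usdot(csv_text):
    """Parse CSV text with the columns listed above into a list of dicts.

    Adds three derived keys (hypothetical names): log_airfare,
    log_distance, and log_distance_sq. Assumes no missing values.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {
            # int(float(...)) tolerates values stored as e.g. "1997.0"
            "year": int(float(raw["year"])),
            "route_id": int(float(raw["route_id"])),
            "distance": float(raw["distance"]),
            "passengers_daily": float(raw["passengers_daily"]),
            "airfare": float(raw["airfare"]),
            "market_share": float(raw["market_share"]),
        }
        row["log_airfare"] = math.log(row["airfare"])
        row["log_distance"] = math.log(row["distance"])
        row["log_distance_sq"] = row["log_distance"] ** 2
        rows.append(row)
    return rows


def load_usdot(url=DATA_URL):
    """Download the CSV (requires network access) and parse it."""
    with urllib.request.urlopen(url) as response:
        # utf-8-sig also handles a file that starts with a byte-order mark
        return parse_usdot(response.read().decode("utf-8-sig"))
```

Calling `rows = load_usdot()` should then give one dictionary per route-year observation, ready to be passed to whichever regression package you prefer.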

Consider an OLS regression of the log of airfare on market_share, log(distance), log(distance)^2, and a full set of year dummies.
1. Why might it make sense to include year dummies in this regression? Do the estimated coefficients for the time dummies make sense in light of this explanation?
2. Interpret the estimated coefficient on market_share in this regression, along with the associated 95% confidence interval based on “plain-vanilla,” i.e. non-robust, standard errors.
3. Repeat the previous part, accounting for potential heteroskedasticity and clustering. What is the appropriate level at which to cluster in this example? How do your results change?
4. Compute the elasticity of airfare with respect to distance for each route in the dataset. Comment on your findings. Do the estimated elasticities make economic sense?
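
For reference, the pooled OLS regression described at the start of this part can be written out as follows. The notation is only one possible choice and is not fixed by the dataset: $i$ indexes routes, $t$ indexes years, and 1997 is assumed to be the omitted base year so that the year dummies are not collinear with the intercept.

$$
\log(\text{airfare}_{it}) = \alpha + \beta_1 \, \text{market\_share}_{it} + \beta_2 \log(\text{distance}_i) + \beta_3 \left[\log(\text{distance}_i)\right]^2 + \sum_{s=1998}^{2000} \delta_s \, \mathbf{1}\{t = s\} + u_{it}
$$

Note that distance carries only a route subscript, since it does not vary over time for a given pair of cities.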
Suppose that we decided to add route fixed effects to the regression from above.
1. Why might adding route fixed effects make sense?
2. If we add route fixed effects, will we be able to estimate the elasticity of airfare with respect to distance? Why or why not?
3. Carry out the fixed effects regression, clustering standard errors appropriately. How do your results compare to those from above? Suggest a possible explanation for any differences that you find.
Suppose that we instead decided to estimate a random effects specification, but with a slight twist: in addition to the regressors from the original OLS regression above, we add the time average of market_share. This is called a correlated random effects model and is usually attributed to Mundlak. (See Baltagi (2006) for details.) To be clear, in this specification both market_share and its time average within route will appear in the regression.
1. Can we estimate the elasticity of airfares with respect to distance in this specification? Why or why not?
2. Estimate the random effects model described above. For simplicity, use the “default” random effects standard errors.
3. Discuss your findings. How do the point estimates from this specification compare to those from above?
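
The time average of market_share within route, which the correlated random effects specification adds as a regressor, could be constructed along these lines. This is a minimal sketch in plain Python, assuming the data are held as a list of dictionaries with keys `route_id` and `market_share`. The function name `add_route_mean` and the new key `market_share_mean` are hypothetical choices, and the toy values at the bottom are made up purely to illustrate the mechanics.

```python
from collections import defaultdict


def add_route_mean(rows, var="market_share", group="route_id"):
    """Return copies of rows with the within-group time average of `var`.

    The new key is named f"{var}_mean" (a hypothetical naming choice).
    The input rows are left unmodified.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and count of `var` for each group
    for row in rows:
        totals[row[group]] += row[var]
        counts[row[group]] += 1

    # Second pass: attach the group mean to a copy of every row
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{var}_mean"] = totals[row[group]] / counts[row[group]]
        out.append(new_row)
    return out


# Made-up toy values, not taken from usdot.csv
toy = [
    {"route_id": 1, "year": 1997, "market_share": 0.5},
    {"route_id": 1, "year": 1998, "market_share": 0.75},
    {"route_id": 2, "year": 1997, "market_share": 0.25},
]
toy_with_mean = add_route_mean(toy)
```

Every observation on a given route receives the same value of `market_share_mean`, so the new variable is constant over time within route while `market_share` itself still varies from year to year.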