Problem Set - Was Weber wrong?

This question is based on Becker and Woessmann (2009) (BW). To answer it you will need to consult the paper. You will also need a copy of the dataset weber.csv, a cleaned version of the 1871 Prussian census. The dataset is available from my website in .csv format at https://ditraglia.com/data/weber.csv.
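A minimal sketch of loading the data in R, assuming an internet connection and that the URL above is still live. The object names `data_url` and `weber` are only suggestions.

```r
# Read the data directly from the URL given above (needs an internet connection)
data_url <- "https://ditraglia.com/data/weber.csv"
weber <- read.csv(data_url)

# Quick sanity check on the three key variables
summary(weber[, c("f_rw", "f_prot", "kmwittenberg")])
```

If you prefer to work offline, download the file once and pass the local path to read.csv() instead.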

Mini project plug: Micro-regional data collected by the Royal Prussian Statistical Office covering 1816-1901 have now been digitized and structured in the ifo Prussian Economic History Database (iPEHD). They are available without access restrictions. Have a browse!

Here is a description of the variables from weber.csv you’ll use for your replication:

Name          Description
f_rw          Literacy rate
f_prot        % Protestants
kmwittenberg  Distance from Wittenberg (km)
f_young       % age below 10
f_jew         % Jews
f_fem         % Women
f_ortsgeb     % born in municipality
f_pruss       % of Prussian origin
hhsize        Average household size in county
lnpop         ln(population size)
gpop          Population growth in 1867-1871 (in %)
f_miss        % missing information on literacy
f_blind       % blind
f_deaf        % deaf
f_dumb        % mentally ill or disabled [1]

The most important variables are f_rw, which is the outcome variable (\(Y\)), f_prot which is the regressor of interest (\(X\)), and kmwittenberg, which BW propose as an instrumental variable (\(Z\)) for f_prot.
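For reference, one common way to write down this setup is the two-equation linear model below. This is only a sketch, assuming a constant linear effect; the symbols \(\alpha\), \(\beta\), \(\pi_0\), \(\pi_1\), \(U\) and \(V\) are notation introduced here for illustration and are not taken from BW.

\[
\begin{aligned}
Y &= \alpha + \beta X + U && \text{(second stage)} \\
X &= \pi_0 + \pi_1 Z + V && \text{(first stage)}
\end{aligned}
\]

Here \(\beta\) is the effect of interest, \(U\) collects the unobserved determinants of literacy, and \(V\) collects the determinants of Protestantism other than distance from Wittenberg.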

Because BW do not report robust standard errors, feel free to use “plain vanilla” standard errors throughout. [2]

Exercises

  1. Read the abstract, introduction and conclusion of BW and answer the following:
    a. What is the key question that BW try to answer?
    b. What mechanism do BW propose to explain higher economic prosperity in Protestant regions? How does this relate to Weber’s hypothesis? Was Weber wrong?
    c. For \(Z\) to be a valid instrument, it must satisfy two assumptions: relevance and exogeneity. Define these mathematically and explain briefly what they mean in the context of BW. Can either of them be checked using the available data?
    d. Consult Sections V.A and V.B. Why does distance from Wittenberg matter for the spread of Protestantism? What corroborating evidence do BW offer to argue for the exogeneity of their instrument?
  2. The remainder of this problem set focuses on the link between Protestantism and literacy. Replicate the OLS regression:
    a. Regress f_rw on f_prot and store the result in an object called ols.
    b. Display the results of part (a) in a cleanly formatted regression table, using appropriate R packages. Do your results match Column (1) of Table II?
    c. Discuss your results from part (b) in light of your reading of BW. Can we interpret the results of ols causally? List potential sources of endogeneity in the relationship between Protestantism and education. Why might there be “negative selection” (p. 557)? In which direction would you expect the OLS estimates to be biased? You may wish to consult Section V to answer this question.
  3. IV Regression:
    a. Estimate the first-stage regression of f_prot on kmwittenberg and store your results in an object called first_stage. Display and discuss your findings. Does kmwittenberg appear to be a relevant instrument for f_prot?
    b. Estimate the reduced-form regression of f_rw on kmwittenberg and store your results in an object called reduced_form. Display and discuss your findings.
    c. Use the ivreg() function from the ivreg package to carry out an IV regression of f_rw on f_prot using kmwittenberg as an instrument for f_prot and store your results in an object called iv.
    d. Display your results from iv. What is the estimated difference in literacy rates between an all-Protestant and an all-Catholic county? How does this compare to the results of ols?
    e. Verify that you get the same estimate as in part (d) by running IV “by hand” using first_stage and reduced_form. Hint: Use the coef() function, think about how the reduced form is obtained by substituting the first stage into the second stage, and combine the coefficients appropriately.
  4. Fully specified model: Now add demographic controls to replicate Column (2) of Table II and Columns (1) and (2) in Table III.
    a. Repeat Question 2 including f_young, f_jew, f_fem, f_ortsgeb, f_pruss, hhsize, lnpop, gpop, f_miss, f_blind, f_deaf and f_dumb as additional regressors.
    b. Repeat Question 3 (a) and (c) including the additional regressors. Hint: Treat them as exogenous for (c). This means we will not need an instrument for these variables: instead they serve as their own instruments. See “Details” in the help file for ivreg() to see how to specify this.
    c. Display your results in a cleanly formatted regression table using modelsummary(). Do they match Column (2) of Table II and Columns (1) and (2) in Table III? Does your interpretation change for the fully specified model?
    d. Compare the OLS and IV results in the fully specified model. Does this confirm the hypothesis of negative selection into Protestantism?
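The formula syntax mentioned in the hint for Question 4 can be illustrated on simulated data. The sketch below is not part of the replication: the data frame `sim`, the variables `y`, `x`, `w`, `z`, `u`, and the whole data-generating process are made up for illustration, with a true coefficient of 2 on `x`. It assumes the ivreg package is installed.

```r
library(ivreg)

set.seed(1)
n <- 1000
w <- rnorm(n)  # exogenous control (hypothetical)
z <- rnorm(n)  # instrument (hypothetical)
u <- rnorm(n)  # unobserved confounder, not available to the researcher

# x depends on the instrument, the control and the confounder
x <- 0.8 * z + 0.5 * w + u + rnorm(n)
# y depends on x with true coefficient 2; u enters with the opposite sign,
# so OLS should understate the effect of x
y <- 1 + 2 * x + 0.3 * w - 1.5 * u + rnorm(n)
sim <- data.frame(y, x, w, z)

# OLS with the control included
ols_sim <- lm(y ~ x + w, data = sim)

# IV: regressors go left of "|", instruments go right of it.
# The exogenous control w appears on both sides, so it instruments for itself.
iv_sim <- ivreg(y ~ x + w | z + w, data = sim)

coef(ols_sim)["x"]
coef(iv_sim)["x"]
```

With several controls, every exogenous regressor is repeated on the right-hand side of the bar in the same way.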

Footnotes

  1. The 1871 census classifies disabled people into “blind”, “deaf-mute”, and “stupids and lunatics”. The variable names in BW reflect this classification. I’ve used contemporary language for the variable descriptions.

  2. BW mention that results are not qualitatively different for a logit-transformed dependent variable and clustered standard errors.