Problem Set - Was Weber wrong?
This problem set is based on Becker and Woessmann (2009) (BW). To answer it, you will need to consult the paper. You will also need a copy of the dataset `weber.csv`, a cleaned version of the 1871 Prussian census, which is available from my website in .csv format at https://ditraglia.com/data/weber.csv.
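Here is a minimal sketch for reading the data in R. It assumes that the file at the URL above is a standard comma-separated file with a header row, and that its column names match the variable names in the table below. The helper `load_weber()` and the vector `weber_vars` are hypothetical names introduced only for this illustration. The helper calls base R's `read.csv()` and stops with an error if any expected column is missing.

```r
# Hypothetical helper: read weber.csv and check that the variables used in
# this problem set are present (names taken from the variable table).
weber_vars <- c("f_rw", "f_prot", "kmwittenberg", "f_young", "f_jew", "f_fem",
                "f_ortsgeb", "f_pruss", "hhsize", "lnpop", "gpop", "f_miss",
                "f_blind", "f_deaf", "f_dumb")

load_weber <- function(path = "https://ditraglia.com/data/weber.csv") {
  dat <- read.csv(path)                       # base R CSV reader
  missing_vars <- setdiff(weber_vars, names(dat))
  if (length(missing_vars) > 0) {
    stop("Missing columns: ", paste(missing_vars, collapse = ", "))
  }
  dat
}

# Usage (downloads the file, so it needs an internet connection):
# weber <- load_weber()
```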
Mini project plug: Micro-regional data collected by the Royal Prussian Statistical Office covering 1816-1901 have now been digitized and structured in the ifo Prussian Economic History Database (iPEHD). They are available without access restrictions. Have a browse!
Here is a description of the variables from `weber.csv` that you'll use for your replication:

| Name | Description |
|---|---|
| `f_rw` | Literacy rate |
| `f_prot` | % Protestants |
| `kmwittenberg` | Distance from Wittenberg (km) |
| `f_young` | % aged below 10 |
| `f_jew` | % Jews |
| `f_fem` | % women |
| `f_ortsgeb` | % born in municipality |
| `f_pruss` | % of Prussian origin |
| `hhsize` | Average household size in county |
| `lnpop` | ln(population size) |
| `gpop` | Population growth, 1867-1871 (%) |
| `f_miss` | % missing information on literacy |
| `f_blind` | % blind |
| `f_deaf` | % deaf |
| `f_dumb` | % mentally ill or disabled¹ |
The most important variables are `f_rw`, the outcome variable (\(Y\)); `f_prot`, the regressor of interest (\(X\)); and `kmwittenberg`, which BW propose as an instrumental variable (\(Z\)) for `f_prot`.
Because BW do not report robust standard errors, feel free to use "plain vanilla" standard errors throughout.²
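In R, calling `summary()` on an `lm` fit reports classical (homoskedastic) standard errors by default, and these are the "plain vanilla" ones. The sketch below uses simulated data rather than `weber.csv`, and all object names are illustrative. It computes these standard errors by hand from \(\hat\sigma^2 (X'X)^{-1}\). For contrast only, it also computes the HC0 heteroskedasticity-robust version.

```r
# Simulated data only (not weber.csv); names are illustrative.
set.seed(42)
n <- 200
x <- runif(n)
y <- 0.4 + 0.1 * x + rnorm(n, sd = 0.05)
fit <- lm(y ~ x)

X <- model.matrix(fit)                       # design matrix incl. intercept
e <- residuals(fit)
sigma2 <- sum(e^2) / (n - ncol(X))           # classical error variance estimate
se_classical <- sqrt(diag(sigma2 * solve(crossprod(X))))

# HC0 (White) robust standard errors, shown for comparison only
bread <- solve(crossprod(X))
meat <- crossprod(X * e)                     # sum over i of e_i^2 x_i x_i'
se_hc0 <- sqrt(diag(bread %*% meat %*% bread))

cbind(lm_summary = coef(summary(fit))[, "Std. Error"],
      by_hand = se_classical, hc0 = se_hc0)
```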
Exercises
1. Read the abstract, introduction, and conclusion of BW and answer the following:
   - (a) What is the key question that BW try to answer?
   - (b) What mechanism do BW propose to explain the greater economic prosperity of Protestant regions? How does this relate to Weber's hypothesis? Was Weber wrong?
   - (c) For \(Z\) to be a valid instrument, it must satisfy two assumptions: relevance and exogeneity. Define these mathematically and briefly explain what they mean in the context of BW. Can either of them be checked using the available data?
   - (d) Consult Sections V.A and V.B. Why does distance from Wittenberg matter for the spread of Protestantism? What corroborating evidence do BW offer for the exogeneity of their instrument?
2. The remainder of this problem set focuses on the link between Protestantism and literacy. Replicate the OLS regression:
   - (a) Regress `f_rw` on `f_prot` and store the result in an object called `ols`.
   - (b) Display the results of part (a) in a cleanly formatted regression table, using appropriate R packages. Do your results match Column (1) of Table II?
   - (c) Discuss your results from (b) in light of your reading of BW. Can we interpret the results of `ols` causally? List potential sources of endogeneity in the relationship between Protestantism and education. Why would there be "negative selection" (p. 557)? In which direction would you expect the OLS estimates to be biased? You may wish to consult Section V to answer this question.
3. IV regression:
   - (a) Estimate the first-stage regression of `f_prot` on `kmwittenberg` and store your results in an object called `first_stage`. Display and discuss your findings. Does `kmwittenberg` appear to be a relevant instrument for `f_prot`?
   - (b) Estimate the reduced-form regression of `f_rw` on `kmwittenberg` and store your results in an object called `reduced_form`. Display and discuss your findings.
   - (c) Use the `ivreg()` function from the `ivreg` package to carry out an IV regression of `f_rw` on `f_prot`, using `kmwittenberg` as an instrument for `f_prot`, and store your results in an object called `iv`.
   - (d) Display your results from `iv`. What is the difference in literacy rate between an all-Protestant and an all-Catholic county? How does this compare to the results of `ols`?
   - (e) Verify that you get the same estimate as in part (d) by running IV "by hand" using `first_stage` and `reduced_form`. Hint: use the `coef()` function, think about how the reduced form is obtained by substituting the first stage into the second stage, and combine the coefficients appropriately.
4. Fully specified model: Now add demographic controls to replicate Column (2) of Table II and Columns (1) and (2) of Table III.
   - (a) Repeat Question 2, including `f_young`, `f_jew`, `f_fem`, `f_ortsgeb`, `f_pruss`, `hhsize`, `lnpop`, `gpop`, `f_miss`, `f_blind`, `f_deaf`, and `f_dumb` as additional regressors.
   - (b) Repeat Question 3 (a) and (c), including the additional regressors. Hint: treat them as exogenous for (c). This means we do not need instruments for these variables: instead, they serve as their own instruments. See "Details" in the help file for `ivreg()` for how to specify this.
   - (c) Display your results in a cleanly formatted regression table using `modelsummary()`. Do they match Column (2) of Table II and Columns (1) and (2) of Table III? Does your interpretation change for the fully specified model?
   - (d) Compare the OLS and IV results in the fully specified model. Does this confirm the hypothesis of negative selection into Protestantism?
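The identity behind running IV "by hand" in Question 3, and the idea from Question 4 that exogenous controls serve as their own instruments, can be checked on simulated data. The sketch below does not use `weber.csv`, and all variable names and parameter values are hypothetical. In it, `x` is endogenous because an unobserved `u` drives both `x` and `y`, `z` is a valid instrument, and `w` is an exogenous control. With a single instrument, dividing the coefficient on `z` in the reduced form by the coefficient on `z` in the first stage should reproduce the just-identified IV estimate \((Z'X)^{-1}Z'y\). Here that estimate is computed with base R matrix algebra rather than `ivreg()`.

```r
# Simulated data only (not weber.csv); true causal effect of x on y is 2.
set.seed(1)
n <- 5000
w <- rnorm(n)                                  # exogenous control
z <- rnorm(n)                                  # instrument
u <- rnorm(n)                                  # unobserved confounder
x <- 0.5 * z + 0.3 * w + u + rnorm(n)          # endogenous regressor
y <- 1 + 2 * x - 0.4 * w + 2 * u + rnorm(n)    # outcome

first   <- lm(x ~ z + w)                       # first stage
reduced <- lm(y ~ z + w)                       # reduced form
iv_by_hand <- coef(reduced)["z"] / coef(first)["z"]

# Just-identified IV: the intercept and w appear in both X and Z,
# i.e. they act as their own instruments.
X <- cbind(1, x, w)
Z <- cbind(1, z, w)
beta_iv <- solve(crossprod(Z, X), crossprod(Z, y))

# OLS for comparison; in this simulation u raises both x and y,
# so OLS is biased upward.
ols_slope <- coef(lm(y ~ x + w))["x"]

c(ols = unname(ols_slope), iv_by_hand = unname(iv_by_hand), iv_matrix = beta_iv[2])
```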
Footnotes
1. The 1871 census classifies disabled people into "blind", "deaf-mute", and "stupids and lunatics". The variable names in BW reflect this classification. I've used contemporary language for the variable descriptions.
2. BW mention that their results are not qualitatively different with a logit-transformed dependent variable or with clustered standard errors.