Core Empirical Research Methods

It’s nearly impossible to overstate the value that economists ascribe to cleverness. Like most obsessions, this one is not altogether healthy.

– David Autor

My general philosophy in life is never to rely on being clever; instead I want to rely on being thorough and having a justifiable workflow.

– Richard McElreath

A theoretical statistician knows all about measure theory but has never seen a measurement whereas the actual use of measure theory by the applied statistician is a set of measure zero.

– Stephen Senn

You don’t need to learn how to code. You just need to be able to tell a computer what to do in a way that it will respond, understand what it’s doing and how to optimize that, and fix it when it’s not working.

– Austen Allred

Overview

This is the course website for Core Empirical Research Methods (core ERM), a 1st-year MPhil course in the Economics Department at the University of Oxford. Core ERM will help you develop the basic skills you’ll need to carry out applied economic research. It will cover a mix of applied econometrics, programming/computing, and research skills. The prerequisites are basic familiarity with programming in some language, not necessarily R, and an introductory course in econometrics at the master’s level. If you are interested in auditing this course, see Auditing Core ERM below.

Because Core ERM is about doing applied economics, it will not be a traditional lecture course. Students should bring their laptops to lectures so that they can follow along with live demos and work on examples in small groups. While there will still be some lecture-style material, the overall format will be closer to a “lab” in the natural sciences. As such, attendance is mandatory if you are taking this course for credit. Please see Attendance below for more details. GTAs (Graduate Teaching Assistants) will attend each lecture to give you individualized help if you get stuck while working through in-class exercises. See Required Software for details on how to configure your machine for core ERM.

Personnel

Lecturer: Francis J. DiTraglia

Teaching Assistants (GTAs)

Discussion Board

We will not use Canvas for core ERM. Instead, all course materials will be posted on the course website and all other communication will take place on Ed. Please register for the discussion board by following this link. I have enabled self sign-up for all email addresses that end in @ox.ac.uk or @*.ox.ac.uk, so either your college or departmental email address should work. Please do not send email messages to your GTAs or the course instructor; we ask that you use the discussion board instead. If you have a question about course content, we kindly request that you post it publicly (you are free to remain anonymous when doing so) so that our answer can benefit the other students in the course. Your classmates may also know the answer and be able to help you faster than we can, so there’s both a private and public benefit to this approach. For personal issues or questions specific to your mini-project, please send us a private message on the discussion board. Keeping all course communication in one place will allow us to spend more time helping you learn and less time on course admin.

Times and Locations

All class meetings will take place in the Manor Road Building (MRB)

Lectures

Weeks 1-8 of Trinity Term, MRB Lecture Theatre. Lecture attendance is required if you are taking this course for credit. (See Attendance for details.)

Wednesdays 11:00am-12:30pm
Thursdays 11:30am-1:00pm
Fridays 11:30am-1:00pm

Drop-in Surgeries

Weeks 2-9 of Trinity Term in MRB Seminar Room D. Attendance is optional but strongly recommended. These sessions are particularly valuable for troubleshooting code problems for problem sets, getting feedback on your mini-project, and deepening your understanding of challenging concepts.

Mondays 2-5pm
Tuesdays 2-5pm

Office Hours

You can drop in to speak with me in room 2132 of the Manor Road building during the half hour before each of our lectures, in other words:

Wednesdays from 10:30-11am
Thursdays from 11-11:30am
Fridays from 11-11:30am

Office hours will commence on Thursday May 1st, since presumably there’s nothing much for us to discuss before the course has actually started :)

Required Software

In this course we will use the R programming language via a front-end called RStudio. Both are freely available on all major platforms. To install them follow these instructions. To smooth out the inevitable start-of-term kinks, during weeks 1 and 2 we will work with RStudio via Posit Cloud. Please sign up for a free account here. This will allow you to get right to work at the start of term even if you encounter problems installing R. Eventually you will need to get R and RStudio working on your own machine, however. The week 3 drop-in surgery is an excellent place to get help with installation issues.

Attendance

Because core ERM is an interactive, lab-based course, lecture attendance is mandatory. It is also in your best interest. A major part of your assessment is based on problem sets. We will work through many of these together during lectures, but recordings will not be made available during the term. Because the material in core ERM is highly cumulative (each week builds on the last), regular attendance is the easiest and most reliable way to ensure that you gain the skills you will need to pass the course.

Moreover, while I would prefer to rely on the carrot rather than the stick, I will keep track of attendance at lectures in TT 2025. Students who miss more than five lectures without prior authorization will be contacted by the director of graduate studies and the senior tutor of their college. If you are in the UK on a student visa, it is particularly important that you attend regularly, as the government requires me to certify that you have been actively engaged with your course of study during the term. While it would never be my goal to try to get anyone into trouble, I am legally and ethically bound to report your attendance accurately when it is formally requested of me.

Assessment

This course is pass/fail and will be assessed entirely on the basis of coursework assignments. Before we go any further: yes, it is possible to fail core ERM. See Re-sits for more details. All assignments must be submitted via Inspera. See Inspera Submission Requirements for more details on how to submit. Your coursework assignments come in two parts, each of which will be assessed using the same marking criteria, as detailed below. To pass the course, you must pass both parts of the assessment. The two parts are as follows:

Part A: Problem Sets

Part A will consist of four problem sets due in TT weeks 2, 4, 6, and 8:

TT Week 2: Submit Problem Set 1 by noon on Friday
TT Week 4: Submit Problem Set 2 by noon on Friday
TT Week 6: Submit Problem Set 3 by noon on Friday
TT Week 8: Submit Problem Set 4 by noon on Friday

You must pass all four of the problem set submissions to pass this component of the course. See the marking criteria for more details.

Part B: A Mini-project of Your Choice

Part B will consist of a “mini-project” of your choice that you will complete between weeks 3 and 9 of term. Your mini-project will be due at noon on Wednesday of TT Week 9. For full details, see the Mini-Project FAQs below. Because you choose the mini-project, you can work on something that is intrinsically interesting to you. Ideally the topic will be relevant to your MPhil thesis: you can kill two birds with one stone. And because you will complete your mini-project during the term, you will have the opportunity to get help and feedback from me and your GTAs at lectures and the weekly drop-in surgeries.

Academic Integrity

Consulting Human Beings

You are allowed, and indeed encouraged, to discuss course problems and assignments with your classmates and GTAs, but you are not allowed to directly copy code or results from another student. The work that you submit for assessment must be your own, even if it incorporates suggestions from your classmates and GTAs.

Consulting AI

Problem Sets

There are some restrictions on how you are allowed to use large language models (LLMs) in your problem set submissions. In short: you can consult them in the same way that you are free to consult your classmates and GTAs, e.g. as a tool to help you learn R, help debug code, and so on. But you are not allowed to paste in problem set questions and ask for solutions. For example, asking “Can you explain how to filter rows in dplyr?” is acceptable, while asking “How would I solve question 3 from problem set 2?” is not permitted. For the same reason, you are not permitted to use tools that autocomplete code as you type (e.g. GitHub Copilot) when completing problem sets. Generative AI can most likely generate correct solutions to all of my problem set questions, so you may find yourself sorely tempted. There are two reasons why you should not succumb. First, perfectly correct solutions generated by ChatGPT and Claude look sufficiently dissimilar to the examples that I provide in my course materials that it is extremely easy for me to tell that they were AI-generated. Second, if you rely solely on AI, you will never learn to code. And if you never learn to code, you will put yourself out of a job. AI tools substitute for humans with low coding ability; they complement humans with high coding ability.

Mini-Projects

I insist that you learn to code, but I also insist that you learn to use AI. For this reason, we will help you set up GitHub Copilot and teach you how to use it. On your mini-project you are free to use generative AI however you see fit: there are no restrictions whatsoever. But please bear in mind that any code you submit must adhere to my Marking Criteria.

Course Material

Most of the course material for core ERM is delivered through a series of online videos with associated exercises and solutions. You are expected to watch these videos at home before our class meetings, work through the short exercises, and check your work against my solutions. This will allow us to use class time to do more interesting and exciting things, including working together on problem set questions and mini-projects.

Week 1
1. Crash Course in R Programming: slides, solutions
  - Videos: 1, 2, 3, 4, 5, 6
  - During Class: welcome / Q&A, survey, two truths, coins
2. Getting Started with dplyr: slides, solutions
  - Video
  - During Class: quiz, Q&A, Lakisha
3. Getting Started with ggplot2: slides, solutions
  - Video
  - During Class: quiz, Q&A, quarto / RMarkdown, FREDR
Week 2
1. Research Plumbing I: slides, solutions
  - Videos: 1, 2, 3, 4, 5
  - During Class: quiz, Q&A, Start Mini-Projects
2. Linear Regression: slides, solutions
  - Videos: 1, 2, 3
  - During Class: quiz, Q&A, Football
3. Monte Carlo Simulation Basics: slides, solutions
  - Videos: 1, 2, 3, 4
  - During Class: quiz, Q&A, Chevalier de Méré
Week 3
1. Logistic Regression: slides, solutions
  - Video
  - During Class: quiz, two truths with new data, Wells
2. Selection-on-observables: slides
  - Video
  - During Class: quiz, how to adjust, NSW
3. DAGs and Bad Controls: slides
  - Video
  - During Class: quiz, DAGs exercise
Week 4
1. The Multivariate Normal Distribution: slides, solutions
  - Video
  - During Class: midline survey, Mini-projects
2. Instrumental Variables: slides, solutions
  - Video
  - During Class: quiz, Weber
3. Local Average Treatment Effects: slides
  - Video
  - During Class: quiz, TBA
Week 5
1. Running a Simulation Study: slides, solutions
  - Video
  - During Class: Weak IV (Johanna)
2. Regression Discontinuity: slides
  - Video
  - During Class: quiz, TBA
3. Research Plumbing II: slides, solutions
  - Video
Week 6
1. Statistical Inference - Defense Against the Dark Arts: slides, solutions
  - Video
  - During Class: quiz, TBA
2. Heteroskedasticity and Clustering: slides, solutions
  - Video
  - During Class: Robust Standard Errors (Johanna)
3. Panel Data Basics: slides, solutions
  - Video
  - During Class: TBA
Week 7
1. Difference-in-Differences: slides
  - Video
  - During Class: TBA
2. Version Control & Collaboration with Git (Rafael Suchy)
3. LaTeX & Overleaf (Frank) / Coding with Copilot (Elodie Chervin)
Week 8
1. Key Resources for your MPhil Thesis (John Southall) / endline survey
2. Reference Management and Workflows for Literature Reviews (TBC)
3. AI for Coding, Literature Review and Brainstorming (Dominik Lukes)

Problem Sets

Problem set questions will be posted here during the term. Please consult the marking criteria and academic integrity policy for more information.

Problem Set 1: Due Friday of TT Week 2 at noon
- Collatz, Lakisha, and FREDR
Problem Set 2: Due Friday of TT Week 4 at noon
- Football, Monte Carlo, Wells, NSW
Problem Set 3: Due Friday of TT Week 6 at noon
- Weber, Weak IV, MLDA, Mini-project Milestone
Problem Set 4: Due Friday of TT Week 8 at noon
- Behrens-Fisher, Airfare, Card & Krueger

Draft Book

When I first taught this course back in 2022, I started writing a book to accompany it. This turned out to be a tall order, but I did manage to produce ten draft chapters. You can view them at https://empirical-methods.com. Based on my experiences teaching version 1.0 of core ERM, I decided to make a number of changes to the course. While much of the material in my draft book remains relevant, my lecture slides will be the final authority on the course material in the present version of core ERM. I hope to rework the book before next year’s version of core ERM.

Re-sits

Barring a serious personal issue that affects your studies, there is no reason why you should fail core ERM. If you attend class, participate actively, and get help at the drop-in surgeries as needed, you will develop all of the skills needed to complete the course assignments to the appropriate standard. If for some reason you do fail core ERM, you will be given the opportunity to re-sit any failed assignments the next time that core ERM is offered, i.e. in Trinity term of next year. (Remember: you need to pass all four problem sets and your mini-project to pass the course.) Clearly this is something you will want to avoid, so take my advice and do what’s necessary to pass the first time around.

Mini-project FAQs

The course mini-project is a small independent project of your own choosing, designed to require roughly the same time commitment as two problem sets. You will complete your project between Weeks 2 and 9 of term and submit it as part of your course assessment.

Your project should be a replication (or partial replication) of a reputable paper in economics or a closely related field. Specifically, you will:

Obtain the original data used in the paper.
Write R code to clean the data.
Reproduce a few key tables and figures from the paper, especially those containing summary statistics and main results.

Expected Scope and Length

You are free to choose which parts of the paper to replicate. However, to ensure comparable workload across projects, each submission must include:

At least three tables and/or figures, including:
- A table of summary statistics.
- The results of the primary analysis.
- A robustness check or heterogeneity analysis.

Even if the original paper does not include these elements, your replication must.

We strongly encourage you to choose papers that use microdata (e.g., individual-level survey data like the UK Labour Force Survey). However, if you select a paper using macro data (e.g., national unemployment rates) – where data cleaning is typically simpler – we expect you to replicate at least four tables and/or figures.

On top of showing that you can run the same analysis as the authors, a high-quality replication must also engage critically with the paper and its findings. We ask that your final submission include a 2-4 page introduction that covers the motivation, contribution, and key findings of the paper. Whenever you replicate an analysis, you should provide the estimating equation and comment on the methodology. You should discuss what the estimation actually does (i.e. what are we estimating?) and how it does it (i.e. why is this the right specification or method, and how is the parameter of interest identified?). Finally, you should comment on your results, evaluate whether the paper is indeed replicable, and suggest specific areas where improvements could be made to the paper.

Choosing a Paper

When selecting a paper, you must adhere to the following five rules:

Each student must choose a different paper.
There must not already be R replication code available online for your chosen paper (code in other languages, such as Stata, is fine).
The necessary data must be available online and free of special access restrictions. You may use data that is not available directly in the replication files, but it must be publicly available and reasonably accessible (e.g. UK labour force data is fine, but Swedish administrative data is not).
The paper must have been published within the last 10 years in a high-quality economics journal. Recommended journals include:
- American Economic Journal (all series)
- American Economic Review
- Quarterly Journal of Economics
- Econometrica
- Journal of Political Economy
- Review of Economic Studies
- Review of Economics and Statistics
- Economic Journal
- Journal of Health Economics
- Journal of Labor Economics
- Journal of Human Resources
- Journal of Development Economics
- Journal of the European Economic Association
- Journal of Public Economics
- Journal of Financial Economics
- Journal of Finance
You must receive approval from either me or one of the GTAs before beginning work.

Subject to these rules, you are free to choose any paper you like. We strongly suggest browsing the resources listed below to find suitable options. Kindly note that we will not accept working papers.

Resources for Finding a Paper

If you’re unsure where to start, two excellent resources are:

You can also attend GTA drop-in sessions to receive suggestions.

Expected Outputs

You are required to produce the following outputs:

Completion of Paper Sign-Up Sheet
- Deadline: Friday, Week 3 at noon
- You must submit your selected paper for approval on this sign-up sheet.
Initial Report
- Deadline: Friday, Week 6 at noon (As part of your Problem Set #3 Submission)
- A 1–2 page description of your project, including:
  - A brief summary of the paper (methods, identification, key findings)
  - A description of the original replication files for the paper (organization, data availability)
  - A draft of the summary statistics table that you will include in the final submission
Final Submission
- Deadline: Wednesday Week 9 at noon
- The full replication report and code files.

Marking Criteria

Assignments in Core ERM will be graded pass/fail based on five criteria. Criteria 1–3 are all-or-nothing and necessary to pass a given assignment. Criteria 4 and 5 allow for partial marks.

1. Clean Code: Your R code must adhere to the tidyverse style guide. It should be clean, easy to read, and appropriately commented.
2. Correct Code: Your R code must be syntactically correct, i.e. it must run without errors. This will be assessed based on your ability to successfully knit an RMarkdown/Quarto file with your results: your file will not knit unless the code is correct. More details on RMarkdown/Quarto will be provided in lectures.
3. Formatting & Typesetting: For a given assignment, you will submit a single pdf document constructed from one or more underlying RMarkdown/Quarto reports incorporating your code and detailing your solutions to the questions on the assessment. Your write-ups should be clearly formatted using appropriate markdown commands. Any mathematical formulas that you incorporate should be clearly and cleanly typeset using appropriate LaTeX commands.
4. Completeness: To pass a given question on a course assignment, your answer must at a minimum be substantially complete. This means addressing all parts of the question and providing all requested deliverables (e.g., graphs, tables, numerical calculations). Partial solutions only receive partial marks, regardless of quality.
5. Quality: To pass a given question on the assignment, your answer must be substantially correct. This means providing accurate calculations, appropriate interpretations, and clear explanations of your methodology and findings. Poorly explained or substantially incorrect answers will only receive partial marks. If I can’t tell that you understand what you’re doing, you will not pass a given question.
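As a rough illustration of the Clean Code criterion, the short sketch below follows a few conventions from the tidyverse style guide: snake_case names, `<-` for assignment, spaces around operators, two-space indentation, and brief comments. The function `summarise_sample` and the `wages` vector are hypothetical, made up for this example; they are not taken from the course materials or problem sets.

```r
# Compute the sample mean and its standard error for a numeric vector.
# `summarise_sample` is a hypothetical helper used only for illustration.
summarise_sample <- function(x) {
  n_obs <- length(x)
  sample_mean <- mean(x)
  std_error <- sd(x) / sqrt(n_obs)

  # Return a named vector so that the results are self-describing
  c(mean = sample_mean, se = std_error)
}

# Made-up hourly wages, purely for demonstration
wages <- c(12.5, 15.0, 9.75, 20.0)
summarise_sample(wages)
```

This uses only base R, so it should run in a fresh R session without installing any packages.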

Inspera Submission Requirements

Submit a single PDF document to Inspera for each coursework submission. This document should be generated using RMarkdown / Quarto following the instructions from class. A simple and more-or-less foolproof approach is to create a separate Quarto / RMarkdown file for each problem, knit each one to HTML, print to PDF, and merge into a single file using PDFsam Basic. If you know what you are doing, you are welcome to use an alternative approach, e.g. creating a single qmd / RMarkdown file and knitting directly to PDF using tinytex.
This course is marked anonymously, so please do not write your name on your assignment. Instead, please provide your candidate number.
It is your responsibility to familiarize yourself with Inspera before the submission deadline. You can practice submitting using the practice site in your Inspera dashboard. For more information, see this video and user guide.
You MUST ensure that you upload your file and, when you are happy you have uploaded the correct and final file(s), submit them. This is a two-step process. Your work is NOT submitted until you have pressed the Submit now button.
Once you press the “Submit now” button, you will be shown a confirmation that your work has been submitted. You can also view the work you have submitted by going to the Dashboard in Inspera and clicking on “Archive”.
You won’t be able to edit your submission after you press “Submit now” so make sure that you check your work VERY CAREFULLY before you submit it.
If you do identify a problem with the file you have submitted, you can replace the file before the deadline, or within 30 minutes of the deadline by emailing econgrad@economics.ox.ac.uk. The academic office is extremely helpful, but very busy. So please reserve this option for serious problems, not minor errors. Remember: this is a pass/fail course.
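For the alternative route mentioned above (a single Quarto file knitted directly to PDF), a minimal document header might look something like the sketch below. The title and candidate number are placeholders rather than real values, and rendering to PDF assumes a working LaTeX installation such as the one provided by tinytex.

```yaml
---
title: "Problem Set 1"               # placeholder title
author: "Candidate number: 0000000"  # placeholder; give your candidate number, not your name
format: pdf                          # ask Quarto to render directly to PDF via LaTeX
---
```

If you work in RMarkdown rather than Quarto, the corresponding header field is `output: pdf_document`.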

Auditing Core ERM

Between 80 and 90 students take Core ERM for credit each year, but the MRB Lecture Theatre seats 120. Provided that there’s space left in the room, any member of the university is most welcome to attend my lectures without asking for permission in advance. I ask only that you respect the following guidelines. First, please sit in the back row if you’re auditing so that I can more easily gauge attendance etc. Second, the Drop-In Surgeries are only for students who are taking the course for credit. Third, Core ERM lectures are fairly interactive: the GTAs and I will circulate to help students who encounter difficulties while working on the exercises. I won’t go so far as to say that we won’t help you if you’re auditing, but we will need to prioritize the students who are taking the course for credit.