Core Empirical Research Methods

It’s nearly impossible to overstate the value that economists ascribe to cleverness. Like most obsessions, this one is not altogether healthy.

David Autor

My general philosophy in life is never to rely on being clever; instead I want to rely on being thorough and having a justifiable workflow.

Richard McElreath

A theoretical statistician knows all about measure theory but has never seen a measurement whereas the actual use of measure theory by the applied statistician is a set of measure zero.

Stephen Senn

You don’t need to learn how to code. You just need to be able to tell a computer what to do in a way that it will respond, understand what it’s doing and how to optimize that, and fix it when it’s not working.

Austen Allred

Overview

This is the course website for Core Empirical Research Methods (Core ERM), a first-year MPhil course in the Economics Department at the University of Oxford. Core ERM will help you develop the basic skills you’ll need to carry out applied economic research. It will cover a mix of applied econometrics, programming/computing, and research skills. The prerequisites are basic familiarity with programming in some language, not necessarily R, and an introductory course in econometrics at the master’s level.

Learning goals

Core ERM will teach you the following skill set:

| Learning goal | Skill set |
|---|---|
| Applied Econometrics | You will learn how to implement causal inference methods and run computer simulations. |
| Data analysis | You will learn how to clean, visualise, and analyse data sets. |
| R Programming | You will learn core programming concepts, and how to write clean and efficient code. |
| Reproducible Research Workflows | You will learn how to document your workflows transparently, make your work replicable, and collaborate with colleagues using open-source tools. |
| Empirical Research Tools | You will learn how to use RMarkdown, APIs, GitHub, LaTeX, Zotero, and AI tools for empirical research. |

Course format

Because Core ERM is about doing applied economics, it will not be a traditional lecture course. The course is flipped-classroom: the lecture content is pre-recorded in bite-sized chunks that you should watch in your own time before the lectures. In the lectures, we will consolidate and build on what you learned from the online recordings. Recordings are for introducing you to new concepts; class time is for applying your new skills to real empirical problems. We will use lectures to solve problem sets together, learn how to code collaboratively, and have interactive discussions. The overall format will be closer to a “lab” in the natural sciences, with mandatory attendance.

GTAs (Graduate Teaching Assistants) will attend each lecture to give you individualized help. You should bring your laptop so that you can follow along with live demos and work on examples in small groups.

Workload

You should expect this course to take you about 20 hours per week.

Personnel

| Role | Name | Description |
|---|---|---|
| Lecturer | Johanna Barop | Johanna runs the course, delivers lectures, and offers office hours. |
| Convenor | Kevin Sheppard | Kevin convenes the course and is responsible for grading your assessments. |
| Teaching Assistants (GTAs) | Inbar Amit and Jonas Kurle | Inbar and Jonas answer your questions during lectures and run the surgeries. Inbar organises the mini projects. |

Discussion Board

We will not use Canvas for Core ERM. Instead, all course materials will be posted on the course website and all other communication will take place on Ed. Please register for the discussion board by following this link. I have enabled self sign-up for all email addresses that end in @ox.ac.uk or @*.ox.ac.uk, so either your college or departmental email address should work. Please do not send email messages to your GTAs or the course instructor; we ask that you use the discussion board instead. If you have a post about course content, we kindly request that you post it publicly (you are free to remain anonymous when doing so) so that our answer can benefit the other students in the course. Your classmates may also know the answer and be able to help you faster than we can, so there’s both a private and public benefit to this approach. For personal issues or questions specific to your Mini Project, please send us a private message on the discussion board. Keeping all course communication in one place will allow us to spend more time helping you learn and less time on course admin.

Times and Locations

All class meetings will take place in the Manor Road Building (MRB).

Lectures

Weeks 1-8 of Trinity Term, MRB Lecture Theatre. Lecture attendance is required if you are taking this course for credit. (See Attendance for details.)

Drop-in Surgeries

Weeks 2-9 of Trinity Term, MRB Seminar Room G. Attendance is optional but strongly recommended. These sessions are particularly valuable for troubleshooting code problems for problem sets, getting feedback on your Mini Project, and deepening your understanding of challenging concepts.

Office Hours

Weeks 1-8 of Trinity Term, Room 2119. You can drop in to speak with me during the half hour before each of our lectures.

Required Software

In this course we will use the R programming language via a front-end called RStudio. Both are freely available on all major platforms. To install them, follow these instructions. To smooth out the inevitable start-of-term kinks, during weeks 1 and 2 we will work with RStudio via Posit Cloud. Please sign up for a free account here. This will allow you to get right to work at the start of term even if you encounter problems installing R. Eventually, however, you will need to get R and RStudio working on your own machine. The week 3 drop-in surgery is an excellent place to get help with installation issues.

Attendance

Because Core ERM is an interactive, lab-based course, lecture attendance is mandatory. It is also in your best interest. A major part of your assessment is based on problem sets. We will solve many of these together during lectures. Because the material in each week builds on the last, regular attendance is the easiest and most reliable way to ensure that you gain the skills you will need to pass the course. To keep classes interactive, we will keep track of attendance and publish lecture recordings only out of term.

Assessment

This course is pass/fail and will be assessed entirely on the basis of coursework assignments. All assignments must be submitted via Inspera. See Inspera Submission Requirements for more details on how to submit. Your coursework assignments come in two parts, each of which will be assessed using the same marking criteria as detailed below.

If you fail Core ERM, it is possible to re-sit the course the next academic year. See Re-sits for more details.

To pass the course, you must pass both parts of the assessment. The two parts are as follows:

Part A: Problem Sets

Part A will consist of four problem sets due in TT weeks 2, 4, 6, and 8.

You must pass all four of the problem set submissions to pass this component of the course. See the marking criteria for more details.

Part B: A Mini-project of Your Choice

Part B will consist of a “Mini Project” of your choice that you will complete between weeks 3 and 9 of term. Your Mini Project will be due at noon on Wednesday of TT Week 9. For full details, see the Mini Project FAQs below.

Because you choose the Mini Project, you can work on something that is intrinsically interesting to you. Ideally the topic will be relevant to your MPhil thesis: you can kill two birds with one stone. And because you will complete your mini project during the term, you will have the opportunity to get help and feedback from me and your GTAs at lectures and the weekly Drop-in surgeries.

Course Material

Most of the course material for Core ERM is delivered through a series of online videos with associated exercises and solutions. You should watch these videos before each lecture and work through the associated exercises. Each video is directly connected to your assessment: sometimes you can even use code chunks from the lecture slides!

Itinerary

| Week | Lecture | Content | Slides | Videos | During class |
|---|---|---|---|---|---|
| W1 | Lecture 1 | Crash Course in R Programming | slides, solutions | 1, 2, 3, 4, 5, 6 | Welcome, Baseline Survey, Coins |
| W1 | Lecture 2 | Getting Started with dplyr | slides, solutions | Video | Lakisha |
| W1 | Lecture 3 | Getting Started with ggplot2 | slides, solutions | Video | Quarto/RMarkdown, FREDR |
| W2 | Lecture 4 | Research Plumbing I | | | Start Mini Projects |
| W2 | Lecture 5 | Linear Regression | | | |
| W2 | Lecture 6 | Monte Carlo Simulation Basics | | | |
| W3 | Lecture 7 | Logistic Regression | | | |
| W3 | Lecture 8 | Selection-on-observables | | | |
| W3 | Lecture 9 | DAGs and Bad Controls | | | |
| W4 | Lecture 10 | The Multivariate Normal Distribution | | | |
| W4 | Lecture 11 | Instrumental Variables | | | |
| W4 | Lecture 12 | Local Average Treatment Effects | | | |
| W5 | Lecture 13 | Running a Simulation Study | | | |
| W5 | Lecture 14 | Regression Discontinuity | | | |
| W5 | Lecture 15 | Research Plumbing II | | | |
| W6 | Lecture 16 | Statistical Inference - Defense Against the Dark Arts | | | |
| W6 | Lecture 17 | Heteroskedasticity and Clustering | | | |
| W6 | Lecture 18 | Panel Data Basics | | | |
| W7 | Lecture 19 | Difference-in-Differences | | | |
| W7 | Lecture 20 | Version Control & Collaboration with GitHub | | | |
| W7 | Lecture 21 | AI for Coding, Literature Review and Brainstorming (Dominik Lukes) | | | |
| W8 | Lecture 22 | LaTeX & Overleaf | | | |
| W8 | Lecture 23 | Reference Management with Zotero and Workflows for Literature Reviews | | | |
| W8 | Lecture 24 | Key Resources for your MPhil Thesis (John Southall) | | | |

Problem Sets

Problem set questions will be posted here during the term. Please consult the marking criteria and academic integrity policy for more information.

Mini Project

The course mini project is a small independent project of your own choosing, designed to require roughly the same time commitment as two problem sets. You will complete your project between Weeks 2 and 9 of term and submit it as part of your course assessment.

Your project should be a replication (or partial replication) of a reputable paper in economics or a closely related field. Specifically, you will:

Expected Scope and Length

You are free to choose which parts of the paper to replicate. However, to ensure comparable workload across projects, each submission must include:

Even if the original paper does not include these elements, your replication must.

We strongly encourage you to choose papers that use microdata (e.g., individual-level survey data like the UK Labour Force Survey). However, if you select a paper using macro data (e.g., national unemployment rates) – where data cleaning is typically simpler – we expect you to replicate at least three tables and/or figures.

On top of showing that you can run the same analysis as the authors, a high-quality replication must also engage critically with the paper and its findings. We suggest the following structure for your final replication report:

Choosing a Paper

When selecting a paper, you must adhere to the following five rules:

  1. Each student must choose a different paper.

  2. There must not already be R replication code available online for your chosen paper (code in other languages, such as Stata, is fine).

  3. The necessary data must be available online and free of special access restrictions. You may use data that is not available directly in the replication files, but it must be publicly available and reasonably accessible (e.g. UK labour force data is fine, but Swedish administrative data is not).

  4. The paper must have been published within the last 10 years in a high-quality economics journal. Recommended journals include:

    • American Economic Journal (all series)
    • American Economic Review
    • Quarterly Journal of Economics
    • Econometrica
    • Journal of Political Economy
    • Review of Economic Studies
    • Review of Economics and Statistics
    • Economic Journal
    • Journal of Health Economics
    • Journal of Labor Economics
    • Journal of Human Resources
    • Journal of Development Economics
    • Journal of the European Economic Association
    • Journal of Public Economics
    • Journal of Financial Economics
    • Journal of Finance
  5. You must receive approval from either me or one of the GTAs before beginning work.

Subject to these rules, you are free to choose any paper you like. We strongly suggest browsing the resources listed below to find suitable options. Kindly note that we will not accept working papers.

Resources for Finding a Paper

If you’re unsure where to start, two excellent resources are:

You can also attend GTA drop-in sessions to receive suggestions.

Expected Outputs

You are required to produce the following outputs:

  1. Completion of Paper Sign-Up Sheet

    • Deadline: Friday, Week 3 at noon
    • You must submit your selected paper for approval on this sign-up sheet.
  2. Initial Report

    • Deadline: Friday, Week 6 at noon (As part of your Problem Set #3 Submission)
    • A 1–2 page description of your project, including:
      • A brief summary of the paper (methods, identification, key findings)
      • A description of the original replication files for the paper (organization, data availability)
      • A draft of the summary statistics table that you will include in the final submission
  3. Final Submission

    • Deadline: Wednesday, Week 9 at noon
    • The full replication report and code files.

Assessment Marking

Core ERM is a pass/fail course. To pass overall, you need to achieve a weighted average mark of at least 60 across your assessments:

\[\text{Overall mark} = \frac{2}{3} \times \frac{\text{PS1} + \text{PS2} + \text{PS3} + \text{PS4}}{4} + \frac{1}{3} \times \text{Mini Project}\]
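
As a worked illustration with hypothetical marks (not taken from any real submission), suppose a student scores 60, 70, 70, and 60 on the four problem sets and 60 on the Mini Project. Assuming the threshold is met whenever the overall mark is 60 or above, the formula above gives:

\[\text{Overall mark} = \frac{2}{3} \times \frac{60 + 70 + 70 + 60}{4} + \frac{1}{3} \times 60 = \frac{2}{3} \times 65 + 20 \approx 63.3\]

so this hypothetical student would clear the threshold.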

Marking Criteria

Assignments in Core ERM are marked based on five criteria:

  1. Clean Code: Your R code must adhere to the tidyverse style guide. It should be clean, easy to read, and appropriately commented.
  2. Correct Code: Your R code must be syntactically correct, i.e. it must run without errors. This will be assessed based on your ability to successfully knit an RMarkdown/Quarto file with your results: your file will not knit unless the code is correct. More details on RMarkdown/Quarto will be provided in lectures.
  3. Formatting & Typesetting: For a given assignment, you will submit a single PDF document constructed from one or more underlying RMarkdown/Quarto reports incorporating your code and detailing your solutions to the questions on the assessment. Your write-ups should be clearly formatted using appropriate markdown commands. Any mathematical formulas that you incorporate should be clearly and cleanly typeset using appropriate LaTeX commands.
  4. Completeness: To pass a given question on a course assignment, your answer must at a minimum be substantially complete. This means addressing all parts of the question and providing all requested deliverables (e.g., graphs, tables, numerical calculations). Partial solutions only receive partial marks, regardless of quality.
  5. Quality: To pass a given question on the assignment, your answer must be substantially correct. This means providing accurate calculations, appropriate interpretations, and clear explanations of your methodology and findings. Poorly explained or substantially incorrect answers will only receive partial marks. If I can’t tell that you understand what you’re doing, you will not pass a given question.

Grading Rubric

Each assignment will be graded according to this rubric:

| Mark | Level | Grading Rubric |
|---|---|---|
| 90 | Exceptional Pass | The assignment is perfect. All parts are present, correct, and fully satisfy the marking criteria. |
| 70 | High Pass | All parts of the assignment are present, mostly correct, and fully satisfy the marking criteria. |
| 60 | Pass | All parts of the assignment are present, but there are some noticeable errors and/or deviations from marking criteria 1–3 above. |
| 49 | Marginal Fail | The assignment is mostly complete but may be missing some components and/or there are more serious errors and deviations from marking criteria 1–3 above. |
| 40 | Fail | The assignment is mostly incomplete and/or fundamentally incorrect and/or completely ignores marking criteria 1–3 above. |
| 0 | No submission | |

We will deduct marks for late submissions following the MPhil Economics Exam Conventions:

| Lateness | Cumulative Penalty |
|---|---|
| After the deadline but on the same day | 5 marks |
| Each additional day (e.g. the day after the deadline = 6 marks, the day after that = 7 marks; note that Saturday and Sunday count the same as weekdays) | 1 mark |
| Maximum deducted mark (up to 14 calendar days late) | 18 marks |
| More than 14 calendar days late | Fail |

This ensures that you won’t fail an assignment if you submit late.
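
One way to read the late-penalty table above as a formula is sketched here. It assumes that the day of the deadline counts as day 1, and writes \(d\) for the calendar day on which a late submission arrives. This day-counting convention is an interpretation of the table, not wording from the Exam Conventions:

\[\text{Penalty}(d) = (4 + d) \text{ marks} \quad \text{for } 1 \le d \le 14\]

Under this reading, \(d = 1\) gives 5 marks, \(d = 2\) gives 6 marks, and \(d = 14\) gives the 18-mark maximum. For example, an assignment that would otherwise earn 70 and is submitted on day 3 would receive \(70 - 7 = 63\).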

Academic Integrity

Consulting Human Beings

You are allowed, and indeed encouraged, to discuss course problems and assignments with your classmates and GTAs, but you are not allowed to directly copy code or results from another student. The work that you submit for assessment must be your own, even if it incorporates suggestions from your classmates and GTAs.

Consulting AI

Problem Sets

There are some restrictions on how you are allowed to use large language models (LLMs) in your problem set submissions. In short: you can consult them in the same way that you are free to consult your classmates and GTAs, e.g. as a tool to help you learn R, help debug code, and so on. But you are not allowed to paste in problem set questions and ask for solutions. For example, asking “Can you explain how to filter rows in dplyr?” is acceptable, while asking “How would I solve question 3 from problem set 2?” is not permitted. For the same reason, you are not permitted to use tools that autocomplete code as you type, such as GitHub Copilot, when completing problem sets. Generative AI can most likely generate correct solutions to all of my problem set problems, so you may find yourself sorely tempted. There are two reasons why you should not succumb. First, perfectly correct solutions generated by ChatGPT and Claude look sufficiently dissimilar to the examples that I provide in my course materials that it is extremely easy for me to tell that they were AI-generated. Second, if you rely solely on AI, you will never learn to code.

Mini-Projects

I insist that you learn to code, but I also insist that you learn to use AI. For this reason, we will help you set up GitHub Copilot and teach you how to use it. On your mini-project you are free to use generative AI however you see fit: there are no restrictions whatsoever. But please bear in mind that any code you submit must adhere to my Marking Criteria.

Inspera Submission Requirements

Re-sits

Barring a serious personal issue that affects your studies, there is no reason why you should fail Core ERM. If you attend class, participate actively, and get help at the drop-in surgeries as needed, you will develop all of the skills needed to complete the course assignments to the appropriate standard. If for some reason you do fail Core ERM, you will be given the opportunity to re-sit any failed assignments the next time that Core ERM is offered, i.e. in Trinity term of next year. (Remember: you need to pass all four problem sets and your mini-project to pass the course.)

Draft Book

When Frank first taught this course back in 2022, he started writing a book to accompany it. You can view his ten draft chapters at https://empirical-methods.com. The lecture slides will be the final authority on the course material in the present version of Core ERM.

Auditing Core ERM

Between 80 and 90 students take Core ERM for credit each year, but the MRB lecture theatre seats 120. Provided that there’s space left in the room, any member of the university is most welcome to attend my lectures without asking for permission in advance. I ask only that you respect the following guidelines. First, please sit in the back row if you’re auditing so that I can more easily gauge attendance etc. Second, the Drop-In Surgeries are only for students who are taking the course for credit. Third, Core ERM lectures are fairly interactive: I and the GTAs will circulate to help students who encounter difficulties while working on the exercises. I won’t go so far as to say that we won’t help you if you’re auditing, but we will need to prioritize the students who are taking the course for credit.