
Core Empirical Research Methods

It’s nearly impossible to overstate the value that economists ascribe to cleverness. Like most obsessions, this one is not altogether healthy.

David Autor

My general philosophy in life is never to rely on being clever; instead I want to rely on being thorough and having a justifiable workflow.

Richard McElreath

A theoretical statistician knows all about measure theory but has never seen a measurement whereas the actual use of measure theory by the applied statistician is a set of measure zero.

Stephen Senn

Overview

This is the course website for Core Empirical Research Methods (Core ERM), a 1st-year MPhil course in the Economics Department at the University of Oxford. Core ERM will help you develop the basic skills you’ll need to carry out applied economic research. It will cover a mix of applied econometrics, programming/computing, and research skills. The prerequisites are basic familiarity with programming in some language, not necessarily R, and an introductory course in econometrics at the master's level.

Learning goals

Core ERM will teach you the following skill set:

| Learning goal | Skill set |
|---|---|
| Applied Econometrics | You will learn how to implement causal inference methods and run computer simulations. |
| Data analysis | You will learn how to clean, visualise, and analyse data sets. |
| R Programming | You will learn core programming concepts, and how to write clean and efficient code. |
| Reproducible Research Workflows | You will learn how to document your workflows transparently, make your work replicable, and collaborate with colleagues using open-source tools. |
| Empirical Research Tools | You will learn how to use RMarkdown, APIs, GitHub, LaTeX, Zotero, and AI tools for empirical research. |

Course format

Because Core ERM is about doing applied economics, it will not be a traditional lecture course. Instead, the course uses a flipped-classroom format: the lecture content is pre-recorded in bite-sized chunks that you should watch in your own time before the lectures. In the lectures, we will consolidate and build on what you learned from the online recordings. Recordings introduce you to new concepts; class time is for applying your new skills to real empirical problems. We will use lectures to solve problem sets together, learn how to code collaboratively, and have interactive discussions. The overall format will be closer to a “lab” in the natural sciences, with mandatory attendance.

GTAs (Graduate Teaching Assistants) will attend each lecture to give you individualised help. You should bring your laptop so that you can follow along with live demos and work on examples in small groups.

Workload

You should expect this course to take you about 20 hours per week.

Personnel

| Role | Name | Description |
|---|---|---|
| Lecturer | Johanna Barop | Johanna runs the course, delivers lectures, and offers office hours. |
| Convenor | Kevin Sheppard | Kevin convenes the course and is responsible for grading your assessments. |
| Teaching Assistants (GTAs) | Inbar Amit and Jonas Kurle | Inbar and Jonas answer your questions during lectures and run the surgeries. Inbar organises the mini projects. |

Discussion Board

We will not use Canvas for Core ERM. Instead, all course materials will be posted on the course website and all other communication will take place on Ed, our discussion board. Please register for the discussion board by following this link. I have enabled self sign-up for all email addresses that end in @ox.ac.uk or @*.ox.ac.uk, so either your college or departmental email address should work. Please do not send emails to your GTAs or the course instructor; use the discussion board instead. If you have a question about course content, please post it publicly (you are free to remain anonymous when posting publicly) so that our answer can benefit the other students in the course. Your classmates may also know the answer and be able to help you faster than we can, so there’s both a private and public benefit to this approach. For personal issues or questions specific to your Mini Project, please send us a private message on the discussion board. Keeping all course communication in one place will allow us to spend more time helping you learn and less time on course admin.

Times and Locations

All class meetings will take place in the Manor Road Building (MRB).

Lectures

Weeks 1-8 of Trinity Term, MRB Lecture Theatre. Lecture attendance is required if you are taking this course for credit. (See Attendance for details.)

Drop-in Surgeries

Weeks 2-9 of Trinity Term in MRB. Attendance is optional but strongly recommended. These sessions are particularly valuable for troubleshooting code problems for problem sets, getting feedback on your Mini Project, and deepening your understanding of challenging concepts.

The Monday surgeries in weeks 6 and 7 are dedicated to your mini projects. Bring along a draft of your initial report and get individualised feedback from the GTAs!

Office Hours

Weeks 1-8 of Trinity Term, Room 2119. You can drop in to speak with me during the half hour before each of our lectures:

Required Software

In this course we will use the R programming language via a front-end called RStudio. Both are freely available on all major platforms. To install them, follow these instructions. To smooth out the inevitable start-of-term kinks, during weeks 1 and 2 we will work with RStudio via Posit Cloud. Please sign up for a free account here. This will allow you to get right to work at the start of term even if you encounter problems installing R. Eventually, however, you will need to get R and RStudio working on your own machine. The week 3 drop-in surgery is an excellent place to get help with installation issues.

Attendance

Because Core ERM is an interactive, lab-based course, lecture attendance is mandatory. It is also in your best interest. A major part of your assessment is based on problem sets, and we will solve many of these together during lectures. Because the material in each week builds on the last, regular attendance is the easiest and most reliable way to ensure that you gain the skills you will need to pass the course. We will keep track of attendance, and to keep classes interactive we will only publish lecture recordings outside of term.

Assessment

This course is pass/fail and will be assessed entirely on the basis of coursework assignments. All assignments must be submitted via Inspera. See Inspera Submission Requirements for more details on how to submit. Your coursework assignments come in two parts, each of which will be assessed using the same marking criteria as detailed below.

If you fail Core ERM, it is possible to re-sit the course over the summer. See Re-sits for more details.

To pass the course, you must pass both parts of the assessment. The two parts are as follows:

Part A: Problem Sets

Part A will consist of four problem sets due in TT weeks 2, 4, 6, and 8:

You must pass all four of the problem set submissions to pass this component of the course. See the marking criteria for more details.

Part B: A Mini-project of Your Choice

Part B will consist of a “Mini Project” of your choice that you will complete between weeks 3 and 9 of term. Your Mini Project will be due at noon on Wednesday of TT Week 9. For full details, see the Mini Project FAQs below.

Because you choose the Mini Project, you can work on something that is intrinsically interesting to you. Ideally the topic will be relevant to your MPhil thesis: you can kill two birds with one stone. And because you will complete your mini project during the term, you will have the opportunity to get help and feedback from me and your GTAs at lectures and the weekly Drop-in surgeries.

Course Material

Most of the course material for Core ERM is delivered through a series of online videos with associated exercises and solutions. You should watch these videos before each lecture and work through the associated exercises. Each video is directly connected to your assessment: sometimes you can even use code chunks from the lecture slides!

Itinerary

| Week | Lecture | Content | Slides | Videos | During class |
|---|---|---|---|---|---|
| W1 | Lecture 1 | Crash Course in R Programming | slides, solutions | 1, 2, 3, 4, 5, 6 | Welcome, Baseline Survey, Coins |
| W1 | Lecture 2 | Getting Started with dplyr | slides, solutions | Video | Lakisha |
| W1 | Lecture 3 | Getting Started with ggplot2 | slides, solutions | Video | FREDR |
| W2 | Lecture 4 | Research Plumbing I | slides, solutions | 1, 2, 3, 4, 5 | Mini Projects, Quarto/RMarkdown and styler/lintr, Inspera how-to |
| W2 | Lecture 5 | Linear Regression | slides, solutions | 1, 2, 3 | Football |
| W2 | Lecture 6 | Monte Carlo Simulation Basics | slides, solutions | 1, 2, 3, 4 | Monte Carlo |
| W3 | Lecture 7 | Logistic Regression | | | |
| W3 | Lecture 8 | Selection-on-observables | | | |
| W3 | Lecture 9 | DAGs and Bad Controls | | | |
| W4 | Lecture 10 | The Multivariate Normal Distribution | | | |
| W4 | Lecture 11 | Instrumental Variables | | | |
| W4 | Lecture 12 | Local Average Treatment Effects | | | |
| W5 | Lecture 13 | Running a Simulation Study | | | |
| W5 | Lecture 14 | Regression Discontinuity | | | |
| W5 | Lecture 15 | Research Plumbing II | | | |
| W6 | Lecture 16 | Statistical Inference - Defense Against the Dark Arts | | | |
| W6 | Lecture 17 | Heteroskedasticity and Clustering | | | |
| W6 | Lecture 18 | Panel Data Basics | | | |
| W7 | Lecture 19 | AI for Coding (Dominik Lukes) | | | |
| W7 | Lecture 20 | Difference-in-Differences | | | |
| W7 | Lecture 21 | Version Control & Collaboration with GitHub | | | |
| W8 | Lecture 22 | LaTeX & Overleaf | | | |
| W8 | Lecture 23 | Reference Management with Zotero and Workflows for Literature Reviews | | | |
| W8 | Lecture 24 | Key Resources for your MPhil Thesis (John Southall) | | | |

Problem Sets

Problem set questions will be posted here during the term. Please consult the marking criteria and academic integrity policy for more information.

Mini Project

The course mini project is a small independent project of your own choosing, designed to require roughly the same time commitment as two problem sets. You will complete your project between Weeks 2 and 9 of term and submit it as part of your course assessment.

Your project should be a replication (or partial replication) of a reputable paper in economics or a closely related field. Specifically, you will:

Expected Scope and Length

You are free to choose which parts of the paper to replicate. However, to ensure comparable workload across projects, each submission must include:

Even if the original paper does not include these elements, your replication must. Proposals in which two tables or figures replicate the same analysis will be rejected.

We strongly encourage you to choose papers that use microdata (e.g., individual-level survey data like the UK Labour Force Survey). However, if you select a paper using macro data (e.g., national unemployment rates) – where data cleaning is typically simpler – we expect you to replicate at least three tables and/or figures.

On top of showing that you can run the same analysis as the authors, a high-quality replication must also engage critically with the paper and its findings. We suggest the following structure for your final replication report:

Note that your grades will depend on how well you are able to replicate what the paper says it does, the quality of your code, and how well you can critically engage with the findings of the replication. You are not graded on whether you get the exact same numbers as the original papers (though you should comment on why you think they differ) or whether your plots and figures are formatted in the same way.

Choosing a Paper

When selecting a paper, you must adhere to the following five rules:

  1. Each student must choose a different paper.

  2. There must not already be R replication code available online for your chosen paper (code in other languages, such as Stata, is fine).

  3. The necessary data must be available online and free of special access restrictions. You may use data that is not available directly in the replication files, but it must be publicly available and reasonably accessible (e.g. UK labour force data is fine, but Swedish administrative data is not).

  4. The paper must have been published within the last 10 years in a high-quality economics journal. Recommended journals include:

    • American Economic Journal (all series)
    • American Economic Review
    • Quarterly Journal of Economics
    • Econometrica
    • Journal of Political Economy
    • Review of Economic Studies
    • Review of Economics and Statistics
    • Economic Journal
    • Journal of Health Economics
    • Journal of Labor Economics
    • Journal of Human Resources
    • Journal of Development Economics
    • Journal of the European Economic Association
    • Journal of Public Economics
    • Journal of Financial Economics
    • Journal of Finance
  5. You must receive approval from either me or one of the GTAs before beginning work.

Subject to these rules, you are free to choose any paper you like. We strongly suggest browsing the resources listed below to find suitable options. Kindly note that we will not accept working papers.

Resources for Finding a Paper

If you’re unsure where to start, two excellent resources are:

You can also attend GTA drop-in sessions to receive suggestions.

Expected Outputs

You are required to produce the following outputs:

  1. Completion of Paper Sign-Up Sheet

    • Deadline: Friday, Week 3 at noon
    • You must submit your selected paper for approval on this sign-up sheet.
  2. Initial Report

    • Deadline: Friday, Week 6 at noon (as part of your Problem Set 3 submission)
    • A 1–2 page description of your project, including:
      • A brief summary of the paper (methods, identification, key findings)
      • A description of the original replication files for the paper (organization, data availability)
      • A draft of the summary statistics table that you will include in the final submission
  3. Final Submission

    • Deadline: Wednesday, Week 9 at noon
    • The full replication report and code files.

FAQs and Common Challenges

  1. What if I can’t find a suitable paper?

Don’t worry! Use the resources provided above, or come to GTA drop-in hours for advice. We will also have dedicated mini-project drop-in surgeries in weeks 6 and 7.

  2. What if the data for my paper is missing?

It is your responsibility to confirm early that the required data is available and complete. If you discover partway through the project that key data is missing, you may choose a new paper. Note, however, that incomplete replications will not be accepted. Reach out to the GTAs if you need help selecting a replacement paper.

  3. What if my results don’t match the paper’s?

First, carefully check your code to ensure it matches the intended analysis. You are welcome to ask the GTAs for help during Office Hours, but note:

Finally, remember that replication failures happen, even in published research. If you cannot replicate the paper’s results after careful work, discussing the discrepancy thoughtfully can make for an excellent project – just be sure to document your process clearly.

  4. What if the paper doesn’t include a summary statistics table or figure?

Your replication should still include one, which you will need to construct yourself.

  5. Help, all the replication files are in Stata and I don’t know how to read them!

Some papers will already have replication files available. However, you are not expected to simply translate them from Stata or Matlab into R, so not knowing Stata should not be a barrier to replicating a paper. Instead, you should do what the authors say they do in the paper. That said, reviewing the original replication files is often helpful for understanding how the variable descriptions in the paper correspond to the actual dataset and how the data is structured.

Assessment Marking

Core ERM is a pass/fail course. To pass overall, you need to achieve a weighted average mark of at least 60 across your assessments:

\[\text{Overall mark} = \frac{2}{3} \times \frac{\text{PS1} + \text{PS2} + \text{PS3} + \text{PS4}}{4} + \frac{1}{3} \times \text{Mini Project}\]
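To see how the weights work, here is a worked example with hypothetical marks chosen purely for illustration: 70, 60, 60, and 70 on the four problem sets and 60 on the Mini Project.

\[\text{Overall mark} = \frac{2}{3} \times \frac{70 + 60 + 60 + 70}{4} + \frac{1}{3} \times 60 = \frac{130}{3} + \frac{60}{3} = \frac{190}{3} \approx 63.3\]

Because the four problem sets jointly carry two thirds of the weight, a one-mark change in the problem set average moves the overall mark twice as much as a one-mark change in the Mini Project.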

Marking Criteria

Assignments in Core ERM are marked based on five criteria:

  1. Clean Code: Your R code must adhere to the tidyverse style guide. It should be clean, easy to read, and appropriately commented.
  2. Correct Code: Your R code must be syntactically correct, i.e. it must run without errors. This will be assessed based on your ability to successfully knit an RMarkdown/Quarto file with your results: your file will not knit unless the code is correct. More details on RMarkdown/Quarto will be provided in lectures.
  3. Formatting & Typesetting: For a given assignment, you will submit a single PDF document constructed from one or more underlying RMarkdown/Quarto reports incorporating your code and detailing your solutions to the questions on the assessment. Your write-ups should be clearly formatted using appropriate markdown commands. Any mathematical formulas that you incorporate should be clearly and cleanly typeset using appropriate LaTeX commands.
  4. Completeness: To pass a given question on a course assignment, your answer must at a minimum be substantially complete. This means addressing all parts of the question and providing all requested deliverables (e.g., graphs, tables, numerical calculations). Partial solutions only receive partial marks, regardless of quality.
  5. Quality: To pass a given question on the assignment, your answer must be substantially correct. This means providing accurate calculations, appropriate interpretations, and clear explanations of your methodology and findings. Poorly explained or substantially incorrect answers will only receive partial marks. If I can’t tell that you understand what you’re doing, you will not pass a given question.

Grading Rubric

Each assignment will be graded according to this rubric:

| Mark | Level | Grading Rubric |
|---|---|---|
| 90 | Exceptional Pass | The assignment is perfect. All parts are present, correct, and fully satisfy the marking criteria. |
| 70 | High Pass | All parts of the assignment are present, mostly correct, and fully satisfy the marking criteria. |
| 60 | Pass | All parts of the assignment are present, but there are some noticeable errors and/or deviations from marking criteria 1–3 above. |
| 49 | Marginal Fail | The assignment is mostly complete but may be missing some components and/or there are more serious errors and deviations from marking criteria 1–3 above. |
| 40 | Fail | The assignment is mostly incomplete and/or fundamentally incorrect and/or completely ignores marking criteria 1–3 above. |
| 0 | No submission | |

We will deduct marks for late submissions following the MPhil Economics Exam Conventions:

| Lateness | Cumulative Penalty |
|---|---|
| After the deadline but on the same day | 5 marks |
| Each additional day (e.g. the day after the deadline = 6 marks, the day after that = 7 marks; note that Saturday and Sunday count the same as weekdays) | 1 mark |
| Maximum deducted mark (up to 14 calendar days late) | 18 marks |
| More than 14 calendar days late | Fail |

This means that a late submission does not automatically fail: provided you submit within 14 calendar days of the deadline, the penalty is capped at 18 marks.
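The penalty schedule can also be written compactly. The formula below is our own reading of the table rather than wording from the Exam Conventions: let d be the number of calendar days after the deadline day, with d = 0 for a submission made later on the deadline day itself.

\[\text{Penalty}(d) = \min(5 + d,\; 18), \qquad 0 \le d \le 14\]

Submissions more than 14 calendar days late fail. For example, a hypothetical submission that would otherwise earn 70 and arrives two days after the deadline loses 5 + 2 = 7 marks, leaving 63.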

Academic Integrity

Consulting Human Beings

You are allowed, and indeed encouraged, to discuss course problems and assignments with your classmates and GTAs, but you are not allowed to directly copy code or results from another student. The work that you submit for assessment must be your own, even if it incorporates suggestions from your classmates and GTAs.

AI policy

Core ERM teaches you how to code. It also teaches you how to code with AI.

Consulting AI for Problem Sets

You can consult LLMs in the same way that you are free to consult your classmates and GTAs, e.g. as a tool to help you learn R, help debug code, and so on. But you are not allowed to paste in problem set questions and ask for solutions. For example, asking “Can you explain how to filter rows in dplyr?” is acceptable, while asking “How would I solve question 3 from problem set 2?” is not permitted. For the same reason, you are not permitted to use AI agents or tools that autocomplete code as you type (e.g. GitHub Copilot) when completing problem sets.

Generative AI can most likely generate correct solutions to all of my problem set questions, so you may find yourself sorely tempted. There are two reasons why you should not succumb. First, perfectly correct solutions generated by ChatGPT and Claude look sufficiently dissimilar to the examples that I provide in my course materials that it is extremely easy for me to tell that they were AI-generated. Second, if you rely solely on AI, you will never learn to code.

We want you to develop intuition for data and become independent thinkers. To do that, you need to think through the problem sets yourself. If you remove every difficulty by reaching for AI-generated solutions, you remove every learning opportunity.

Socratic AI Tutor

To help you use LLMs for advice and studying, I have built a Socratic AI Tutor under Oxford’s ChatGPT Edu license. It is instructed not to give out full solutions and instead to help you break down problems, reason through them step by step, and develop intuition in programming and causal inference methods. This is a trial run, so please report any unexpected behaviour on Ed under “Socratic Tutor”!

Coding with AI

I insist that you learn to code, but I also insist that you learn to use AI. For this reason, we will teach you how to use AI agents in week 7.

Consulting AI for Mini Projects

On your mini-project you are free to use generative AI however you see fit: there are no restrictions whatsoever. But please bear in mind that any code you submit must adhere to my Marking Criteria.

Inspera Submission Requirements

Re-sits

Barring a serious personal issue that affects your studies, there is no reason why you should fail Core ERM. If you attend class, participate actively, and get help at the drop-in surgeries as needed, you will develop all of the skills needed to complete the course assignments to the appropriate standard.

If for some reason you do fail Core ERM, you can re-sit any failed assignments over the summer. You will need to re-submit your assessment by noon Wednesday 16 August, 2026 (tbc at July exam board meeting). Affected students will be contacted by the Academic Office after the July exam board meeting. (Remember: you need to pass all four problem sets and your mini-project to pass the course.)

Draft Book

When Frank first taught this course back in 2022, he started writing a book to accompany it. You can view his ten draft chapters at https://empirical-methods.com. The lecture slides will be the final authority on the course material in the present version of Core ERM.

Auditing Core ERM

Between 80 and 90 students take Core ERM for credit each year, but the MRB lecture theatre seats 120. Provided that there’s space left in the room, any member of the university is most welcome to attend my lectures without asking for permission in advance. I ask only that you respect the following guidelines. First, please sit in the back row if you’re auditing so that I can more easily gauge attendance. Second, the Drop-In Surgeries are only for students who are taking the course for credit. Third, Core ERM lectures are fairly interactive: the GTAs and I will circulate to help students who encounter difficulties while working on the exercises. I won’t go so far as to say that we won’t help you if you’re auditing, but we will need to prioritise the students who are taking the course for credit.