# Prelims Probability and Statistics

Lady Margaret Hall

## What is this?

This document details our college teaching arrangements for prelims probability and statistics at Lady Margaret Hall. It will be updated throughout the academic year, so make sure to check back regularly. The date of the current version is given above.

## Overview

I’ll teach Prelims Probability & Statistics at LMH in seven classes over the course of Michaelmas and Hilary Terms and two revision classes in Trinity Term. Because these are classes, all first-year PPE students will attend the same sessions. Each class will begin with a quiz, but there is no work to submit in advance. I expect you to attend all classes and to arrive on time. If you are unwell and unable to attend, please let me know by email.

## Dates, Times and Locations

- **Michaelmas Term:** Thursdays of Weeks 3, 5, and 7 from 3-5:30pm in the Amanda Foreman Room.
- **Hilary Term:** Thursdays from 3-5:30pm in Weeks 2 (Lyttleton Room), 4 (Amanda Foreman Room), 6 (Amanda Foreman Room), and 8 (Lyttleton Room).
- **Trinity Term:** Two revision classes, times and locations TBC.

## Read This First

Before reading further, log on to Canvas and read the Overview and Instructions for prelims probability and statistics so that you understand how the course is structured.

## Collections

You will have two 90-minute collections (mock exams) for prelims probability and statistics: one at the beginning of HT covering probability, and another at the beginning of TT covering all of the course material. I will provide further details in class.

## Quizzes

Each class will begin with a short, closed-book, closed-notes quiz that I will mark and return to you at the next class. In-class quizzes will *always* include a random sample from the **Review Questions** that I assigned for that class. (You can find these under Class Details below.) Because you are given these questions in advance, you should be able to answer them correctly on the quiz. If you cannot, this means that you have not come to class adequately prepared. Quizzes may also include additional questions based on the material covered in past classes. These questions will be harder and you will not be given them in advance. They are designed to help you practice for collections and the real exam.

## Before Each Class

- Watch the lecture videos listed below under Class Details.
- Take notes
- Note down key definitions and formulas.
- Make flashcards and memorize them using Anki.

- Solve the **Review Questions** listed below under Class Details.
- Repeat as needed:
- If you have trouble with the review questions, this means that you need to go back to the lecture videos and slides before attempting them again.
- Working with friends is highly recommended!
- Make a note of anything that you find confusing and bring it with you to class.

## After Each Class

Complete the **Problem Set Questions** listed below under Class Details. Then watch the associated Demonstration Lecture for solutions and explanations. Your problem set questions are neither collected nor marked, but they will serve as inspiration for future quiz and collections problems.

## Class Material

As explained in more detail above: watch the videos and solve the review questions *before* class; complete the problem set and watch the associated demo lecture *after* class.

### Class #1 - MT Week 3

#### Videos

- P0: Introduction
- P1: Probability Basics
- P2: Conditional Probability
- P3: Independence & Bayes’ Rule

#### Review Questions

- What is a probability?
- State each of the three axioms of probability, aka the Kolmogorov Axioms.
- Suppose we carry out a random experiment that consists of flipping a fair coin twice.
- List all the basic outcomes in the sample space \(\Omega\).
- Let \(A\) be the event that you get at least one head. List all the basic outcomes in \(A\).
- List all the basic outcomes in \(A^c\).
- What is the probability of \(A\)? What is the probability of \(A^c\)?

- State the complement rule.
- Mark each statement as TRUE or FALSE.
- If \(A \subseteq B\) then \(P(A) \geq P(B)\).
- For any events \(A\) and \(B\), \(P(A \cap B) = P(A)P(B)\)
- For any events \(A\) and \(B\), \(P(A \cup B) = P(A) + P(B) − P(A \cap B)\)

- Let \(A\) and \(B\) be two arbitrary events.
- Show that \(P(A \cup B) \leq P(A) + P(B)\). (This is called *Boole’s Inequality*.)
- Show that \(P(A \cap B) \geq P(A) + P(B) − 1\). (This is called *Bonferroni’s Inequality*.)

- State the definition of a conditional probability.
- Derive:
- Bayes’ Rule from the definition of conditional probability.
- The Multiplication rule for independent events from the definition of conditional probability.

- Name the various components of Bayes’ Rule.
- Suppose that \(P(B) = 0.4\), \(P(A|B) = 0.1\) and \(P(A|B^c) = 0.9\).
- Calculate \(P(A)\).
- Calculate \(P(B|A)\).

- Let \(A\) and \(B\) be two mutually exclusive events both with positive probability. Are they independent? Explain.
- Alexis the meteorologist determines that the probability of rain on Saturday is 50%, and the probability of rain on Sunday is also 50%. Sally the presenter sees Alexis’ forecast and summarizes it as follows: “According to Alexis we’re in for a wet weekend. There’s a 100% chance of rain this weekend: 50% on Saturday and 50% on Sunday.” Is Sally correct? Why or why not?
- When is it true that \(P(A|B) = P(B|A)\)? Explain.
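
One way to check your answers to the coin-flip question above is to enumerate the sample space directly. The sketch below assumes a fair coin, so that every basic outcome is equally likely; the helper name `prob` is my own and is not something from the lectures.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: all ordered pairs of H/T.
omega = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

# A = "at least one head"; its complement is everything else in omega.
A = {outcome for outcome in omega if "H" in outcome}
A_complement = omega - A

print(sorted(A))                    # the basic outcomes in A
print(prob(A), prob(A_complement))  # by the complement rule these sum to one
```

Counting outcomes like this only works when all basic outcomes are equally likely, which is why the fairness of the coin matters.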

#### Problem Set Questions

*Solutions to these questions are provided in Demonstration Lecture #1*.

- Prove, using the probability axioms:
- **The Complement Rule**: \(P(A^c) = 1 − P(A)\).
- **The Probability of the Union of Two Events Rule**: \(P(A \cup B) = P(A) + P(B) − P(A \cap B)\).
- **The Bounds on Probabilities Rule**: \(P(A \cup B) \leq P(A) + P(B)\).
- **The Logical Consequence Rule**: If \(B\) logically entails \(A\) then \(P(B) \leq P(A)\).

- Consider the experiment of tossing a coin repeatedly and counting the number of coin tosses required until the first head appears.
- Write down the sample space.
- Let \(A\) be the event that the number of coin tosses required for the first head to appear is even. Let \(B\) be the event that the number of coin tosses required until the first head appears is less than 5. Describe events \(A\) and \(B\) as sets.

- Consider the experiment of tossing a coin 2 times. Consider the events: \[
\begin{align*}
A &= \text{There is at most one Head}\\
B &= \text{There is at least one Head and there is at least one Tail}
\end{align*}
\]
- Are events \(A\) and \(B\) independent?
- What if you toss the coin 3 times?
- What if you toss the coin 4 times?

- Derive Bayes’ Rule from the definition of a conditional probability.
- Three percent of Tropicana brand oranges are already rotten when they arrive at the supermarket. In contrast, six percent of Sunkist brand oranges arrive rotten. A local supermarket buys forty percent of its oranges from Tropicana and the rest from Sunkist. Suppose we randomly choose an orange from the supermarket and see that it is rotten. What is the probability that it is a Tropicana?
- Imagine there are two universities, University A and University B, who take different approaches to generating research findings. In University A, a team of well-informed experts develop a theory. Their theories are correct 90% of the time. Before publishing a theory, the scientists at University A do an experimental test of the theory to check whether it is correct (e.g. a clinical trial). The test is designed so that 90% of correct theories will pass the test and only 10% of false theories pass the test.
- Let \(T\) be the event that the theory is true and \(\text{Pub}\) be the event that the theory passes the test and is published. Draw a Venn diagram to represent these probabilities.
- Calculate the probability that a theory from University A is published, i.e. \(P(\text{Pub})\).
- Calculate the probability that a published theory from University A is correct, i.e. \(P(T | \text{Pub})\).
- In University B, a team of creative experts think of theories that would be rather surprising and interesting if true. These theories are correct only 5% of the time. Again, before publishing a theory, the scientists at University B run the same experimental test as University A, i.e. the test is designed so that 90% of correct theories will pass the test and only 10% of false theories pass the test. Calculate the probability that a published theory from University B is correct.
- The Research Council governing the publication of research requires that published theories (i.e. those that pass the test) must be replicated before they can be used in practice. Assume that the replication test is like the first test, i.e. the test is designed so that 90% of correct theories will pass the test and only 10% of false theories pass the test. Compute the replicability rate of University A and University B (i.e. the pass rate of the second test)
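
The orange and university questions above share the same structure: a prior probability, two conditional probabilities, and a request for the reverse conditional. The sketch below shows one way to check your arithmetic once you have worked a question by hand. The function name `posterior` and its argument names are my own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) via Bayes' Rule, using the law of total probability for P(E)."""
    p_evidence = prior * p_evidence_given_h + (1 - prior) * p_evidence_given_not_h
    return prior * p_evidence_given_h / p_evidence

# Orange question: H = "the orange is a Tropicana", E = "the orange is rotten".
p_tropicana_given_rotten = posterior(Fraction(40, 100), Fraction(3, 100), Fraction(6, 100))
print(p_tropicana_given_rotten)
```

For the university questions, take H to be "the theory is true" and E to be "the theory passes the test", and change only the prior between University A and University B.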

### Class #2 - MT Week 5

#### Videos

- D0: Data & Variables Intro
- D1: Data Basics
- D2: Probability Mass Functions
- D3: PDFs
- D4: CDFs
- D5: Standardization
- D6: Multivariate Distributions

#### Review Questions

*Note: your lecturer uses the term “variable” where the more standard term is “random variable.” I’ll always say random variable, and abbreviate it RV.*

- Define the “support” aka “support set” of a RV. What is the probability that a RV takes on a value outside of its support set?
- What is the difference between a discrete and continuous RV?
- What is a probability mass function (PMF)? What key properties does it satisfy?
- A random variable is said to follow a “Rademacher distribution” if it is equally likely to take any value in the set \(\{−1, +1\}\) and never takes on any value outside this set. Write out and sketch its probability mass function.
- Define the term cumulative distribution function (CDF).
- How is the CDF of a discrete RV related to its PMF?
- How is the CDF of a continuous RV related to its PDF?

- Let \(X\) be a RV with support set \(\{−1, +1\}\) and \(p(−1) = 1/3\). Write down the CDF of \(X\).
- Define the term parameter as it relates to a distribution. Are parameters constant or random?
- If \(X\) is a continuous RV and \(a, b\) are constants, how do we calculate \(P(a \leq X \leq b)\)?
- What are the key properties of a probability density function (PDF)?
- True or False: if \(f(x)\) denotes a PDF, then \(f(x)\) is a probability so \(0 \leq f(x) \leq 1\).
- Let \(X\) be a continuous RV with CDF \(F\) . Express \(P(−2 \leq X \leq 4)\) in terms of \(F\).
- Let \(X\) be a continuous RV with CDF \(F\). Express \(P(X \geq x)\) in terms of \(F\).
- Suppose that \(X\) is a continuous RV with support set \([−1, +1]\).
- Is 2 a possible realization of this variable?
- What is \(P(X = 0.5)\)?

- True or False: If \(X\) is a continuous RV then \(P(X \leq 0.3) = P(X < 0.3)\). Explain.
- What does it mean to “standardize” a RV?
- Let \(X\) be a \(\text{Uniform}(0, 1)\) RV. Write down the CDF of \(X\).
- In a \(N(\mu, \sigma^2)\) distribution, what features of the distribution do the parameters \(\mu\) and \(\sigma^2\) control?
- Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the values of the following probabilities?
- \(P(\mu − \sigma \leq X \leq \mu + \sigma)\)
- \(P(\mu − 2\sigma \leq X \leq \mu + 2\sigma)\)
- \(P(\mu − 3\sigma \leq X \leq \mu + 3\sigma)\)

- Let \(X \sim N(\mu = −2,\sigma^2 = 25)\). Without consulting a table, what is the approximate value of \(P(−12 \leq X \leq 8)\)?
- Let \(X \sim N(0, 1)\). Calculate the following:
- \(P(X = 0)\)
- \(P(X \leq −1)\)
- \(P(−1.2 \leq X \leq 1.2)\)
- the value of \(t\) such that \(P(X \leq t) = 0.5\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.9\)
- the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.82\)

- Let \(X \sim N(\mu = −2, \sigma^2 = 9)\). Calculate the following:
- \(P(X = 0)\)
- \(P(X \leq −1)\)
- \(P(−1.2 \leq X \leq 1.2)\)
- the value of \(t\) such that \(P(X \leq t) = 0.5\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.9\).
- the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.82\)

- Suppose that \(U \sim N (0, \sigma^2)\) and let \(a, b, x\) be constants. Find the distribution of each of the following:
- \(Y = a + U\)
- \(Y = a + bU\)
- \(Y = a + bx + U\)

- Let \(X \sim N(\mu = 1, \sigma^2 = 2)\).
- What is the 1st quartile of \(X\)?
- What is the 77th percentile of \(X\)?
- What is the median of \(X\)?

- Define the following terms:
- conditional distribution
- marginal distribution

- Consider the following bivariate PMF \(p(X, Y)\) \[
\begin{array}{l|cccc}
& X = 0 & X=2 & X = 3 & X = 4\\
\hline
Y=0& \boxed{?} & 2/24 & 2/24 & 1/24\\
Y=1& 1/24 & 3/24 & 4/24 & 2/24\\
Y=2& 1/24 & 3/24 & 2/24 & 2/24\\
\end{array}
\]
- Fill in the missing value: \(\boxed{?}\).
- Write out the marginal PMF of \(X\).
- Write out the marginal CDF of \(Y\).
- Write out the conditional PMF of \(Y\) given \(X=3\).
- Write out the conditional CDF of \(X\) given \(Y=2\).
- Are \(X\) and \(Y\) independent? Explain.
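
For the normal-distribution questions above you are expected to use statistical tables, but it can be reassuring to check a table lookup numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper `prob_between` is my own name. Note that `NormalDist` is parameterised by the standard deviation, not the variance.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

def prob_between(dist, a, b):
    """P(a <= X <= b) for a continuous RV: the difference of two CDF values."""
    return dist.cdf(b) - dist.cdf(a)

# Probability of falling within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(prob_between(Z, -k, k), 4))

# Quantiles: the value t with P(X <= t) = q is dist.inv_cdf(q).
X = NormalDist(mu=-2, sigma=3)   # sigma is the standard deviation, so the variance is 9
print(X.inv_cdf(0.5))            # the median of a normal distribution equals its mean
```

A percentile or quartile is just `inv_cdf` evaluated at the corresponding probability, for example `inv_cdf(0.25)` for the first quartile.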

#### Problem Set Questions

- Consider a random variable \(X\) which is uniformly distributed \(U(0, 1)\).
- Draw the probability density function (pdf), \(f(x)\). What is the formula for the pdf?
- Draw the cumulative distribution function (cdf), \(F(x)\). What is the formula for the cdf?

- Consider a random variable \(X\) which is uniformly distributed \(U(0, 1)\).
- What is the distribution of \(Z = 2 + X\)?
- What is the distribution of \(Z = 3X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- Consider a random variable \(X\) which is uniformly distributed \(U(−2, 3)\).
- What is the distribution of \(Z = 2 + X\)?
- What is the distribution of \(Z = 3X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- For a random variable \(X \sim U(2, 12)\) (read the \(\sim\) as “that is distributed”):
- What is \(P(3 \leq X \leq 8)\)?
- What is \(P(X = 9)\)?

- Consider a random variable \(X\) which is normally distributed with mean 0 and variance of 1.
- What is the distribution of \(Z = 5 + X\)?
- What is the distribution of \(Z = 2X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- Consider a random variable \(X\) which is normally distributed with mean 2 and variance of 9.

- What is the distribution of \(Z = −2 + X\)?
- What is the distribution of \(Z = X/\sqrt{9}\)?
- What is the distribution of \(Z = (−2+X)/\sqrt{9}\)

- If \(X \sim N(\mu = −1, \sigma^2 = 2)\), what is
- \(P(X = 0)\)?
- \(P(X \leq −1)\)?
- \(P(−1 \leq X \leq 1)\)?
- the value of \(t\) such that \(P(X \leq t) = 0.25\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the value of \(t\) such that \(P(−|t| \leq X \leq |t|) = 0.9\)

- Suppose that students’ marks on the economics prelims paper are normally distributed with mean 61 and standard deviation 9.5.
- What is the probability that a student scores (i) less than 50? (ii) 70 or more?
- What score is exceeded by only 10% of students?
- Find the median, and the upper and lower quartiles.
- What proportion of students have scores within 5 marks of the mean?

- Consider the following bivariate probability mass function \(f(X, Y )\) \[
\begin{array}{l|ccc}
& X=0 & X=2 & X=3 \\
\hline
Y=0&1/36 & 1/18 & 1/12\\
Y=1& 1/9 & 2/9 & 1/3\\
Y=2& 1/36 &1/18 & 1/12
\end{array}
\] what is
- the marginal PMF of \(X\)?
- the marginal CDF of \(Y\)?
- the conditional PMF of \(Y\): \(f(Y |X = 3)\)?
- the conditional CDF of \(X\): \(F(X|Y = 2)\)?
- Are \(X\) and \(Y\) independent?
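
The marginal, conditional, and independence calculations in the last question are all sums and ratios of table cells. The sketch below stores the table above as a dictionary and carries out those operations with exact fractions. The function names are my own, and the independence check assumes that every cell of the table is listed in the dictionary.

```python
from fractions import Fraction as F

# Joint PMF from the problem-set table, stored as {(x, y): probability}.
joint = {
    (0, 0): F(1, 36), (2, 0): F(1, 18), (3, 0): F(1, 12),
    (0, 1): F(1, 9),  (2, 1): F(2, 9),  (3, 1): F(1, 3),
    (0, 2): F(1, 36), (2, 2): F(1, 18), (3, 2): F(1, 12),
}
assert sum(joint.values()) == 1  # a valid PMF sums to one

def marginal(joint, axis):
    """Marginal PMF: sum the joint PMF over the other variable (axis 0 = X, 1 = Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def conditional_y_given_x(joint, x):
    """Conditional PMF of Y given X = x: joint probability divided by P(X = x)."""
    p_x = marginal(joint, 0)[x]
    return {y: p / p_x for (xx, y), p in joint.items() if xx == x}

def independent(joint):
    """X and Y are independent iff p(x, y) = p_X(x) * p_Y(y) in every cell."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())
```

Try the problem by hand first, then call `marginal(joint, 0)`, `conditional_y_given_x(joint, 3)`, and `independent(joint)` to compare.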

### Class #3 - MT Week 7

#### Videos

- M0: Intermission
- M1: Expected Values
- M2: Variance
- M3: Conditional Expectations
- M4: Correlation, Covariance, & Independence

#### Review Questions

- Consider the population of Oxford undergraduates. Let \(X\) be a random variable that represents the distribution of height in centimeters in this population, and \(Y\) be a random variable that represents the distribution of weight in kilograms. What are the units of the following quantities?
- \(\mathbb{E}(X)\)
- \(\text{Var}(Y)\)

- The standard deviation of \(X\)
- \(Z \equiv \left[X - \mathbb{E}(X)\right] / \sqrt{\text{Var}(X)}\)
- \(\text{Cov}(X,Y)\)
- \(\text{Cor}(X,Y)\)

- Give an example of each:
- A convex function
- A concave function

- Let \(X\) be a random variable with support set \(\left\{0, 1, 2 \right\}\), \(p(1) = 0.3\), and \(p(2) = 0.5\). Calculate \(\mathbb{E}[X]\).
- Suppose that \(X \sim N(\mu, \sigma^2)\).
- What is \(\mathbb{E}[X]\)?
- What is \(\text{Var}(X)\)?

- Let \(X\) be a discrete random variable that is equally likely to take on the values \(2\) and \(4\) and never takes on any other values. Prove or disprove the following claims, either by using a property of expected value that you have learned, or by direct calculation:
- \(E(X+10)= E(X) + 10\)
- \(E(X/10) = E(X)/10\)
- \(E(10/X)=10/E(X)\)
- \(E(X^2) = [E(X)]^2\)
- \(E(5X + 2)/10 = [5E(X) + 2]/10\)

- State Jensen’s Inequality.
- Give the formula for each of these quantities in terms of expectations:
- Variance
- Covariance
- Correlation

- Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population. Write down the expressions for the following quantities in terms of \(p\):
- \(\mathbb{E}(X)\)
- \(\text{Var}(X)\)

- Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population.
- Show that \(\mathbb{E}(X) = p\)
- Show that \(\mathbb{E}(X^2) = p\)
- Combine the preceding two parts to show that \(\text{Var}(X) = p(1 - p)\).

- Let \(X\) be a discrete random variable.
- Write down the definition of the expected value \(\mathbb{E}[X]\) of \(X\).
- Is \(\mathbb{E}[X]\) constant or random? Explain why in one sentence.

- What is a conditional expectation?
- Formally state the Law of Iterated Expectations.
- TRUE or FALSE: the following statements are equivalent
- \(X\) and \(Y\) are independent.
- The covariance between \(X\) and \(Y\) is zero.

- Suppose that \(\text{Var}(Y) = 0\) and \(\text{Var}(X) > 0\). Calculate \(\text{Cov}(X,Y)\).
- Suppose \(X\) is a random variable with support \(\{-1, 0, 1\}\) where \(p(-1)=q\) and \(p(1) = p\). What relationship must hold between \(p\) and \(q\) to ensure that \(E[X] = 0\)?
- Suppose that \(\mathbb{E}[X]=8\) and \(Y= 3 + X/2\). Calculate \(\mathbb{E}[Y]\).
- Suppose that \(X\) is a discrete random variable and \(g\) is a function. Explain how to calculate \(\mathbb{E}[g(X)]\). Is this the same thing as \(g\left(\mathbb{E}[X]\right)\)?
- Let \(X\) be a random variable with support set \(\left\{ -1, 1 \right\}\) and \(p(-1) = 1/3\). Calculate \(E[X^2]\).
- Suppose that \(X \sim N(\mu = -2, \sigma^2 = 3)\). What is \(\text{Cov}(X,X)\)?
- Let \(X\) and \(Y\) be random variables with \(\mathbb{E}[X] = 2\) and \(\mathbb{E}[Y] = 1\). Calculate \(\mathbb{E}[X - Y]\).
- Let \(W = Y + Z\) and suppose that \(\text{Cov}(X,Y) = 0\) and \(\text{Cov}(X,Z) = 0\). What is the correlation between \(W\) and \(X\)?
- Suppose \(\mathbb{E}[X] = 2\) and \(\text{Var}(X) = 5\). Calculate \(\mathbb{E}[X^2]\).
- Let \(X\) and \(Y\) be random variables with \(\text{Var}(X) = 2\), \(\text{Var}(Y) = 1\), and \(\text{Cov}(X,Y) = 0\). Calculate \(\text{Var}(X - Y)\).
- Let \(X\) and \(Y\) be two random variables with \(\text{Var}(X) = \sigma_X^2\), \(\text{Var}(Y) = \sigma_Y^2\), and \(\text{Cov}(X,Y) = \sigma_{XY}\). If \(a,b,c\) are constants, what is \(\text{Var}(cX + bY + a)\)?
- Suppose that \(X\) and \(Y\) are two random variables with correlation \(\rho = 0.3\), and standard deviations \(\sigma_X = 4\) and \(\sigma_Y = 5\).
- Calculate \(\text{Cov}(X,Y)\).
- Let \(Z = (X + Y)/2\). Calculate \(\text{Var}(Z)\).

- Mark each statement as TRUE or FALSE. If FALSE, explain.
- The expected value of a sum \(\mathbb{E}[X + Y]\) does not in general equal the sum of the expected values \(\mathbb{E}[X] + \mathbb{E}[Y]\); this only holds when \(X\) and \(Y\) are independent.
- The variance of a sum \(\text{Var}(X + Y)\) is always equal to the sum of the variances \(\text{Var}(X) + \text{Var}(Y)\).

- Let \(X\) be a Uniform\((0,1)\) random variable.
- Calculate \(\mathbb{E}(X)\).
- Calculate \(\mathbb{E}(X^2)\).
- Combine the preceding two parts to find \(\text{Var}(X)\).

- Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the following probabilities? (a) \(\mathbb{P}(\mu - \sigma \leq X \leq \mu + \sigma)\) (b) \(\mathbb{P}(\mu - 2\sigma \leq X \leq \mu + 2\sigma)\) (c) \(\mathbb{P}(\mu - 3\sigma \leq X \leq \mu + 3\sigma)\)
- Suppose that \(X \sim N(0,1)\). Calculate \(\mathbb{E}[X^2]\).
- Let \(X\) and \(U\) be random variables with \(\mathbb{E}(U|X)=0\) and \(\mathbb{E}(X) = 5\). Let \(a\) and \(b\) be constants, and further define \(Y = a + bX + U\).
- Show that \(\mathbb{E}(U) = 0\).
- Calculate \(\mathbb{E}(Y)\).
- Calculate \(\mathbb{E}(Y|X = 0)\).
- Suppose that \(\mathbb{E}(U) = 0\) but \(\mathbb{E}(U|X)\) is unknown. Would this make any difference to your answers in parts (b) and (c) above?
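
Several of the questions above ask whether a function can be moved inside or outside an expectation. For a discrete RV you can always settle this by direct calculation. The sketch below does so for the RV that takes the values 2 and 4 with equal probability; the helper `expect` is my own name for the sum of \(g(x)p(x)\) over the support.

```python
from fractions import Fraction

# PMF of X from the question above: values 2 and 4, each with probability 1/2.
pmf = {2: Fraction(1, 2), 4: Fraction(1, 2)}

def expect(pmf, g=lambda x: x):
    """E[g(X)] for a discrete RV: sum of g(x) * p(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(pmf)
variance = expect(pmf, lambda x: x**2) - mean**2   # Var(X) = E[X^2] - (E[X])^2

# Linear functions pass through the expectation ...
print(expect(pmf, lambda x: x + 10) == mean + 10)
# ... but nonlinear ones generally do not.
print(expect(pmf, lambda x: Fraction(10, x)), 10 / mean)
print(expect(pmf, lambda x: x**2), mean**2)
```

Comparing the two numbers on each of the last two lines is a concrete instance of the gap that Jensen’s Inequality describes for convex functions.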

#### Problem Set Questions

- Let \(X\) and \(Y\) be random variables and \(a,b,c\) be constants. Use the linearity of expectation to derive the following results:
- \(\text{Cov}(X,Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\)
- \(\text{Var}(a + bX + cY) = b^2 \text{Var}(X) + c^2 \text{Var}(Y) + 2bc \text{Cov}(X,Y)\)

- Let \(X\) and \(Y\) be random variables such that \(\mathbb{E}[Y] \neq 0\). Show that, provided that all of the relevant expectations exist and are finite, \[\frac{\mathbb{E}[X]}{\mathbb{E}[Y]} - \mathbb{E}\left[ \frac{X}{Y}\right] = \frac{\text{Cov}(X/Y,\, Y)}{\mathbb{E}[Y]}.\]
- Let \(X\) be a random variable and \(a, b\) be constants. Suppose that \(Y = a + b X\). Calculate the correlation between \(X\) and \(Y\) in each of the following cases:
- \(b > 0\)
- \(b < 0\)
- \(b = 0\)

- Let \(X\) and \(U\) be random variables and \(a, b\) be constants. Suppose that \(Y = a + b X + U\) where \(\mathbb{E}(U) = 0\) and \(\text{Cov}(X, U) = 0\).
- Calculate \(\mathbb{E}(Y)\).
- Calculate \(\text{Var}(Y)\).
- Show that \(\text{Cov}(X, Y) = b \text{Var}(X)\)
- Show that \(\text{Corr}(X,Y) = b \sqrt{\frac{\text{Var}(X)}{b^2 \text{Var}(X) + \text{Var}(U)}}\).

- Let \(X\) be a \(\text{Bernoulli}(p)\) random variable, i.e. \(X \in \{0, 1\}\) with \(\mathbb{P}(X=1) = p\). Show that, for any other random variable \(Y\) such that the relevant expectations exist and are finite \[ \frac{\text{Cov}(X,Y)}{\text{Var}(X)} = \mathbb{E}(Y|X=1) - \mathbb{E}(Y|X=0). \]

### Class #4 - HT Week 2

#### Videos

- S0: Statistics Intro
- S1: Populations, Samples & Random Variables
- S2: Parameters, Statistics & Estimation
- S3: The Law of Large Numbers & Central Limit Theorem
- S4: Intro to Statistical Inference
- S5: Confidence Intervals Overview
- S5a: Confidence Intervals for Means
- S5b: Confidence Intervals for Proportions

#### Review Questions

- Define the following terms and give a simple example: “population”, “sample”, “sample size”.
- Explain the distinction between a parameter and a statistic.
- Give an example of (i) a parameter that is also a moment and (ii) a parameter that is not a moment.
- Briefly compare and contrast sampling and non-sampling error.
- Define a simple random sample. Does it help us to address sampling error, non-sampling error, both, or neither?
- What is
- a “random variable”?
- a “statistic”?

- What does it mean for a statistic to have a sampling distribution?
- Define the following terms: “standard deviation”, “standard error”.
- Define the terms “estimand”, “estimator”, “estimate”, using the mean as an example.
- What does it mean for a set or sequence of random variables to be “iid”?
- If sampling is performed without replacement why are the resulting random variables not iid? Under what circumstances does it become, for all practical purposes, irrelevant as regards the iid property whether one draws with or without replacement?
- What does it mean for an estimator to be
- unbiased
- consistent

- Can an estimator be consistent yet biased? If so give an example, if not why not?
- Why do we divide by the sample size minus one when we estimate the sample variance?
- Suppose the population mean \(\mu\) is known and we want to estimate the population variance \(\sigma^2\). Is \(\widehat{\sigma}^2 \equiv \frac{1}{n-1} \sum_{i=1}^n (X_i - \mu)^2\) an unbiased estimator of \(\sigma^2\)? Justify your answer.
- Define the concept of efficiency of an estimator. What does it mean to say that one estimator is more efficient than another?
- State (formally)
- the central limit theorem;
- the law of large numbers.

- TRUE or FALSE: the Central Limit Theorem says that large populations are approximately normally distributed. If FALSE, correct the statement.
- Give the formal definition of a confidence interval and the “rough intuition” that we can use to interpret it.
- How does the width of a confidence interval for a population mean vary with
- the sample size?
- the confidence level?
- the sample variance?
- the sample mean?
- the standard error of the sample mean?

- Write out the formula for the 95% confidence interval for
- a population mean,
- the difference between the means of two independent populations,
- the difference between the population means of a pair of variables from a single population.

- Explain briefly how your answers to the preceding question would change if the random variables in question were binary.
- How many standard errors to either side of the mean do you need to go in order to construct a
- 90% confidence interval,
- 95% confidence interval,
- 99% confidence interval?

- Using the appropriate statistical tables, how many standard errors to either side of the mean do you need to go in order to construct an 80% confidence interval?
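
The "number of standard errors" in the last few questions is a quantile of the standard normal distribution. The sketch below, which assumes a two-sided interval based on a normal approximation, shows how that quantile follows from the confidence level. The function name `critical_value` is my own.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided standard-normal critical value for a given confidence level.

    Leaves (1 - confidence) / 2 probability in each tail.
    """
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

You should still practise finding these values in the statistical tables, since that is what you will have in collections and the exam.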

#### Problem Set Questions

- Define the terms “estimand”, “estimator”, and “estimate” using the median as an example.
- Let \(X_1, X_2, \dots, X_N \sim \text{iid}\) with mean \(\mu\) and variance \(\sigma^2\) and let \(\bar{X}_N \equiv \frac{1}{N} \sum_{i=1}^N X_i\).
- Is \(\bar{X}_N\) an unbiased estimator of \(\mu\)?
- Is \((0.1 X_1 + 0.9 X_2)\) an unbiased estimator of \(\mu\)? Why or why not?
- Which is a more efficient estimator of \(\mu\): \((0.1 X_1 + 0.9 X_2)\) or \((0.5 X_1 + 0.5 X_2)\)? Explain.
- Suppose that \(\mu\) is known. Is \(\widehat{\sigma}^2 \equiv \frac{1}{N-1}\sum_{i=1}^N (X_i - \mu)^2\) an unbiased estimator of \(\sigma^2\)? Explain.

- Consider a fictional university, Camford University, which is composed of a large number of colleges. Each college has the same number of first-year economics students. Suppose that students’ marks on the economics prelims paper are normally distributed with mean 61 and standard deviation 9.5.
- What is the distribution of the sample mean in a random sample of size \(N\)?
- In a random sample of 10 students, what is the probability that their average mark exceeds 63?
- Suppose that you have a sample of 10 students that is selected by choosing a *college* at random, and then choosing 10 students at random from the college. What can you say about the expected value and variance of their average mark? Explain.

### Class #5 - HT Week 4

#### Videos

- S6: Tests of Statistical Hypotheses - Overview
- S6a: Tests for Means
- S6b: Tests for Proportions
- S7: Confidence Intervals and Hypothesis Tests Revisited

#### Review Questions

- What is a “Null Hypothesis”?
- Define Type I and Type II errors.
- If you set a significance level of \(\alpha = 0.05\) what proportion of the time would you expect to falsely reject the Null hypothesis?
- If the significance level of a test is 0.15, and you decide to reject the Null, at what level of confidence are you doing so? What are you being confident about?
- Given a test statistic with a Standard Normal distribution, what is the “rule of thumb” critical value for rejecting the Null at a 5% significance level? What is the “rule of thumb” critical value for rejecting it at a 10% significance level?
- For a normally distributed test statistic, use the appropriate statistical tables to find the critical value for a (two-sided) test with each of the following significance levels:
- 6%
- 17%

- Write out the formula for the test statistic for a test of
- whether a population mean is equal to a certain value.
- whether the difference in means in two independent populations is equal to a certain value.
- whether the population means of a pair of variables from a single population are the same.

- Repeat the preceding problem for the case of binary random variables.

#### Problem Set Questions

- In a random sample of 1165 Oxford PPE applicants, the average score on the TSA test was 60.86, with a standard deviation of 8.02. Construct a 95% confidence interval for the population mean score.
- Suppose you are studying the impact of COVID-19 on the labour market. In a random sample of 1000 workers, you find that 354 had been “furloughed” in July 2020. Construct a 90% confidence interval for the population mean furlough rate.
- Suppose that we observe hourly wages for a random sample of 150 university graduates: the sample mean is £31 with a standard deviation of £15. In contrast, the sample mean wage is £17 with a standard deviation of £10 for a sample of 225 non-university graduates. Construct an approximate 95% confidence interval for the difference in population mean wages \((\mu_X - \mu_Y)\) between university graduates (X) and non-university graduates (Y).
- You are interested in whether the average rent for a room in Oxford is £500 per month. In a random sample of 300 rooms to rent in Oxford, the average rent is £525 per month with a standard deviation of £200. Test the hypothesis that the average Oxford room rent per month is £500.
- Suppose you have a large random sample \(X_1, \ldots, X_n \sim \text{iid} \, N(\mu, \sigma^2)\), and you test \(H_0: \mu = \mu_0\) vs. \(H_1: \mu \neq \mu_0\) where \(\mu_0\) is some hypothesized value of \(\mu\). You decide to set a significance level, \(\alpha\), at which to conduct the hypothesis test.
- Say you choose \(\alpha = 0.05\) and reject. Would you still have rejected if you had instead chosen \(\alpha = 0.1\)? Explain.
- Say you choose \(\alpha = 0.01\) and fail to reject. Would you have failed to reject if you had instead chosen \(\alpha = 0.05\)? Explain.

- A random sample of 35 Netflix users are asked to rate two shows: *The Queen’s Gambit* and *Bridgerton*. You can access the dataset as a .csv file from this url. Use the data to test the null hypothesis that both shows are equally highly rated.
- You are interested in whether the Oxford unemployment rate is the same as the national unemployment rate (5%). In a random sample of 1,000 Oxford residents, you find that 63 are unemployed. Test the hypothesis that the Oxford unemployment rate is the same as the national rate.
- Consider two groups, A and B, each composed of 100 people who are suffering from a disease. A drug is given to group A. A placebo is given to group B. It is found that in group A 75 people recover whereas in group B only 65 people recover.
- Test the hypothesis that the drug helped cure the disease.
- Now suppose that the sample size in each group is 300, and that 225 people in group A and 195 in group B recover. What do you conclude?
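
The one-sample questions above (the TSA scores and the Oxford rent) follow the same recipe: compute a standard error, then either build an interval around the sample mean or form a test statistic. The sketch below is one way to check your hand calculations. It assumes a large-sample normal approximation and a two-sided alternative, and the function names `mean_ci` and `z_test` are my own.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_ci(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    se = s / sqrt(n)                      # standard error of the sample mean
    return xbar - z * se, xbar + z * se

def z_test(xbar, s, n, mu0):
    """Two-sided large-sample test of H0: mu = mu0. Returns (statistic, p-value)."""
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value

print(mean_ci(60.86, 8.02, 1165))         # TSA question
print(z_test(525, 200, 300, mu0=500))     # Oxford rent question
```

The two-sample and proportion questions need a different standard error, so these helpers do not apply to them without modification.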

### Class #6 - HT Week 6

#### Videos

- T0: Time Series Introduction
- T1: Time Series Data
- T3: Basic Operations
- T4: Stationarity
- T5: Some Time Series Models
- T6: Mean Reversion
- T7: Spurious Correlation

#### Review Questions

- What is the key difference between time series data and cross-section data?
- List, and be prepared to describe, the typical features exhibited by time series data.
- Define formally (write down the formula for) the population autocorrelation between \(X_t\) and \(X_{t-h}\).
- Sketch the typical autocorrelation functions for
- a time series with no autocorrelation,
- a quarterly time series with annual seasonality,
- a strongly persistent time series.

- Define “weak stationarity”.
- Why is stationarity important?
- If a time series is not stationary, how might you be able to transform it so that it is? Give three examples.
- Define formally (write down the formulae for) each of the following:
- an AR(1) process
- a Random Walk with Drift
- a Random Walk
- a White Noise process

- Is a Random Walk a stationary process? If so, why? If not, why not?
- Is it possible for an AR(1) process to be stationary?
- Define formally (write down the formula for) \(E [X_t|X_{t-1}]\) for a mean-reverting AR(1) process.

#### Problem Set Questions

- At this url you can find a plot illustrating the change in global surface temperatures over time, relative to the 1951-1980 average. With reference to the plot, discuss the key differences between time series and cross-sectional data.
- If \(X_t\) is white noise with mean \(\mu\) and variance \(\sigma^2\), what process does its first difference (\(\Delta X_t = X_t - X_{t-1}\)) follow? What are the mean, variance, first order autocovariance, and second order autocovariance of the process?
- Consider the following time series process: \[X_t = \epsilon_t + 0.5\epsilon_{t-1} + 0.5\epsilon_{t-2}\] where the \(\epsilon_t\) are independently and identically distributed with \(\epsilon_t \sim N(0, \sigma^2)\).
- What are the mean and variance of \(X_t\)?
- What are the first, second, and third order autocovariances?

- Let \(\epsilon_t\) be identically and independently distributed with \(\epsilon_t \sim N(0, \sigma^2)\). Consider the following time series process: \[X_t = \alpha + \beta X_{t-1} + \epsilon_t\]
- What is the name commonly given to this type of process? What if \(\alpha = 0\) and \(\beta = 1\)? What if \(\alpha = 0\) and \(\beta = 0\)?
- By recursing back one period, show that: \[X_t = \alpha + \beta \alpha + \beta^2 X_{t-2} + \epsilon_t + \beta \epsilon_{t-1}\]
- By using the results on sequences covered in Chapter 4 of the Maths Workbook, show that \[ X_t = \frac{1 - \beta^t}{1 - \beta} \alpha + \beta^t X_0 + \sum_{j=0}^{t-1} \beta^j \epsilon_{t-j}\]
- Suppose \(0 \leq \beta < 1\) and \(X_0\) is a random variable with mean \(\frac{\alpha}{1 - \beta}\) and variance \(\frac{\sigma^2}{1 - \beta^2}\) that is uncorrelated with \(\epsilon_t\) for all \(t > 0\). Find the mean, variance and autocovariance function of \(X_t\) for any \(t \geq 1\). Is the process (weakly) stationary?

- In studies dating back over 100 years, it’s well established that “regression toward the mean” occurs between the heights of parents and the heights of their adult children. Indicate whether the following statement is true or false: “Parents of tall children will tend to be taller than their children.”
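
Analytical answers for the process \(X_t = \alpha + \beta X_{t-1} + \epsilon_t\) in the problem set above can be sanity-checked by simulation. The sketch below is one hedged way to do this using only the Python standard library. The function names `simulate_ar1` and `sample_autocorr`, the parameter values, and the seed are all illustrative assumptions. The starting value is drawn with the mean and variance given for \(X_0\) in the question, with normality assumed for convenience, so the sketch only applies when \(|\beta| < 1\).

```python
import random
import statistics


def simulate_ar1(alpha, beta, sigma, n, seed=0):
    """Simulate n observations of X_t = alpha + beta * X_{t-1} + eps_t.

    eps_t is iid N(0, sigma^2). X_0 is drawn (assumed normal) with mean
    alpha / (1 - beta) and variance sigma^2 / (1 - beta^2), so |beta| < 1
    is required.
    """
    rng = random.Random(seed)
    x = rng.gauss(alpha / (1 - beta), sigma / (1 - beta**2) ** 0.5)
    path = []
    for _ in range(n):
        x = alpha + beta * x + rng.gauss(0, sigma)
        path.append(x)
    return path


def sample_autocorr(xs, h):
    """Sample autocorrelation of the series xs at lag h."""
    mean = statistics.fmean(xs)
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t - h] - mean) for t in range(h, len(xs)))
    return num / denom


# Illustrative parameter values, not taken from the problem set.
path = simulate_ar1(alpha=1.0, beta=0.5, sigma=1.0, n=20000)
print("sample mean:", round(statistics.fmean(path), 3))
print("sample variance:", round(statistics.pvariance(path), 3))
for lag in (1, 2, 3):
    print(f"lag {lag} autocorrelation:", round(sample_autocorr(path, lag), 3))
```

Comparing the printed sample moments with your derived formulas, for a few different values of `alpha` and `beta`, is a quick way to catch algebra slips.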

### Class #7 - HT Week 8

#### Videos

For this class there are *three sets of videos*: (1) my introduction to causal inference for a general audience, (2) the lecture videos:

- C0: Introduction
- C1: Potential Outcomes
- C2: Selection Bias
- C3: Randomized Controlled Trials
- C4: Internal and External Validity
- C5: Natural and Quasi-experiments
- C6: Conditional Independence

and (3) some additional videos of mine that complement the lecture videos:

- Potential Outcomes
- Selection Bias
- Conditional Independence (Skip 5:00-11:32 since it’s a bit advanced.)

#### Review Questions

- Define formally (write down the formulae)
- the average treatment effect (ATE);
- the average effect of treatment on the treated (ATT).

- Define formally (write down the formula) the decomposition of the average difference between the treated and untreated groups into the ATT plus selection bias.
- How (in terms of the effect on the relevant conditional expectations) does randomisation of treatment solve the selection bias problem?
- What is meant by the terms
- “Internal validity”
- “External Validity”

- List the main threats to the internal validity of a study of treatment response.
- List the main threats to the external validity of a study of treatment response.
- What is likely to be the empirical effect of non-compliance on the calculation of a causal effect? Will it bias the result up or down?
- What is the conditional independence assumption?
- What are the three main problems encountered when using the conditional independence assumption?
- Is the conditional independence assumption verifiable?

#### Problem Set Questions

In “Long-Lasting Effects of Socialist Education” by Nicola Fuchs-Schundeln and Paolo Masella the authors study the effects of a socialist education amongst children educated in the former East Germany (GDR). Socialist teaching was an official school subject in the GDR from 7th grade on and aimed to provide a deep knowledge of Marxism-Leninism, and of the socialist system of the GDR. The authors also suggest that an important feature of this educational system was that “critical thinking was not incentivized and divergent opinions were suppressed”.

The authors were interested in the causal effect of exposure to the GDR’s educational system on subsequent labour market outcomes. To study this they exploit the very rapid reorganisation of the school system in the East after the fall of the Berlin Wall (November 9, 1989). The socialist elements of the curriculum were dropped almost immediately. The authors used the German Microcensus (2005-08) to look at individuals in date-of-birth cohorts born between 1974 and 1977 in the GDR. These date-of-birth cohorts were still in education at reunification. The outcome variable of interest was whether or not they were employed at the time of the Microcensus (2005-08).

The study relies on the following institutional feature: in the GDR, children turning six on or before May 31 of a given year were automatically enrolled in the first grade by September 1 of the same year. Children born on or after June 1 waited another year before starting school. The authors define their control group as consisting of individuals born on or after June 1. The treatment group consisted of individuals born on or before May 31. Treated individuals within each date-of-birth cohort were exposed to one more year of socialist teaching (and one year less of Western education) than individuals in the control group.

The authors find that, comparing males in the treatment and control groups, the proportion of males who were employed was 0.021 lower in the treatment group than in the control group, with a standard error of 0.004. Comparing females in the treatment and control groups, the proportion of females who were employed was 0.009 lower in the treatment group than in the control group, with a standard error of 0.019.

- Conduct suitable hypothesis tests to determine whether the differences in employment rates between the treatment and control groups, for men and for women respectively, were statistically significant. What do you conclude?
- The authors interpret their results (subject to the results of the statistical tests above) as causal effects. Using the potential outcomes framework and the standard notation write down an expression describing this causal effect.
- What are the principal assumptions which would justify the authors’ claim that this effect is causal? Why might these assumptions not hold?
- Unfortunately the German Microcensus only records the current residence of respondents and not their residence at the time when they were at school. The authors therefore assumed that respondents currently residing in the East also received their education in the East. Why might this not be the case? If this is not the case how would it bias the results of the study?

You are interested in the effect of taking extra online classes (e.g. edX or Khan Academy courses) on whether teenagers apply to university. In a sample of 1000 young adults you observe whether or not they applied to university and whether or not they took at least one online class. You also observe whether they got at least five A*-C grades at GCSE. The tables show the numbers in the sample of the various types of young adult and the proportion who applied to university.

Numbers of young adults by GCSE grades and whether they took at least one online course

| GCSE grades | 0 online courses | ≥1 online courses |
|---|---|---|
| < 5 A*-C | 150 | 100 |
| ≥ 5 A*-C | 250 | 500 |

Proportion applying to university

| GCSE grades | 0 online courses | ≥1 online courses |
|---|---|---|
| < 5 A*-C | 0.3 | 0.8 |
| ≥ 5 A*-C | 0.3 | 0.5 |

- Calculate the average difference in university application rates between those who did at least one online course and those who did not.
- Does this difference have a causal interpretation? Why or why not?
- A friend who has done a statistics course suggests that controlling for GCSE grades might help in this context. What does this mean and why might it help?
- Calculate the conditional treatment effects for those with less than 5 A*-C grades at GCSE and for those with at least 5 A*-C at GCSE.
- Calculate the Average Treatment Effect and the Average Effect of Treatment on the Treated and compare them to the difference you found in part (a).
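
The mechanics of the calculations in this question can be sketched in code. The example below is a minimal illustration that assumes the conditional independence assumption holds within each stratum. It uses hypothetical numbers, deliberately different from the tables above, and the names `strata`, `naive_difference`, `conditional_effects` and `weighted_effect` are made up for this sketch.

```python
# Hypothetical strata, NOT the data from the tables above.
# Each entry: (n_untreated, n_treated, rate_untreated, rate_treated)
strata = {
    "low": (300, 100, 0.20, 0.40),
    "high": (200, 400, 0.50, 0.60),
}


def naive_difference(strata):
    """Difference in overall outcome rates, treated minus untreated."""
    n0 = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    y0 = sum(s[0] * s[2] for s in strata.values()) / n0
    y1 = sum(s[1] * s[3] for s in strata.values()) / n1
    return y1 - y0


def conditional_effects(strata):
    """Treated-minus-untreated difference within each stratum."""
    return {name: s[3] - s[2] for name, s in strata.items()}


def weighted_effect(strata, weights):
    """Average the conditional effects using normalised stratum weights."""
    effects = conditional_effects(strata)
    total = sum(weights.values())
    return sum(weights[name] / total * effects[name] for name in strata)


# ATE weights each stratum by its share of the whole sample;
# ATT weights each stratum by its share of the treated group.
ate_weights = {name: s[0] + s[1] for name, s in strata.items()}
att_weights = {name: s[1] for name, s in strata.items()}

naive = naive_difference(strata)
ate = weighted_effect(strata, ate_weights)
att = weighted_effect(strata, att_weights)
print(f"naive difference = {naive:.3f}, ATE = {ate:.3f}, ATT = {att:.3f}")
```

The only difference between the ATE and ATT calculations in this sketch is the choice of weights, which is why the two can differ when treatment take-up varies across strata.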