# Prelims Probability and Statistics

Lady Margaret Hall

## What is this?

This document details our college teaching arrangements for prelims probability and statistics at Lady Margaret Hall. It will be updated throughout the academic year, so make sure to check back regularly. The date of the current version is given above.

## Overview

I’ll teach Prelims Probability & Statistics at LMH in seven classes over Michaelmas and Hilary Terms, plus two revision classes in Trinity Term. Because these are classes, all first-year PPE students will attend the same sessions. Each class will begin with a quiz, but there is no work to submit in advance. I expect you to attend all classes and to arrive on time. If you are unwell and unable to attend, please let me know by email.

## Dates, Times and Locations

- **Michaelmas Term:** Thursdays of Weeks 3, 5, and 7, from 3-5:30, in the Amanda Foreman Room.
- **Hilary Term:** Thursday afternoons of Weeks 2, 4, 6, and 8. Time and location TBC.
- **Trinity Term:** Two revision classes, time and location TBC.

## Read This First

Before reading further, log onto Canvas and read the Overview and Instructions for prelims probability and statistics so that you understand how the course is structured.

## Collections

You will have two 90-minute collections (mock exams) for prelims probability and statistics: one at the beginning of HT covering probability, and another at the beginning of TT covering all of the course material. I will provide further details in class.

## Quizzes

Each class will begin with a short, closed-book, closed-notes quiz that I will mark and return to you at the next class. In-class quizzes will *always* include a random sample from the **Review Questions** that I assigned for that class. (You can find these under Class Details below.) Because you are given these questions in advance, you should be able to answer them correctly on the quiz. If you cannot, this means that you have not come to class adequately prepared. Quizzes may also include additional questions based on the material covered in past classes. These questions will be harder and you will not be given them in advance. They are designed to help you practice for collections and the real exam.

## Before Each Class

- Watch the lecture videos listed below under Class Details.
- Take notes:
  - Note down key definitions and formulas.
  - Make flashcards and memorize them using Anki.
- Solve the **Review Questions** listed below under Class Details.
- Repeat as needed:
  - If you have trouble with the review questions, go back to the lecture videos and slides before attempting them again.
- Working with friends is highly recommended!
- Make a note of anything that you find confusing and bring it with you to class.

## After Each Class

Complete the **Problem Set Questions** listed below under Class Details. Then watch the associated Demonstration Lecture for solutions and explanations. Your problem set questions are neither collected nor marked, but they will serve as inspiration for future quiz and collections problems.

## Class Material

As explained in more detail above: watch the videos and solve the review questions *before* class; complete the problem set and watch the associated demo lecture *after* class.

### Class #1 - MT Week 3

#### Videos

- P0: Introduction
- P1: Probability Basics
- P2: Conditional Probability
- P3: Independence & Bayes’ Rule

#### Review Questions

- What is a probability?
- State each of the three axioms of probability, aka the Kolmogorov Axioms.
- Suppose we carry out a random experiment that consists of flipping a fair coin twice.
- List all the basic outcomes in the sample space \(\Omega\).
- Let \(A\) be the event that you get at least one head. List all the basic outcomes in \(A\).
- List all the basic outcomes in \(A^c\).
- What is the probability of \(A\)? What is the probability of \(A^c\)?

- State the complement rule.
- Mark each statement as TRUE or FALSE.
- If \(A \subseteq B\) then \(P(A) \geq P(B)\).
- For any events \(A\) and \(B\), \(P(A \cap B) = P(A)P(B)\)
- For any events \(A\) and \(B\), \(P(A \cup B) = P(A) + P(B) − P(A \cap B)\)

- Let \(A\) and \(B\) be two arbitrary events.
- Show that \(P(A \cup B) \leq P(A) + P(B)\). (This is called *Boole’s Inequality*.)
- Show that \(P(A \cap B) \geq P(A) + P(B) − 1\). (This is called *Bonferroni’s Inequality*.)
- State the definition of a conditional probability.
- Derive:
- Bayes’ Rule from the definition of conditional probability.
- The Multiplication rule for independent events from the definition of conditional probability.

- Name the various components of Bayes’ Rule.
- Suppose that \(P(B) = 0.4\), \(P(A|B) = 0.1\) and \(P(A|B^c) = 0.9\).
- Calculate \(P(A)\).
- Calculate \(P(B|A)\).

- Let \(A\) and \(B\) be two mutually exclusive events both with positive probability. Are they independent? Explain.
- Alexis the meteorologist determines that the probability of rain on Saturday is 50%, and the probability of rain on Sunday is also 50%. Sally the presenter sees Alexis’ forecast and summarizes it as follows: “According to Alexis we’re in for a wet weekend. There’s a 100% chance of rain this weekend: 50% on Saturday and 50% on Sunday.” Is Sally correct? Why or why not?
- When is it true that \(P(A|B) = P(B|A)\)? Explain.
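None of the above requires a computer, but if you would like to check a Bayes’ Rule computation after working it by hand, a few lines of Python will do it. This is an optional sketch and the numbers below are made up; they are *not* the ones from the questions above.

```python
# Check a Bayes' Rule calculation numerically (made-up numbers,
# not those from the review questions above).
p_B = 0.2           # prior P(B)
p_A_given_B = 0.7   # P(A|B)
p_A_given_Bc = 0.1  # P(A|B^c)

# Law of Total Probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' Rule: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A

print(round(p_A, 4))          # 0.22
print(round(p_B_given_A, 4))  # 0.6364
```

Working the same two lines by hand is exactly the calculation the quiz may ask for, so treat the code as a way to confirm your hand-derived answer, not a substitute for it.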

#### Problem Set Questions

*Solutions to these questions are provided in Demonstration Lecture #1*.

- Prove, using the probability axioms:
- **The Complement Rule**: \(P(A^c) = 1 − P(A)\).
- **The Probability of the Union of Two Events Rule**: \(P(A\cup B) = P(A) + P(B) − P(A\cap B)\).
- **The Bounds on Probabilities Rule**: \(P(A \cup B) \leq P(A) + P(B)\).
- **The Logical Consequence Rule**: If \(B\) logically entails \(A\) then \(P(B) \leq P(A)\).

- Consider the experiment of tossing a coin repeatedly and counting the number of coin tosses required until the first head appears.
- Write down the sample space.
- Let \(A\) be the event that the number of coin tosses required for the first head to appear is even. Let \(B\) be the event that the number of coin tosses required until the first head appears is less than 5. Describe events \(A\) and \(B\) as sets.

- Consider the experiment of tossing a coin 2 times. Consider the events: \[
\begin{align*}
A &= \text{There is at most one Head}\\
B &= \text{There is at least one Head and there is at least one Tail}
\end{align*}
\]
- Are events \(A\) and \(B\) independent?
- What if you toss the coin 3 times?
- What if you toss the coin 4 times?

- Derive Bayes’ Rule from the definition of a conditional probability.
- Three percent of Tropicana brand oranges are already rotten when they arrive at the supermarket. In contrast, six percent of Sunkist brand oranges arrive rotten. A local supermarket buys forty percent of its oranges from Tropicana and the rest from Sunkist. Suppose we randomly choose an orange from the supermarket and see that it is rotten. What is the probability that it is a Tropicana?
- Imagine there are two universities, University A and University B, who take different approaches to generating research findings. In University A, a team of well-informed experts develop a theory. Their theories are correct 90% of the time. Before publishing a theory, the scientists at University A do an experimental test of the theory to check whether it is correct (e.g. a clinical trial). The test is designed so that 90% of correct theories will pass the test and only 10% of false theories pass the test.
- Let \(T\) be the event that the theory is true and \(\text{Pub}\) be the event that the theory passes the test and is published. Draw a Venn diagram to represent these probabilities.
- Calculate the probability that a theory from University A is published, i.e \(P(\text{Pub})\).
- Calculate the probability that a published theory from University A is correct, i.e \(P(T |\text{Pub})\).
- In University B, a team of creative experts think of theories that would be rather surprising and interesting if true. These theories are correct only 5% of the time. Again, before publishing their theory, the scientists at University B do that same experimental test as in University A of the theory, i.e. the test is designed so that 90% of correct theories will pass the test and only 10% of false theories pass the test. Calculate the probability that a published theory from University B is correct.
- The Research Council governing the publication of research requires that published theories (i.e. those that pass the test) must be replicated before they can be used in practice. Assume that the replication test is like the first test, i.e. it is designed so that 90% of correct theories will pass and only 10% of false theories pass. Compute the replicability rate of University A and University B (i.e. the pass rate of the second test).
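If you are curious, experiments like the repeated coin toss above are also easy to sanity-check by simulation. This optional Python sketch estimates the probabilities of the events \(A\) (first head on an even-numbered toss) and \(B\) (first head within four tosses); the questions only ask you to describe these events as sets, so the simulation gives nothing away.

```python
import random

random.seed(42)

def tosses_until_first_head():
    """Toss a fair coin until the first head; return the number of tosses."""
    n = 1
    while random.random() < 0.5:  # a tail with probability 1/2
        n += 1
    return n

N = 100_000
draws = [tosses_until_first_head() for _ in range(N)]

# A: the number of tosses required is even; B: it is less than 5.
p_A = sum(d % 2 == 0 for d in draws) / N
p_B = sum(d < 5 for d in draws) / N
print(round(p_A, 2), round(p_B, 2))
```

Simulation estimates like these are a handy way to check set descriptions and hand calculations throughout the course.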

### Class #2 - MT Week 5

#### Videos

- D0: Data & Variables Intro
- D1: Data Basics
- D2: Probability Mass Functions
- D3: PDFs
- D4: CDFs
- D5: Standardization
- D6: Multivariate Distributions

#### Review Questions

*Note: your lecturer uses the term “variable” where the more standard term is “random variable.” I’ll always say random variable, and abbreviate it RV.*

- Define the “support” aka “support set” of a RV. What is the probability that a RV takes on a value outside of its support set?
- What is the difference between a discrete and continuous RV?
- What is a probability mass function (PMF)? What key properties does it satisfy?
- A random variable is said to follow a “Rademacher distribution” if it is equally likely to take any value in the set \(\{−1, +1\}\) and never takes on any value outside this set. Write out and sketch its probability mass function.
- Define the term cumulative distribution function (CDF).
- How is the CDF of a discrete RV related to its PMF?
- How is the CDF of a continuous RV related to its PDF?

- Let \(X\) be a RV with support set \(\{−1, +1\}\) and \(p(−1) = 1/3\). Write down the CDF of \(X\).
- Define the term parameter as it relates to a distribution. Are parameters constant or random?
- If \(X\) is a continuous RV and \(a, b\) are constants, how do we calculate \(P(a \leq X \leq b)\)?
- What are the key properties of a probability density function (PDF)?
- True or False: if \(f(x)\) denotes a PDF, then \(f(x)\) is a probability so \(0 \leq f(x) \leq 1\).
- Let \(X\) be a continuous RV with CDF \(F\) . Express \(P(−2 \leq X \leq 4)\) in terms of \(F\).
- Let X be a continuous RV with CDF \(F\). Express \(P(X \geq x)\) in terms of \(F\).
- Suppose that \(X\) is a continuous RV with support set \([−1, +1]\).
- Is 2 a possible realization of this variable?
- What is \(P(X = 0.5)\)?

- True or False: If \(X\) is a continuous RV then \(P(X \leq 0.3) = P(X < 0.3)\). Explain.
- What does it mean to “standardize” a RV?
- Let \(X\) be a \(\text{Uniform}(0, 1)\) RV. Write down the CDF of \(X\).
- In a \(N(\mu, \sigma^2)\) distribution, what features of the distribution do the parameters \(\mu\) and \(\sigma^2\) control?
- Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the values of the following probabilities?
- \(P(\mu − \sigma \leq X \leq \mu + \sigma)\)
- \(P(\mu − 2\sigma \leq X \leq \mu + 2\sigma)\)
- \(P(\mu − 3\sigma \leq X \leq \mu + 3\sigma)\)

- Let \(X \sim N(\mu = −2,\sigma^2 = 25)\). Without consulting a table, what is the approximate value of \(P(−12 \leq X \leq 8)\)?
- Let \(X \sim N(0, 1)\). Calculate the following:
- \(P(X = 0)\)
- \(P(X \leq −1)\)
- \(P(−1.2 \leq X \leq 1.2)\)
- the value of \(t\) such that \(P(X \leq t) = 0.5\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.9\)
- the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.82\)

- Let \(X \sim N(\mu = −2, \sigma^2 = 9)\). Calculate the following:
- \(P(X = 0)\)
- \(P(X \leq −1)\)
- \(P(−1.2 \leq X \leq 1.2)\)
- the value of \(t\) such that \(P(X \leq t) = 0.5\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.9\).
- the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.82\)

- Suppose that \(U \sim N (0, \sigma^2)\) and let \(a, b, x\) be constants. Find the distribution of each of the following:
- \(Y = a + U\)
- \(Y = a + bU\)
- \(Y = a + bx + U\)

- Let \(X \sim N(\mu = 1, \sigma^2 = 2)\).
- What is the 1st quartile of \(X\)?
- What is the 77th percentile of \(X\)?
- What is the median of \(X\)?

- Define the following terms:
- conditional distribution
- marginal distribution

- Consider the following bivariate PMF \(p(X, Y)\) \[
\begin{array}{l|cccc}
& X = 0 & X=2 & X = 3 & X = 4\\
\hline
Y=0& \boxed{?} & 2/24 & 2/24 & 1/24\\
Y=1& 1/24 & 3/24 & 4/24 & 2/24\\
Y=2& 1/24 & 3/24 & 2/24 & 2/24\\
\end{array}
\]
- Fill in the missing value: \(\boxed{?}\).
- Write out the marginal PMF of \(X\).
- Write out the marginal CDF of \(Y\).
- Write out the conditional PMF of \(Y\) given \(X=3\).
- Write out the conditional CDF of \(X\) given \(Y=2\).
- Are \(X\) and \(Y\) independent? Explain.
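Marginalizing and conditioning a bivariate PMF is just careful bookkeeping, and you can check your table work in a few lines of Python. This optional sketch uses a made-up table, *not* the one in the question above, so it reveals no answers.

```python
# Marginal and conditional PMFs from a bivariate PMF.
# The table below is made up (not the one from the question above).
# Keys are (x, y) pairs; the entries sum to 1.
x_vals = [0, 1]
y_vals = [0, 1, 2]
pmf = {
    (0, 0): 0.10, (1, 0): 0.20,
    (0, 1): 0.15, (1, 1): 0.25,
    (0, 2): 0.05, (1, 2): 0.25,
}

# Marginal PMF of X: sum the joint PMF over y.
marg_x = {x: sum(pmf[(x, y)] for y in y_vals) for x in x_vals}

# Conditional PMF of Y given X = 1: joint divided by marginal.
cond_y_given_x1 = {y: pmf[(1, y)] / marg_x[1] for y in y_vals}

# Independence check: does p(x, y) = p_X(x) * p_Y(y) for every cell?
marg_y = {y: sum(pmf[(x, y)] for x in x_vals) for y in y_vals}
independent = all(
    abs(pmf[(x, y)] - marg_x[x] * marg_y[y]) < 1e-12
    for x in x_vals for y in y_vals
)
print(marg_x, cond_y_given_x1, independent)
```

The independence check mirrors the test you should apply by hand: compare every joint probability to the product of the corresponding marginals, and a single mismatch is enough to rule independence out.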

#### Problem Set Questions

- Consider a random variable \(X\) which is uniformly distributed \(U(0, 1)\).
- Draw the probability density function (pdf), \(f(x)\). What is the formula for the pdf?
- Draw the cumulative distribution function (cdf), \(F(x)\). What is the formula for the cdf?

- Consider a random variable \(X\) which is uniformly distributed \(U(0, 1)\).
- What is the distribution of \(Z = 2 + X\)?
- What is the distribution of \(Z = 3X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- Consider a random variable X which is uniformly distributed \(U(−2, 3)\).
- What is the distribution of \(Z = 2 + X\)?
- What is the distribution of \(Z = 3X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- For a random variable \(X \sim U(2, 12)\) (read the \(\sim\) as “that is distributed”)
- What is \(P(3 \leq X \leq 8)\)?
- What is \(P(X = 9)\)?

- Consider a random variable X which is normally distributed with mean 0 and variance of 1.
- What is the distribution of \(Z = 5 + X\)?
- What is the distribution of \(Z = 2X\)?
- What is the distribution of \(Z = 2 + 3X\)?

- Consider a random variable \(X\) which is normally distributed with mean 2 and variance of 9.

- What is the distribution of \(Z = −2 + X\)?
- What is the distribution of \(Z = X/\sqrt{9}\)?
- What is the distribution of \(Z = (−2+X)/\sqrt{9}\)

- If \(X \sim N(\mu = −1, \sigma^2 = 2)\), what is
- \(P(X = 0)\)?
- \(P(X \leq −1)\)?
- \(P(−1 \leq X \leq 1)\)?
- the value of \(t\) such that \(P(X \leq t) = 0.25\)
- the value of \(t\) such that \(P(X > t) = 0.80\)
- the value of \(t\) such that \(P(−|t| \leq X \leq |t|) = 0.9\)

- Suppose that students’ marks on the economics prelims paper are normally distributed with mean 61 and standard deviation 9.5.
- What is the probability that a student scores (i) less than 50? (ii) 70 or more?
- What score is exceeded by only 10% of students?
- Find the median, and the upper and lower quartiles.
- What proportion of students have scores within 5 marks of the mean?

- Consider the following bivariate probability mass function \(f(X, Y )\) \[
\begin{array}{l|ccc}
& X=0 & X=2 & X=3 \\
\hline
Y=0&1/36 & 1/18 & 1/12\\
Y=1& 1/9 & 2/9 & 1/3\\
Y=2& 1/36 &1/18 & 1/12
\end{array}
\] what is
- the marginal PMF of \(X\)?
- the marginal CDF of \(Y\)?
- the conditional PMF of \(Y\): \(f(Y |X = 3)\)?
- the conditional CDF of \(X\): \(F(X|Y = 2)\)?
- Are \(X\) and \(Y\) independent?

### Class #3 - MT Week 7

#### Videos

- M0: Intermission
- M1: Expected Values
- M2: Variance
- M3: Conditional Expectations
- M4: Correlation, Covariance, & Independence

#### Review Questions

- Consider the population of Oxford undergraduates. Let \(X\) be a random variable that represents the distribution of height in centimeters in this population, and \(Y\) be a random variable that represents the distribution of weight in kilograms. What are the units of the following quantities?
- \(\mathbb{E}(X)\)
- \(\text{Var}(Y)\)

- The standard deviation of \(X\)
- \(Z \equiv \left[X - \mathbb{E}(X)\right] / \sqrt{\text{Var}(X)}\)
- \(\text{Cov}(X,Y)\)
- \(\text{Cor}(X,Y)\)

- Give an example of each:
- A convex function
- A concave function

- Let \(X\) be a random variable with support set \(\left\{0, 1, 2 \right\}\), \(p(1) = 0.3\), and \(p(2) = 0.5\). Calculate \(\mathbb{E}[X]\).
- Suppose that \(X \sim N(\mu, \sigma^2)\).
- What is \(\mathbb{E}[X]\)?
- What is \(\text{Var}(X)\)?

- Let \(X\) be a discrete random variable that is equally likely to take on the values \(2\) and \(4\) and never takes on any other values. Prove or disprove the following claims, either by using a property of expected value that you have learned, or by direct calculation:
- \(E(X+1)= E(X) + 10\)
- \(E(X/10) = E(X)/10\)
- \(E(10/X)=10/E(X)\)
- \(E(X^2) = [E(X)]^2\)
- \(E(5X + 2)/10 = [5E(X) + 2]/10\)

- State Jensen’s Inequality.
- Give the formula for each of these quantities in terms of expectations:
- Variance
- Covariance
- Correlation

- Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population. Write down the expressions for the following quantities in terms of \(p\):
- \(\mathbb{E}(X)\)
- \(\text{Var}(X)\)

- Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population.
- Show that \(\mathbb{E}(X) = p\)
- Show that \(\mathbb{E}(X^2) = p\)
- Combine the preceding two parts to show that \(\text{Var}(X) = p(1 - p)\).

- Let \(X\) be a discrete random variable.
- Write down the definition of the expected value \(\mathbb{E}[X]\) of \(X\).
- Is \(\mathbb{E}[X]\) constant or random? Explain why in one sentence.

- What is a conditional expectation?
- Formally state the Law of Iterated Expectations.
- TRUE or FALSE: the following statements are equivalent
- \(X\) and \(Y\) are independent.
- The covariance between \(X\) and \(Y\) is zero.

- Suppose that \(\text{Var}(Y) = 0\) and \(\text{Var}(X) > 0\). Calculate \(\text{Cov}(X,Y)\).
- Suppose \(X\) is a random variable with support \(\{-1, 0, 1\}\) where \(p(-1)=q\) and \(p(1) = p\). What relationship must hold between \(p\) and \(q\) to ensure that \(E[X] = 0\)?
- Suppose that \(\mathbb{E}[X]=8\) and \(Y= 3 + X/2\). Calculate \(\mathbb{E}[Y]\).
- Suppose that \(X\) is a discrete random variable and \(g\) is a function. Explain how to calculate \(\mathbb{E}[g(X)]\). Is this the same thing as \(g\left(\mathbb{E}[X]\right)\)?
- Let \(X\) be a random variable with support set \(\left\{ -1, 1 \right\}\) and \(p(-1) = 1/3\). Calculate \(E[X^2]\).
- Suppose that \(X \sim N(\mu = -2, \sigma^2 = 3)\). What is \(\text{Cov}(X,X)\)?
- Let \(X\) and \(Y\) be random variables with \(\mathbb{E}[X] = 2\) and \(\mathbb{E}[Y] = 1\). Calculate \(\mathbb{E}[X - Y]\).
- Let \(W = Y + Z\) and suppose that \(\text{Cov}(X,Y) = 0\) and \(\text{Cov}(X,Z) = 0\). What is the correlation between \(W\) and \(X\)?
- Suppose \(\mathbb{E}[X] = 2\) and \(\text{Var}(X) = 5\). Calculate \(\mathbb{E}[X^2]\).
- Let \(X\) and \(Y\) be random variables with \(\text{Var}(X) = 2\), \(\text{Var}(Y) = 1\), and \(\text{Cov}(X,Y) = 0\). Calculate \(\text{Var}(X - Y)\).
- Let \(X\) and \(Y\) be two random variables with \(\text{Var}(X) = \sigma_X^2\), \(\text{Var}(Y) = \sigma_Y^2\), and \(\text{Cov}(X,Y) = \sigma_{XY}\). If \(a,b,c\) are constants, what is \(\text{Var}(cX + bY + a)\)?
- Suppose that \(X\) and \(Y\) are two random variables with correlation \(\rho = 0.3\), and standard deviations \(\sigma_X = 4\) and \(\sigma_Y = 5\).
- Calculate \(\text{Cov}(X,Y)\).
- Let \(Z = (X + Y)/2\). Calculate \(\text{Var}(Z)\).

- Mark each statement as TRUE or FALSE. If FALSE, explain.
- The expected value of a sum \(\mathbb{E}[X + Y]\) does not in general equal the sum of the expected values \(\mathbb{E}[X] + \mathbb{E}[Y]\); equality holds only when \(X\) and \(Y\) are independent.
- The variance of a sum \(Var(X + Y)\) is always equal to the sum of the variances \(Var(X) + Var(Y)\).

- Let \(X\) be a Uniform\((0,1)\) random variable.
- Calculate \(\mathbb{E}(X)\).
- Calculate \(\mathbb{E}(X^2)\).
- Combine the preceding two parts to find \(\text{Var}(X)\).

- Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the following probabilities?
- \(\mathbb{P}(\mu - \sigma \leq X \leq \mu + \sigma)\)
- \(\mathbb{P}(\mu - 2\sigma \leq X \leq \mu + 2\sigma)\)
- \(\mathbb{P}(\mu - 3\sigma \leq X \leq \mu + 3\sigma)\)
- Suppose that \(X \sim N(0,1)\). Calculate \(\mathbb{E}[X^2]\).
- Let \(X\) and \(U\) be random variables with \(\mathbb{E}(U|X)=0\) and \(\mathbb{E}(X) = 5\). Let \(a\) and \(b\) be constants, and further define \(Y = a + bX + U\).
- Show that \(\mathbb{E}(U) = 0\).
- Calculate \(\mathbb{E}(Y)\).
- Calculate \(\mathbb{E}(Y|X = 0)\).
- Suppose that \(\mathbb{E}(U) = 0\) but \(\mathbb{E}(U|X)\) is unknown. Would this make any difference to your answers in the two preceding parts?
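Jensen’s Inequality is also easy to see by simulation. This optional Python sketch, which uses a made-up distribution rather than any assigned question, compares \(\mathbb{E}[g(X)]\) with \(g(\mathbb{E}[X])\) for the convex function \(g(x) = x^2\).

```python
import random
from statistics import fmean

random.seed(1)

# Illustrate Jensen's Inequality, E[g(X)] >= g(E[X]) for convex g,
# using g(x) = x**2 and simulated standard normal draws
# (a made-up example, not one of the assigned questions).
draws = [random.gauss(0, 1) for _ in range(100_000)]

e_of_g = fmean([x ** 2 for x in draws])  # E[X^2], close to 1
g_of_e = fmean(draws) ** 2               # (E[X])^2, close to 0

print(round(e_of_g, 2), round(g_of_e, 2))
```

The gap between the two numbers is exactly \(\text{Var}(X)\) here, since \(\mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \text{Var}(X)\).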

#### Problem Set Questions

- Let \(X\) and \(Y\) be random variables and \(a,b,c\) be constants. Use the linearity of expectation to derive the following results:
- \(\text{Cov}(X,Y) = \mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\)
- \(\text{Var}(a + bX + cY) = b^2 \text{Var}(X) + c^2 \text{Var}(Y) + 2bc \text{Cov}(X,Y)\)

- Let \(X\) and \(Y\) be random variables such that \(\mathbb{E}[Y] \neq 0\). Show that, provided that all of the relevant expectations exist and are finite, \[\frac{\mathbb{E}[X]}{\mathbb{E}[Y]} - \mathbb{E}\left[ \frac{X}{Y}\right] = \frac{\text{Cov}(X/Y,\, Y)}{\mathbb{E}[Y]}.\]
- Let \(X\) be a random variable and \(a, b\) be constants. Suppose that \(Y = a + b X\). Calculate the correlation between \(X\) and \(Y\) in each of the following cases:
- \(b > 0\)
- \(b < 0\)
- \(b = 0\)

- Let \(X\) and \(U\) be random variables and \(a, b\) be constants. Suppose that \(Y = a + b X + U\) where \(\mathbb{E}(U) = 0\) and \(\text{Cov}(X, U) = 0\).
- Calculate \(\mathbb{E}(Y)\).
- Calculate \(\text{Var}(Y)\).
- Show that \(\text{Cov}(X, Y) = b \text{Var}(X)\)
- Show that \(\text{Corr}(X,Y) = b \sqrt{\frac{\text{Var}(X)}{b^2 \text{Var}(X) + \text{Var}(U)}}\).

- Let \(X\) be a \(\text{Bernoulli}(p)\) random variable, i.e. \(X \in \{0, 1\}\) with \(\mathbb{P}(X=1) = p\). Show that, for any other random variable \(Y\) such that the relevant expectations exist and are finite \[ \frac{\text{Cov}(X,Y)}{\text{Var}(X)} = \mathbb{E}(Y|X=1) - \mathbb{E}(Y|X=0). \]
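Before attempting the proof of the last identity, you may find it helpful to sanity-check it by simulation. This optional Python sketch uses a made-up joint distribution, \(X \sim \text{Bernoulli}(0.3)\) and \(Y = 2 + 5X + \text{noise}\), so both sides of the identity should come out close to 5; the simulation is no substitute for the algebraic proof.

```python
import random
from statistics import fmean

random.seed(0)

# Sanity-check Cov(X,Y)/Var(X) = E[Y|X=1] - E[Y|X=0] by simulation.
# Made-up joint distribution: X ~ Bernoulli(0.3), Y = 2 + 5X + noise.
N = 200_000
xs = [1 if random.random() < 0.3 else 0 for _ in range(N)]
ys = [2 + 5 * x + random.gauss(0, 1) for x in xs]

mx, my = fmean(xs), fmean(ys)
cov_xy = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
var_x = fmean([(x - mx) ** 2 for x in xs])

lhs = cov_xy / var_x
rhs = fmean([y for x, y in zip(xs, ys) if x == 1]) - \
      fmean([y for x, y in zip(xs, ys) if x == 0])
print(round(lhs, 3), round(rhs, 3))
```

A nice feature of the binary case is that the two sample quantities agree exactly, not just approximately, which hints at the structure of the proof.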