Review Questions

Complete the review questions below before each class. They will form the basis of our in-class quizzes in the weeks indicated below. More information on the quizzes is available here. The review questions are relatively straightforward. If you have trouble solving them, this suggests you may need to spend more time on the lecture videos and slides before making another attempt.

Class 1 - MT Week 3

What is a probability?
State each of the three axioms of probability, aka the Kolmogorov Axioms.
Suppose we carry out a random experiment that consists of flipping a fair coin twice.
1. List all the basic outcomes in the sample space \(\Omega\).
2. Let \(A\) be the event that you get at least one head. List all the basic outcomes in \(A\).
3. List all the basic outcomes in \(A^c\).
4. What is the probability of \(A\)? What is the probability of \(A^c\)?
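If you want to check your answers to the coin-flip question, here is a minimal Python sketch (standard library only). It enumerates the sample space and uses the fact that, for a fair coin, every basic outcome is equally likely, so \(P(E) = |E| / |\Omega|\). The helper name `prob` is just an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: each outcome is a tuple like ("H", "T").
omega = list(product("HT", repeat=2))

# Event A: at least one head.
A = [w for w in omega if "H" in w]
# Complement of A: every outcome in omega that is not in A.
A_c = [w for w in omega if w not in A]

# All basic outcomes are equally likely, so P(event) = |event| / |omega|.
def prob(event):
    return Fraction(len(event), len(omega))

print(omega)
print(A_c)
print(prob(A), prob(A_c))
```

Note that the two probabilities printed on the last line sum to one, as the complement rule requires.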
State the complement rule.
Mark each statement as TRUE or FALSE.
1. If \(A \subseteq B\) then \(P(A) \geq P(B)\).
2. For any events \(A\) and \(B\), \(P(A \cap B) = P(A)P(B)\)
3. For any events \(A\) and \(B\), \(P(A \cup B) = P(A) + P(B) − P(A \cap B)\)
Let \(A\) and \(B\) be two arbitrary events.
1. Show that \(P(A \cup B) \leq P(A) + P(B)\). (This is called Boole’s Inequality.)
2. Show that \(P(A \cap B) \geq P(A) + P(B) − 1\). (This is called Bonferroni’s Inequality.)
State the definition of a conditional probability.
Derive:
1. Bayes’ Rule from the definition of conditional probability.
2. The Multiplication rule for independent events from the definition of conditional probability.
Name the various components of Bayes’ Rule.
Suppose that \(P(B) = 0.4\), \(P(A|B) = 0.1\) and \(P(A|B^c) = 0.9\).
1. Calculate \(P(A)\).
2. Calculate \(P(B|A)\).
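To check your arithmetic, here is a minimal Python sketch that applies the law of total probability, \(P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)\), and then Bayes’ rule, \(P(B|A) = P(A|B)P(B)/P(A)\). Exact fractions are used so that no rounding error creeps in; the variable names are illustrative only.

```python
from fractions import Fraction

# Given quantities, stored as exact fractions.
p_B = Fraction(4, 10)            # P(B)
p_A_given_B = Fraction(1, 10)    # P(A|B)
p_A_given_Bc = Fraction(9, 10)   # P(A|B^c)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c), with P(B^c) = 1 - P(B).
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A).
p_B_given_A = p_A_given_B * p_B / p_A

print(p_A, float(p_A))
print(p_B_given_A, float(p_B_given_A))
```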
Let \(A\) and \(B\) be two mutually exclusive events both with positive probability. Are they independent? Explain.
Alexis the meteorologist determines that the probability of rain on Saturday is 50%, and the probability of rain on Sunday is also 50%. Sally the presenter sees Alexis’ forecast and summarizes it as follows: “According to Alexis we’re in for a wet weekend. There’s a 100% chance of rain this weekend: 50% on Saturday and 50% on Sunday.” Is Sally correct? Why or why not?
When is it true that \(P(A|B) = P(B|A)\)? Explain.

Class 2 - MT Week 5

Note: your lecturer uses the term “variable” where the more standard term is “random variable.” I’ll always say random variable, and abbreviate it RV.

Define the “support” aka “support set” of a RV. What is the probability that a RV takes on a value outside of its support set?
What is the difference between a discrete and continuous RV?
What is a probability mass function (PMF)? What key properties does it satisfy?
A random variable is said to follow a “Rademacher distribution” if it is equally likely to take any value in the set \(\{−1, +1\}\) and never takes on any value outside this set. Write out and sketch its probability mass function.
Define the term cumulative distribution function (CDF).
1. How is the CDF of a discrete RV related to its PMF?
2. How is the CDF of a continuous RV related to its PDF?
Let \(X\) be a RV with support set \(\{−1, +1\}\) and \(p(−1) = 1/3\). Write down the CDF of \(X\).
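The link between a discrete PMF and its CDF is \(F(x) = P(X \leq x) = \sum_{v \leq x} p(v)\). The minimal Python sketch below builds the CDF for this question directly from that definition; `pmf` and `cdf` are hypothetical names chosen for illustration. Setting `pmf[-1]` to \(1/2\) instead gives the Rademacher distribution from the earlier question.

```python
from fractions import Fraction

# PMF of X with support {-1, +1} and p(-1) = 1/3; p(+1) follows because a PMF sums to one.
pmf = {-1: Fraction(1, 3)}
pmf[1] = 1 - pmf[-1]

def cdf(x, pmf=pmf):
    """F(x) = P(X <= x): add up the PMF over the support points at or below x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# Evaluate F at points below, on, between and above the support points.
for x in [-2, -1, 0, 0.5, 1, 3]:
    print(x, cdf(x))
```

The printed values make the step-function shape visible: \(F\) is flat between support points and jumps by \(p(v)\) at each support point \(v\).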
Define the term parameter as it relates to a distribution. Are parameters constant or random?
If \(X\) is a continuous RV and \(a, b\) are constants, how do we calculate \(P(a \leq X \leq b)\)?
What are the key properties of a probability density function (PDF)?
True or False: if \(f(x)\) denotes a PDF, then \(f(x)\) is a probability so \(0 \leq f(x) \leq 1\).
Let \(X\) be a continuous RV with CDF \(F\). Express \(P(−2 \leq X \leq 4)\) in terms of \(F\).
Let \(X\) be a continuous RV with CDF \(F\). Express \(P(X \geq x)\) in terms of \(F\).
Suppose that \(X\) is a continuous RV with support set \([−1, +1]\).
1. Is 2 a possible realization of this variable?
2. What is \(P(X = 0.5)\)?
True or False: If \(X\) is a continuous RV then \(P(X \leq 0.3) = P(X < 0.3)\). Explain.
What does it mean to “standardize” a RV?
Let \(X\) be a \(\text{Uniform}(0, 1)\) RV. Write down the CDF of \(X\).
In a \(N(\mu, \sigma^2)\) distribution, what features of the distribution do the parameters \(\mu\) and \(\sigma^2\) control?
Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the values of the following probabilities?
1. \(P(\mu − \sigma \leq X \leq \mu + \sigma)\)
2. \(P(\mu − 2\sigma \leq X \leq \mu + 2\sigma)\)
3. \(P(\mu − 3\sigma \leq X \leq \mu + 3\sigma)\)
Let \(X \sim N(\mu = −2,\sigma^2 = 25)\). Without consulting a table, what is the approximate value of \(P(−12 \leq X \leq 8)\)?
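After answering the two preceding questions by hand, you can verify them with a minimal sketch using Python’s built-in `statistics.NormalDist`. Standardizing shows that \(P(\mu - k\sigma \leq X \leq \mu + k\sigma) = P(-k \leq Z \leq k)\) for \(Z \sim N(0,1)\), so the answer does not depend on \(\mu\) or \(\sigma\). One thing to watch: `NormalDist` is parameterized by the standard deviation, not the variance. The helper name `within_k_sd` is illustrative.

```python
from statistics import NormalDist

# Probability that a normal RV lies within k standard deviations of its mean.
# After standardizing this is P(-k <= Z <= k) with Z standard normal.
def within_k_sd(k):
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))

# X ~ N(mu = -2, sigma^2 = 25), so sigma = 5; -12 and 8 are mu - 2 sigma and mu + 2 sigma.
X = NormalDist(mu=-2, sigma=5)
print(round(X.cdf(8) - X.cdf(-12), 4))
```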
Let \(X \sim N(0, 1)\). Calculate the following:
1. \(P(X = 0)\)
2. \(P(X \leq −1)\)
3. \(P(−1.2 \leq X \leq 1.2)\)
4. the value of \(t\) such that \(P(X \leq t) = 0.5\)
5. the value of \(t\) such that \(P(X > t) = 0.80\)
6. the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.9\)
7. the value of \(t > 0\) such that \(P(−t \leq X \leq t) = 0.82\)
Let \(X \sim N (\mu = −2, \sigma^2 = 9)\). Calculate the following:
1. \(P(X = 0)\)
2. \(P(X \leq −1)\)
3. \(P(−1.2 \leq X \leq 1.2)\)
4. the value of \(t\) such that \(P(X \leq t) = 0.5\)
5. the value of \(t\) such that \(P(X > t) = 0.80\)
6. the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.9\).
7. the values of \(t_0\) and \(t_1\) (symmetric around the mean) such that \(P(t_0 \leq X \leq t_1) = 0.82\)
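Once you have used the tables, you can check both of the preceding lists with the minimal sketch below, which relies on `statistics.NormalDist` (its `cdf` gives probabilities and its `inv_cdf` gives quantiles). It assumes the second distribution is \(N(-2, 9)\), i.e. \(\sigma = 3\). The function name `normal_answers` and the dictionary keys are illustrative only. Part 1 of each list is not computed because \(P(X = x) = 0\) for any continuous RV.

```python
from statistics import NormalDist

def normal_answers(X):
    """Probabilities and quantiles from the two question lists above, for a NormalDist X."""
    return {
        "P(X <= -1)": X.cdf(-1),
        "P(-1.2 <= X <= 1.2)": X.cdf(1.2) - X.cdf(-1.2),
        # P(X <= t) = 0.5 means t is the median, which equals the mean for a normal.
        "t: P(X <= t) = 0.5": X.inv_cdf(0.5),
        # P(X > t) = 0.80 is the same as P(X <= t) = 0.20.
        "t: P(X > t) = 0.80": X.inv_cdf(0.20),
        # A central interval with probability c leaves (1 - c) / 2 in each tail.
        "90% central interval": (X.inv_cdf(0.05), X.inv_cdf(0.95)),
        "82% central interval": (X.inv_cdf(0.09), X.inv_cdf(0.91)),
    }

standard = normal_answers(NormalDist(mu=0, sigma=1))
shifted = normal_answers(NormalDist(mu=-2, sigma=3))  # sigma^2 = 9, so sigma = 3

for key in standard:
    print(key, standard[key], shifted[key])
```

Comparing the two columns is a useful check on standardization: every quantile of \(N(-2, 9)\) equals \(-2 + 3\) times the corresponding standard normal quantile.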
Suppose that \(U \sim N (0, \sigma^2)\) and let \(a, b, x\) be constants. Find the distribution of each of the following:
1. \(Y = a + U\)
2. \(Y = a + bU\)
3. \(Y = a + bx + U\)
Let \(X \sim N(\mu = 1, \sigma^2 = 2)\).
1. What is the 1st quartile of \(X\)?
2. What is the 77th percentile of \(X\)?
3. What is the median of \(X\)?
Define the following terms:
1. conditional distribution
2. marginal distribution
Consider the following bivariate PMF \(p(X, Y)\) \[ \begin{array}{l|cccc} & X = 0 & X=2 & X = 3 & X = 4\\ \hline Y=0& \boxed{?} & 2/24 & 2/24 & 1/24\\ Y=1& 1/24 & 3/24 & 4/24 & 2/24\\ Y=2& 1/24 & 3/24 & 2/24 & 2/24\\ \end{array} \]
1. Fill in the missing value: \(\boxed{?}\).
2. Write out the marginal PMF of \(X\).
3. Write out the marginal CDF of \(Y\).
4. Write out the conditional PMF of \(Y\) given \(X=3\).
5. Write out the conditional CDF of \(X\) given \(Y=2\).
6. Are \(X\) and \(Y\) independent? Explain.
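The minimal Python sketch below lets you check your work on the bivariate PMF. It takes the columns to be \(X = 0, 2, 3, 4\) exactly as printed in the table, recovers the missing cell from the requirement that a PMF sums to one, and then computes marginals by summing over the other variable, a conditional PMF by dividing by a marginal, and checks independence cell by cell. All variable names are illustrative.

```python
from fractions import Fraction as F

xs = [0, 2, 3, 4]
ys = [0, 1, 2]
# Joint PMF entered as counts out of 24, one row per value of Y; None marks the missing cell.
counts = {
    0: [None, 2, 2, 1],  # row Y = 0
    1: [1, 3, 4, 2],     # row Y = 1
    2: [1, 3, 2, 2],     # row Y = 2
}
joint = {(x, y): (None if c is None else F(c, 24))
         for y, row in counts.items() for x, c in zip(xs, row)}

# A PMF sums to one, so the missing cell (X = 0, Y = 0) is 1 minus the sum of the others.
missing = 1 - sum(p for p in joint.values() if p is not None)
joint[(0, 0)] = missing

# Marginal PMFs: sum the joint PMF over the other variable.
p_X = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_Y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional PMF of Y given X = 3: p(y | X = 3) = p(3, y) / p_X(3).
p_Y_given_X3 = {y: joint[(3, y)] / p_X[3] for y in ys}

# Independence requires p(x, y) = p_X(x) * p_Y(y) in every single cell.
independent = all(joint[(x, y)] == p_X[x] * p_Y[y] for x in xs for y in ys)

print(missing, p_X, p_Y_given_X3, independent)
```

Cumulative sums of `p_Y` give the marginal CDF of \(Y\), and the same division trick applied to the row \(Y = 2\) gives the conditional PMF (and hence CDF) of \(X\) given \(Y = 2\).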

Class 3 - MT Week 7

Consider the population of Oxford undergraduates. Let \(X\) be a random variable representing the height in centimeters, and \(Y\) a random variable representing the weight in kilograms, of a student drawn at random from this population. What are the units of the following quantities?
1. \(\mathbb{E}(X)\)
2. \(\text{Var}(Y)\)
3. The standard deviation of \(X\)
4. \(Z \equiv \left[X - \mathbb{E}(X)\right] / \sqrt{\text{Var}(X)}\)
5. \(\text{Cov}(X,Y)\)
6. \(\text{Cor}(X,Y)\)
Give an example of each:
1. A convex function
2. A concave function
Let \(X\) be a random variable with support set \(\left\{0, 1, 2 \right\}\), \(p(1) = 0.3\), and \(p(2) = 0.5\). Calculate \(\mathbb{E}[X]\).
Suppose that \(X \sim N(\mu, \sigma^2)\).
1. What is \(\mathbb{E}[X]\)?
2. What is \(\text{Var}(X)\)?
Let \(X\) be a discrete random variable that is equally likely to take on the values \(2\) and \(4\) and never takes on any other values. Prove or disprove the following claims, either by using a property of expected value that you have learned, or by direct calculation:
1. \(\mathbb{E}(X+10)= \mathbb{E}(X) + 10\)
2. \(\mathbb{E}(X/10) = \mathbb{E}(X)/10\)
3. \(\mathbb{E}(10/X)=10/\mathbb{E}(X)\)
4. \(\mathbb{E}(X^2) = [\mathbb{E}(X)]^2\)
5. \(\mathbb{E}\left[(5X + 2)/10\right] = [5\mathbb{E}(X) + 2]/10\)
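To check the five claims by direct calculation, here is a minimal Python sketch built on the definition \(\mathbb{E}[g(X)] = \sum_x g(x)\,p(x)\). The helper `expect` is a hypothetical name; the same helper also works for the earlier question with support \(\{0, 1, 2\}\) if you swap in that PMF. Exact fractions are used so that equalities are tested exactly rather than up to rounding.

```python
from fractions import Fraction as F

# X is equally likely to be 2 or 4.
pmf = {2: F(1, 2), 4: F(1, 2)}

def expect(g, pmf=pmf):
    """E[g(X)] = sum over the support of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

EX = expect(lambda x: x)

# Each entry pairs the left-hand side of a claim with its right-hand side.
checks = {
    "E(X+10) vs E(X)+10": (expect(lambda x: x + 10), EX + 10),
    "E(X/10) vs E(X)/10": (expect(lambda x: F(x, 10)), EX / 10),
    "E(10/X) vs 10/E(X)": (expect(lambda x: F(10, x)), 10 / EX),
    "E(X^2) vs E(X)^2": (expect(lambda x: x**2), EX**2),
    "E[(5X+2)/10] vs (5E(X)+2)/10": (expect(lambda x: F(5 * x + 2, 10)), (5 * EX + 2) / 10),
}
for name, (lhs, rhs) in checks.items():
    print(name, lhs, rhs, lhs == rhs)
```

The pattern in the output is the point: the claims involving linear functions of \(X\) hold, while those involving the nonlinear functions \(10/x\) and \(x^2\) fail, which is what Jensen’s Inequality leads you to expect.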
State Jensen’s Inequality.
Give the formula for each of these quantities in terms of expectations:
1. Variance
2. Covariance
3. Correlation
Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population. Write down the expressions for the following quantities in terms of \(p\):
1. \(\mathbb{E}(X)\)
2. \(\text{Var}(X)\)
Let \(X\) be a binary variable where \(p = \mathbb{P}(X=1)\) is the proportion of “ones” in the population.
1. Show that \(\mathbb{E}(X) = p\)
2. Show that \(\mathbb{E}(X^2) = p\)
3. Combine the preceding two parts to show that \(\text{Var}(X) = p(1 - p)\).
Let \(X\) be a discrete random variable.
1. Write down the definition of the expected value \(\mathbb{E}[X]\) of \(X\).
2. Is \(\mathbb{E}[X]\) constant or random? Explain why in one sentence.
What is a conditional expectation?
Formally state the Law of Iterated Expectations.
TRUE or FALSE: the following statements are equivalent
1. \(X\) and \(Y\) are independent.
2. The covariance between \(X\) and \(Y\) is zero.
Suppose that \(\text{Var}(Y) = 0\) and \(\text{Var}(X) > 0\). Calculate \(\text{Cov}(X,Y)\).
Suppose \(X\) is a random variable with support \(\{-1, 0, 1\}\) where \(p(-1)=q\) and \(p(1) = p\). What relationship must hold between \(p\) and \(q\) to ensure that \(\mathbb{E}[X] = 0\)?
Suppose that \(\mathbb{E}[X]=8\) and \(Y= 3 + X/2\). Calculate \(\mathbb{E}[Y]\).
Suppose that \(X\) is a discrete random variable and \(g\) is a function. Explain how to calculate \(\mathbb{E}[g(X)]\). Is this the same thing as \(g\left(\mathbb{E}[X]\right)\)?
Let \(X\) be a random variable with support set \(\left\{ -1, 1 \right\}\) and \(p(-1) = 1/3\). Calculate \(\mathbb{E}[X^2]\).
Suppose that \(X \sim N(\mu = -2, \sigma^2 = 3)\). What is \(\text{Cov}(X,X)\)?
Let \(X\) and \(Y\) be random variables with \(\mathbb{E}[X] = 2\) and \(\mathbb{E}[Y] = 1\). Calculate \(\mathbb{E}[X - Y]\).
Let \(W = Y + Z\) and suppose that \(\text{Cov}(X,Y) = 0\) and \(\text{Cov}(X,Z) = 0\). What is the correlation between \(W\) and \(X\)?
Suppose \(\mathbb{E}[X] = 2\) and \(\text{Var}(X) = 5\). Calculate \(\mathbb{E}[X^2]\).
Let \(X\) and \(Y\) be random variables with \(\text{Var}(X) = 2\), \(\text{Var}(Y) = 1\), and \(\text{Cov}(X,Y) = 0\). Calculate \(\text{Var}(X - Y)\).
Let \(X\) and \(Y\) be two random variables with \(\text{Var}(X) = \sigma_X^2\), \(\text{Var}(Y) = \sigma_Y^2\), and \(\text{Cov}(X,Y) = \sigma_{XY}\). If \(a,b,c\) are constants, what is \(\text{Var}(cX + bY + a)\)?
Suppose that \(X\) and \(Y\) are two random variables with correlation \(\rho = 0.3\), and standard deviations \(\sigma_X = 4\) and \(\sigma_Y = 5\).
1. Calculate \(\text{Cov}(X,Y)\).
2. Let \(Z = (X + Y)/2\). Calculate \(\text{Var}(Z)\).
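A minimal Python sketch for checking the last two questions. It uses \(\text{Cov}(X,Y) = \rho\,\sigma_X \sigma_Y\) and the general rule \(\text{Var}(cX + bY + a) = c^2\sigma_X^2 + b^2\sigma_Y^2 + 2cb\,\sigma_{XY}\), where the constant \(a\) drops out. The function name `var_linear` is a hypothetical choice; \(Z = (X+Y)/2\) is the case \(c = b = 1/2\), \(a = 0\).

```python
def var_linear(c, b, var_x, var_y, cov_xy):
    """Var(cX + bY + a) = c^2 Var(X) + b^2 Var(Y) + 2cb Cov(X, Y); the constant a drops out."""
    return c**2 * var_x + b**2 * var_y + 2 * c * b * cov_xy

rho, sd_x, sd_y = 0.3, 4.0, 5.0

# Cor(X, Y) = Cov(X, Y) / (sd_X * sd_Y), so Cov(X, Y) = rho * sd_X * sd_Y.
cov_xy = rho * sd_x * sd_y

# Z = X/2 + Y/2 corresponds to c = b = 1/2.
var_z = var_linear(0.5, 0.5, sd_x**2, sd_y**2, cov_xy)
print(cov_xy, var_z)
```

Setting `cov_xy` to zero and `c, b = 1, -1` reproduces the earlier \(\text{Var}(X - Y)\) question, which shows why the covariance term matters whenever it is nonzero.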
Mark each statement as TRUE or FALSE. If FALSE, explain.
1. The expected value of a sum \(\mathbb{E}[X + Y]\) does not in general equal the sum of the expected values \(\mathbb{E}[X] + \mathbb{E}[Y]\); this only holds when \(X\) and \(Y\) are independent.
2. The variance of a sum \(\text{Var}(X + Y)\) is always equal to the sum of the variances \(\text{Var}(X) + \text{Var}(Y)\).
Let \(X\) be a Uniform\((0,1)\) random variable.
1. Calculate \(\mathbb{E}(X)\).
2. Calculate \(\mathbb{E}(X^2)\).
3. Combine the preceding two parts to find \(\text{Var}(X)\).
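After doing the integrals by hand, a quick simulation is a useful sanity check. The minimal Python sketch below draws many Uniform(0, 1) values with `random.random()` (which samples from \([0, 1)\), a difference that does not affect any of the moments) and estimates \(\mathbb{E}(X)\), \(\mathbb{E}(X^2)\) and \(\text{Var}(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2\). The sample size and seed are arbitrary choices; your analytic answers should agree with the printed estimates to about two or three decimal places.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 200_000
draws = [random.random() for _ in range(n)]  # n draws from Uniform[0, 1)

# Sample analogues of E(X) and E(X^2).
mean_hat = sum(draws) / n
second_moment_hat = sum(x * x for x in draws) / n

# Var(X) = E(X^2) - [E(X)]^2
var_hat = second_moment_hat - mean_hat**2

print(mean_hat, second_moment_hat, var_hat)
```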
Suppose that \(X \sim N(\mu, \sigma^2)\). Approximately what are the following probabilities?
1. \(\mathbb{P}(\mu - \sigma \leq X \leq \mu + \sigma)\)
2. \(\mathbb{P}(\mu - 2\sigma \leq X \leq \mu + 2\sigma)\)
3. \(\mathbb{P}(\mu - 3\sigma \leq X \leq \mu + 3\sigma)\)
Suppose that \(X \sim N(0,1)\). Calculate \(\mathbb{E}[X^2]\).
Let \(X\) and \(U\) be random variables with \(\mathbb{E}(U|X)=0\) and \(\mathbb{E}(X) = 5\). Let \(a\) and \(b\) be constants, and further define \(Y = a + bX + U\).
1. Show that \(\mathbb{E}(U) = 0\).
2. Calculate \(\mathbb{E}(Y)\).
3. Calculate \(\mathbb{E}(Y|X = 0)\).
4. Suppose that \(\mathbb{E}(U) = 0\) but \(\mathbb{E}(U|X)\) is unknown. Would this make any difference to your answers in parts 2 and 3 above?

Class 4 - HT Week 1

Define the following terms and give a simple example: “population”, “sample”, “sample size”.
Explain the distinction between a parameter and a statistic.
Give an example of (i) a parameter that is also a moment and (ii) a parameter that is not a moment.
Briefly compare and contrast sampling and non-sampling error.
Define a simple random sample. Does it help us to address sampling error, non-sampling error, both, or neither?
What is
1. a “random variable”?
2. a “statistic”?
What does it mean for a statistic to have a sampling distribution?
Define the following terms: “standard deviation”, “standard error”.
Define the terms “estimand”, “estimator”, “estimate”, using the mean as an example.
What does it mean for a set or sequence of random variables to be “iid”?
If sampling is performed without replacement why are the resulting random variables not iid? Under what circumstances does it become, for all practical purposes, irrelevant as regards the iid property whether one draws with or without replacement?
What does it mean for an estimator to be
1. unbiased
2. consistent
Can an estimator be consistent yet biased? If so give an example, if not why not?
Why do we divide by the sample size minus one when we use the sample variance to estimate the population variance?
Suppose the population mean \(\mu\) is known and we want to estimate the population variance \(\sigma^2\). Is \(\widehat{\sigma}^2 \equiv \frac{1}{n-1} \sum_{i=1}^n (X_i - \mu)^2\) an unbiased estimator of \(\sigma^2\)? Justify your answer.
Define the concept of efficiency of an estimator. What does it mean to say that one estimator is more efficient than another?
State (formally)
1. the central limit theorem;
2. the law of large numbers.
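Simulation will not replace a formal statement, but it can make both results concrete. The minimal Python sketch below (standard library only) repeatedly draws samples of size \(n\) from Uniform(0, 1), where \(\mu = 1/2\) and \(\sigma^2 = 1/12\), and records the sample means. As \(n\) grows, the average of the sample means stays near \(\mu\) and their spread shrinks in line with \(\sigma/\sqrt{n}\) (the law of large numbers), and a histogram of them would look increasingly normal (the central limit theorem). The helper name `sample_means`, the choice of Uniform(0, 1) and the number of replications are all illustrative assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(n, reps):
    """Means of `reps` independent samples, each consisting of n iid Uniform(0, 1) draws."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

sigma = (1 / 12) ** 0.5  # population standard deviation of Uniform(0, 1)

for n in (5, 50, 500):
    means = sample_means(n, reps=2000)
    # Columns: n, average of the sample means, their standard deviation, sigma / sqrt(n).
    print(n,
          round(statistics.fmean(means), 4),
          round(statistics.stdev(means), 4),
          round(sigma / n**0.5, 4))
```

The last two columns should be close to each other for every \(n\): the spread of the sampling distribution of the mean is exactly what the standard error \(\sigma/\sqrt{n}\) measures.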
TRUE or FALSE: the Central Limit Theorem says that large populations are approximately normally distributed. If FALSE, correct the statement.
Give the formal definition of a confidence interval and the “rough intuition” that we can use to interpret it.
How does the width of a confidence interval for a population mean vary with
1. the sample size?
2. the confidence level?
3. the sample variance?
4. the sample mean?
5. the standard error of the sample mean?
Write out the formula for the 95% confidence interval for
1. a population mean,
2. the difference between the means of two independent populations,
3. the difference between the population means of a pair of variables from a single population.
Explain briefly how your answers to the preceding question would change if the random variables in question are binary.
How many standard errors to either side of the mean do you need to go in order to construct a
1. 90% confidence interval,
2. 95% confidence interval,
3. 99% confidence interval?
Using the appropriate statistical tables, how many standard errors to either side of the mean do you need to go in order to construct an 80% confidence interval?
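You can check your table look-ups with the minimal sketch below. It assumes the large-sample normal approximation used in these questions: the number of standard errors for a two-sided interval at confidence level \(c\) is the \(1 - (1-c)/2\) quantile of \(N(0,1)\), obtained here from `statistics.NormalDist().inv_cdf`. The function names `z_crit` and `mean_ci` are hypothetical, and for a small sample a \(t\) critical value would normally replace the normal one. Passing \(1 - \alpha\) to `z_crit` also gives the two-sided critical values asked for in the next class.

```python
from statistics import NormalDist

def z_crit(level):
    """Number of standard errors for a two-sided interval at the given confidence level
    (large-sample normal approximation)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def mean_ci(sample, level=0.95):
    """Large-sample CI for a population mean: xbar +/- z * s / sqrt(n)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = (sum((x - xbar) ** 2 for x in sample) / (n - 1)) ** 0.5  # sample standard deviation
    half_width = z_crit(level) * s / n**0.5                      # z times the standard error
    return xbar - half_width, xbar + half_width

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_crit(level), 3))

print(mean_ci([4.1, 5.3, 3.8, 4.9, 5.6, 4.4]))  # made-up data, for illustration only
```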

Class 5 - HT Week 3

What is a “Null Hypothesis”?
Define Type I and Type II errors.
If you set a significance level of \(\alpha = 0.05\) what proportion of the time would you expect to falsely reject the Null hypothesis?
If the significance level of a test is 0.15, and you decide to reject the Null, at what level of confidence are you doing so? What are you being confident about?
Given a test statistic with a Standard Normal distribution, what is the “rule of thumb” critical value for rejecting the Null at a 5% significance level? What is the “rule of thumb” critical value for rejecting it at a 10% significance level?
For a normally distributed test statistic, use the appropriate statistical tables to find the critical value for a (two-sided) test with each of the following significance levels:
1. 6%
2. 17%
Write out the formula for the test statistic for a test of
1. whether a population mean is equal to a certain value.
2. whether the difference in means in two independent populations is equal to a certain value.
3. whether the population means of a pair of variables from a single population are the same.
Repeat the preceding problem for the case of binary random variables.

Class 6 - HT Week 5

What is the key difference between time series data and cross-section data?
List, and be prepared to describe, the typical features exhibited by time series data.
Define formally (write down the formula for) the population autocorrelation between \(X_t\) and \(X_{t-h}\).
Sketch the typical autocorrelation functions for
1. a time series with no autocorrelation,
2. a quarterly time series with annual seasonality,
3. a strongly persistent time series.
Define “weak stationarity”.
Why is stationarity important?
If a time series is not stationary, how might you be able to transform it so that it is? Give three examples.
Define formally (write down the formulae for) each of the following:
1. an AR(1) process
2. a Random Walk with Drift
3. a Random Walk
4. a White Noise process
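To see how these processes behave, and to compare with your autocorrelation sketches, here is a minimal Python simulation. It assumes the parameterization \(X_t = c + \phi X_{t-1} + \varepsilon_t\) with \(\varepsilon_t\) iid \(N(0, \sigma^2)\): \(\phi = 0\) with \(c = 0\) gives white noise, \(|\phi| < 1\) a stationary AR(1), \(\phi = 1\) a random walk, and \(\phi = 1\) with \(c \neq 0\) a random walk with drift. The helper names `simulate_ar1` and `sample_acf`, the series length and the seed are illustrative choices.

```python
import random

def simulate_ar1(n, phi, c=0.0, sigma=1.0, x0=0.0, seed=0):
    """Simulate X_t = c + phi * X_{t-1} + e_t with e_t iid N(0, sigma^2), starting from x0."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path

def sample_acf(x, h):
    """Sample autocorrelation at lag h: lag-h sample autocovariance over the sample variance."""
    n = len(x)
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    num = sum((x[t] - xbar) * (x[t - h] - xbar) for t in range(h, n))
    return num / denom

# phi = 0: white noise; 0.5 and 0.9: stationary AR(1); 1.0: random walk.
for phi in (0.0, 0.5, 0.9, 1.0):
    path = simulate_ar1(5000, phi)
    print(phi, [round(sample_acf(path, h), 2) for h in (1, 2, 5, 10)])
```

For the stationary cases the sample autocorrelations should roughly follow \(\phi^h\), dying out geometrically, whereas for the random walk they stay close to one even at long lags, the typical signature of a strongly persistent, non-stationary series.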
Is a Random Walk a stationary process? If so why? If not why not?
Is it possible for an AR(1) process to be stationary?
Define formally (write down the formula for) \(\mathbb{E}[X_t|X_{t-1}]\) for a mean-reverting AR(1) process.

Class 7 - HT Week 7

Define formally (write down the formulae)
1. the average treatment effect (ATE);
2. the average treatment effect on the treated (ATT).
Define formally (write down the formula for) the decomposition of the average difference between the treated and untreated groups into the ATT plus selection bias.
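The decomposition is an identity, so it can be checked on any dataset in which both potential outcomes are known. The minimal Python sketch below uses six hypothetical units with made-up potential outcomes \(Y_0, Y_1\) and treatment indicator \(D\) (in real data only one potential outcome per unit is observed). It computes the observed difference \(\mathbb{E}[Y|D=1] - \mathbb{E}[Y|D=0]\), the ATT \(\mathbb{E}[Y_1 - Y_0|D=1]\), and the selection bias \(\mathbb{E}[Y_0|D=1] - \mathbb{E}[Y_0|D=0]\), and shows that the first equals the sum of the other two. The data and names are illustrative assumptions only.

```python
# Hypothetical potential outcomes for six units (made-up numbers, for illustration only).
# Each tuple is (y0, y1, d): outcome without treatment, outcome with treatment, treated or not.
units = [
    (10, 14, 1),
    (12, 15, 1),
    (11, 13, 1),
    (6, 9, 0),
    (7, 8, 0),
    (5, 9, 0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [u for u in units if u[2] == 1]
untreated = [u for u in units if u[2] == 0]

# The observed outcome is y1 for treated units and y0 for untreated units.
naive_diff = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in untreated)

att = mean(y1 - y0 for y0, y1, _ in treated)        # E[Y1 - Y0 | D = 1]
selection_bias = (mean(y0 for y0, _, _ in treated)  # E[Y0 | D = 1]
                  - mean(y0 for y0, _, _ in untreated))  # minus E[Y0 | D = 0]
ate = mean(y1 - y0 for y0, y1, _ in units)          # E[Y1 - Y0] over everyone

print(naive_diff, att, selection_bias, ate)
```

In this made-up example the treated units would have had higher outcomes even without treatment, so the naive comparison overstates the ATT by exactly the selection bias term.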
How (in terms of the effect on the relevant conditional expectations) does randomisation of treatment solve the selection bias problem?
What is meant by the terms
1. “Internal validity”
2. “External Validity”
List the main threats to the internal validity of a study of treatment response.
List the main threats to the external validity of a study of treatment response.
What is the likely empirical effect of non-compliance on the estimate of a causal effect? Will it bias the result up or down?
What is the conditional independence assumption?
What are the three main problems encountered when using the conditional independence assumption?
Is the conditional independence assumption verifiable?