Jan 27, 2020
Mar 11, 2022 15:59 UTC
An introduction to consistency, a concept about the convergence of estimators. We move from the convergence of non-random number sequences to convergence in probability, and then to the consistency of estimators and its properties.


Consistency is about the convergence of estimators. Recall what convergence means for non-random numbers. Suppose $x_1, x_2, x_3, \cdots$ are non-random numbers. What is the meaning of $\lim\limits_{n \rightarrow \infty} x_n = x$?

For example, if $x_n = \frac{1}{n}$, $\lim\limits_{n \rightarrow \infty} x_n = 0$. If $x_n = (-1)^n$, $\lim\limits_{n \rightarrow \infty} x_n$ doesn’t exist.

Def: A sequence $x_n \in \mathbb{R}$ is said to converge to $x$, denoted $\lim\limits_{n \rightarrow \infty} x_n = x$, if for any fixed number $\epsilon > 0$ we have $|x_n - x| \leq \epsilon$ for all sufficiently large $n$. “For all sufficiently large $n$” means that there exists $N$ such that the inequality holds for all $n \geq N$.
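To make “sufficiently large $n$” concrete, here is a minimal sketch for the sequence $x_n = \frac{1}{n}$ with limit $0$. The helper name `smallest_n_for` is hypothetical, and the check only covers finitely many terms, so it illustrates the definition rather than proving convergence.

```python
import math

def smallest_n_for(eps):
    """Smallest N such that |1/n - 0| <= eps for all n >= N.

    Because x_n = 1/n is decreasing, it is enough to find the first n
    with 1/n <= eps, which is ceil(1 / eps).
    """
    return math.ceil(1 / eps)

# Powers of two are used for eps so that 1 / eps is exact in floating point.
for eps in (0.5, 0.125, 0.03125):
    N = smallest_n_for(eps)
    # Spot-check a finite stretch of terms from N onward.
    assert all(abs(1 / n - 0) <= eps for n in range(N, N + 1000))
    print(f"eps={eps}: N={N}")
```

A smaller $\epsilon$ needs a larger $N$, but some $N$ always exists, which is exactly what the definition asks for.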


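Suppose $Y_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$. Intuitively, $\bar{Y}_n$ converges to $\mu$, but since $\bar{Y}_n$ is random, the definition above does not apply directly. Recall that $\bar{Y}_n \sim N(\mu, \frac{\sigma^2}{n})$, a standard result for the mean of i.i.d. normal random variables.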

No matter how large $n$ is, $\bar{Y}_n$ has a positive probability of exceeding any fixed threshold (think of the bell-shaped curve). The good news is that

$$ \underbrace{MSE(\bar{Y}_n) = Var(\bar{Y}_n)}_{\text{unbiasedness}} = \frac{\sigma^2}{n} \rightarrow 0 \text{ as } n \rightarrow \infty. $$

Convergence in probability

Def: A sequence of random variables $X_n$ is said to converge in probability to a constant $x$ if for any fixed $\epsilon > 0$, $$ P\left( |X_n - x| \leq \epsilon \right) \rightarrow 1 \text{ as } n \rightarrow \infty. $$

This is the same as

$$ P(x - \epsilon \leq X_n \leq x + \epsilon) \rightarrow 1, $$

or, taking complements,

$$ P\left( |X_n - x| > \epsilon \right) \rightarrow 0 \text{ as } n \rightarrow \infty. $$

Convergence in probability to $x$ is denoted by

$$ X_n \xrightarrow{P} x $$

The concept also applies to the case where the limit is also random. For us, the limit is always non-random in this course.


Def: An estimator $\hat\theta_n$ of $\theta$ is said to be consistent if $\hat\theta_n \xrightarrow{P} \theta$ no matter what the true value $\theta \in \Theta$ is. Here $n$ denotes the sample size.

Normal distribution example

Suppose $Y_i \overset{i.i.d.}{\sim}N(\mu, \sigma^2)$. We know that $\bar{Y}_n \sim N(\mu, \frac{\sigma^2}{n})$. We want to show that $\bar{Y}_n$ is consistent for $\mu$. Fix $\epsilon > 0$.

$$ P(|\bar{Y}_n - \mu| \leq \epsilon) = P\left( \left| \frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}} \right| \leq \frac{\epsilon}{\sigma / \sqrt{n}} \right) $$

The above is the standardization of a random variable. A fact here is that $Z = \frac{\bar{Y}_n - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$ because $E[\bar{Y}_n] = \mu$ and $s.e.(\bar{Y}_n) = \sigma / \sqrt{n}$.

$$ P(|\bar{Y}_n - \mu| \leq \epsilon) = P\left( |Z| \leq \frac{\epsilon}{\sigma/\sqrt{n}} \right) = P\left( |Z| \leq \frac{\epsilon}{\sigma}\sqrt{n} \right) $$

which is the area under the PDF of $N(0, 1)$ over the interval $\left( -\frac{\epsilon}{\sigma}\sqrt{n}, \frac{\epsilon}{\sigma}\sqrt{n} \right)$. As $n \rightarrow \infty$, the endpoints move outward without bound, so the area tends to $1$. Hence $\bar{Y}_n \xrightarrow{P} \mu$.
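The probability above can be evaluated numerically. The sketch below uses the identity $P(|Z| \leq z) = \mathrm{erf}(z / \sqrt{2})$ for a standard normal $Z$. The function name `prob_within` and the values $\epsilon = 0.1$, $\sigma = 1$ are assumptions chosen only for illustration.

```python
import math

def prob_within(eps, sigma, n):
    """P(|Ybar_n - mu| <= eps) when Ybar_n ~ N(mu, sigma^2 / n)."""
    z = eps * math.sqrt(n) / sigma
    # For Z ~ N(0, 1): P(|Z| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))

for n in (1, 10, 100, 1000, 10000):
    print(n, round(prob_within(eps=0.1, sigma=1.0, n=n), 4))
```

The printed probabilities increase toward $1$ as $n$ grows, even for a fairly small $\epsilon$.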

Uniform distribution example

$Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. $\hat\theta_n = \max(Y_1, \cdots, Y_n)$. We want to show $\hat\theta_n$ is consistent.

We know that $\hat\theta_n \in [0, \theta]$. We can assume that $\epsilon \in (0, \theta)$, since $|\hat\theta_n - \theta| > \epsilon$ is impossible when $\epsilon \geq \theta$.

$$ \begin{aligned} P(|\hat\theta_n - \theta| > \epsilon) &= P(\theta - \hat\theta_n > \epsilon) \\ &= P(\hat\theta_n < \theta - \epsilon) \\ &= P(Y_1 < \theta - \epsilon, Y_2 < \theta - \epsilon, \cdots, Y_n < \theta - \epsilon) \\ &= P(Y_1 < \theta - \epsilon)^n \quad\cdots\text{ because the } Y_i \text{ are i.i.d.} \\ &= \left( \frac{\theta - \epsilon}{\theta} \right)^n \quad\cdots\text{ because } F_{Y_1}(y) = \frac{y}{\theta},\ 0 \leq y \leq \theta \\ &\rightarrow 0 \text{ as } n \rightarrow \infty \text{ since } 0 < \frac{\theta - \epsilon}{\theta} < 1. \end{aligned} $$
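As a sanity check, the closed form $\left( \frac{\theta - \epsilon}{\theta} \right)^n$ can be compared with a Monte Carlo estimate. This is a rough sketch: the function names, the seed, the number of replications, and the values $\theta = 2$, $\epsilon = 0.2$ are all assumptions made for illustration.

```python
import random

def exact_miss_prob(theta, eps, n):
    """P(|max(Y_1..Y_n) - theta| > eps) for Unif(0, theta), with 0 < eps < theta."""
    return ((theta - eps) / theta) ** n

def simulated_miss_prob(theta, eps, n, reps=10000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        sample_max = max(rng.uniform(0, theta) for _ in range(n))
        if theta - sample_max > eps:
            misses += 1
    return misses / reps

for n in (5, 20, 80):
    print(n, exact_miss_prob(2.0, 0.2, n), simulated_miss_prob(2.0, 0.2, n))
```

Both columns should shrink toward $0$ as $n$ grows, with the simulated value tracking the exact one up to Monte Carlo noise.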


Theorem: If $MSE(\hat\theta_n; \theta) \rightarrow 0$ as $n \rightarrow \infty$ for all $\theta \in \Theta$, then $\hat\theta_n$ is consistent.

Lemma (Markov inequality): If random variable $X \geq 0$, then for any constant $k > 0$, we have $$ P(X > k) \leq \frac{1}{k}E[X]. $$

To prove the lemma, assume $X$ is continuous with PDF $f(\cdot)$. The case for a discrete $X$ is similar.

$$ \begin{aligned} E[X] &= \int_0^\infty xf(x)dx \\ &\geq \int_k^\infty xf(x)dx \\ &\geq \int_k^\infty kf(x)dx \quad\cdots\text{ because } x \geq k, \\ &= k \int_k^\infty f(x)dx \\ &= kP(X > k) \\ \frac{1}{k}E[X] &\geq P(X > k) \end{aligned} $$
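The inequality can be checked numerically. The sketch below draws nonnegative values from an exponential distribution (an arbitrary choice made for illustration, as are the seed and sample size) and compares the fraction of draws above $k$ with the sample mean divided by $k$. The same argument as in the proof shows the empirical version holds for any nonnegative sample.

```python
import random

rng = random.Random(1)
# Nonnegative draws; Exp(1) is just a convenient example distribution.
sample = [rng.expovariate(1.0) for _ in range(10000)]
sample_mean = sum(sample) / len(sample)

def tail_fraction(xs, k):
    """Fraction of observations strictly greater than k."""
    return sum(x > k for x in xs) / len(xs)

for k in (0.5, 1, 2, 4):
    # Left column: empirical P(X > k). Right column: Markov bound E[X] / k.
    print(k, tail_fraction(sample, k), sample_mean / k)
```

The bound is often loose (for small $k$ it can exceed $1$), but it needs nothing beyond nonnegativity and a finite mean.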

Now we move on to the proof of the theorem. Fix $\epsilon > 0$. Applying the Markov inequality with $X = (\hat\theta_n - \theta)^2$ and $k = \epsilon^2$, we have

$$ \begin{aligned} P(|\hat\theta_n - \theta| > \epsilon) &= P\left( (\hat\theta_n - \theta)^2 > \epsilon^2 \right) \\ &\leq \frac{1}{\epsilon^2} \underbrace{E\left[ (\hat\theta_n - \theta)^2 \right]}_{MSE \rightarrow 0} \rightarrow 0 \end{aligned} $$

Example using the MSE theorem

$Y_i \overset{i.i.d.}{\sim} Unif(0, \theta)$. $\hat\theta_n = \max(Y_1, \cdots, Y_n)$. We want to show $\hat\theta_n$ is consistent.

We have the same setup as the uniform distribution example, but this time we want to apply the theorem. Recall that

$$ MSE(\hat\theta_n) = \frac{2\theta^2}{(n+1)(n+2)} \rightarrow 0 $$

as $n \rightarrow \infty$, so $\hat\theta_n$ is consistent by the theorem.
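The MSE formula can be compared against simulation. This is only a sketch: the function names, the seed, the replication count, and $\theta = 1$ are assumptions for illustration.

```python
import random

def mse_formula(theta, n):
    """MSE of max(Y_1..Y_n) as an estimator of theta for Unif(0, theta)."""
    return 2 * theta**2 / ((n + 1) * (n + 2))

def mse_simulated(theta, n, reps=20000, seed=0):
    """Average squared error of the sample maximum over many simulated samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        estimate = max(rng.uniform(0, theta) for _ in range(n))
        total += (estimate - theta) ** 2
    return total / reps

for n in (5, 20, 50):
    print(n, mse_formula(1.0, n), mse_simulated(1.0, n))
```

The two columns should agree up to simulation noise, and both decay roughly like $1/n^2$.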

Law of large numbers

The LLN states that if $Y_i$ are i.i.d. with $E[Y_i] = \mu$ and $Var(Y_i) = \sigma^2 < \infty$, then $\bar{Y}_n \xrightarrow{P} \mu$.

Proof: Since $E[\bar{Y}_n] = \mu$, we have $MSE(\bar{Y}_n) = Var(\bar{Y}_n) = \frac{\sigma^2}{n} \rightarrow 0$ as $n \rightarrow \infty$, and the result follows from the MSE theorem.

The assumption $Var(Y_i) < \infty$ can be relaxed by assuming only $E[|Y_i|] < \infty$.
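The LLN does not require normality. As an illustrative sketch, the code below averages exponential draws with mean $\mu = 2$. The distribution, the seed, and the function name `sample_mean` are assumptions made for the example.

```python
import random

def sample_mean(n, seed=0):
    """Mean of n i.i.d. exponential draws with expected value 2."""
    rng = random.Random(seed)
    # expovariate(lambd) has mean 1 / lambd, so lambd = 0.5 gives mu = 2.
    return sum(rng.expovariate(0.5) for _ in range(n)) / n

for n in (10, 100, 10000, 100000):
    print(n, sample_mean(n))
```

For small $n$ the sample mean can be noticeably off, while for large $n$ it settles near $2$.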

Properties of convergence in probability

Suppose random variables $X_n \xrightarrow{P} x$ and $Y_n \xrightarrow{P} y$.

  • $X_n + Y_n \xrightarrow{P} x + y$.
  • $X_n Y_n \xrightarrow{P} xy$.
  • $X_n / Y_n \xrightarrow{P} x / y$ if $Y_n \neq 0$ and $y \neq 0$.
  • If $g$ is a continuous function, $g(X_n) \xrightarrow{P} g(x)$.

The proofs can be found in more advanced courses. As an example of applying these properties, suppose $X_n \xrightarrow{P} x$, $Y_n \xrightarrow{P} y$ and $Z_n \xrightarrow{P} z$. We then have

$$ (X_n + Y_n)e^{Z_n} \xrightarrow{P} (x+y)e^z. $$

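Think of this as applying the product rule to $X_n' = X_n + Y_n$ and $Y_n' = e^{Z_n}$, which converge by the sum rule and the continuity of $e^z$, respectively.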
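A small simulation can illustrate the combined rule. In this sketch, $X_n$, $Y_n$ and $Z_n$ are sample means of three independent samples, so by the LLN they converge to $x = 0.5$, $y = 1$ and $z = 0$. The distributions, the seed, and the function name `combined` are all assumptions chosen for illustration.

```python
import math
import random

def combined(n, seed=0):
    """Compute (X_n + Y_n) * exp(Z_n) where each term is a sample mean."""
    rng = random.Random(seed)
    x_n = sum(rng.uniform(0, 1) for _ in range(n)) / n     # converges to 0.5
    y_n = sum(rng.expovariate(1.0) for _ in range(n)) / n  # converges to 1
    z_n = sum(rng.gauss(0, 1) for _ in range(n)) / n       # converges to 0
    return (x_n + y_n) * math.exp(z_n)

limit = (0.5 + 1.0) * math.exp(0.0)
for n in (10, 1000, 100000):
    print(n, combined(n), limit)
```

As $n$ grows, the computed value should approach $(x + y)e^z = 1.5$.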


Suppose $Y_i$ are i.i.d. with $E[Y_i] = \mu$, $Var(Y_i) = \sigma^2 < \infty$ (and $E[Y_i^4] < \infty$). Recall that

$$ \hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^n(Y_i - \bar{Y}_n)^2 $$

is unbiased for $\sigma^2$.

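The goal is to show that $\hat\sigma_n^2$ is consistent for $\sigma^2$. There are several approaches. The straightforward method uses unbiasedness, which gives $MSE(\hat\sigma_n^2) = Var(\hat\sigma_n^2)$, and then shows that this variance goes to $0$. Computing $Var(\hat\sigma_n^2)$ is nasty!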

The second (easier) approach is to write

$$ \begin{aligned} \hat\sigma_n^2 &= \frac{1}{n-1}\left( \sum_{i=1}^n Y_i^2 - n\bar{Y}_n^2 \right) \\ &= \frac{n}{n-1}\left( \frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar{Y}_n^2 \right). \end{aligned} $$

As $n \rightarrow \infty$, $\frac{n}{n-1} \rightarrow 1$, $\bar{Y}_n^2 \xrightarrow{P} \mu^2$ (LLN together with the continuity of $g(x) = x^2$), and if we think of $Y_i^2$ as $X_i$,

$$ \frac{1}{n}\sum_{i=1}^n Y_i^2 \xrightarrow{LLN} E[Y_i^2] = \mu^2 + \sigma^2 $$

So, by the properties of convergence in probability, we have $$ \hat\sigma_n^2 \xrightarrow{P} 1 \times (\mu^2 + \sigma^2 - \mu^2) = \sigma^2 $$
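The decomposition can be tried out in code. The sketch below computes the sample variance through $\frac{n}{n-1}\left( \frac{1}{n}\sum Y_i^2 - \bar{Y}_n^2 \right)$ and applies it to simulated normal data with $\mu = 1$ and $\sigma^2 = 4$. The function name, the seed, and the choice of distribution are assumptions for illustration. This form mirrors the algebra above but is not the most numerically stable way to compute a variance.

```python
import random
import statistics

def sample_variance_decomposed(ys):
    """Sample variance via n/(n-1) * (mean of squares - squared mean)."""
    n = len(ys)
    mean_of_squares = sum(y * y for y in ys) / n
    mean = sum(ys) / n
    return n / (n - 1) * (mean_of_squares - mean**2)

rng = random.Random(0)
for n in (10, 1000, 100000):
    ys = [rng.gauss(1.0, 2.0) for _ in range(n)]  # mu = 1, sigma = 2
    print(n, sample_variance_decomposed(ys))

# The decomposition agrees with the usual (n - 1)-denominator definition.
small = [1.0, 2.0, 4.0, 7.0]
print(sample_variance_decomposed(small), statistics.variance(small))
```

The estimates should drift toward $\sigma^2 = 4$ as $n$ increases.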

A note of caution: consistency only tells us that the estimator converges eventually. It doesn’t tell us how fast that happens. If the convergence is really slow, the estimator still might not be a good one.
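To get a feel for speed, the two MSE formulas that appear earlier in this post can be tabulated side by side: $\frac{\sigma^2}{n}$ for $\bar{Y}_n$ and $\frac{2\theta^2}{(n+1)(n+2)}$ for the uniform maximum. These are estimators for different problems, so only the rates are comparable. The function names and the values $\sigma = \theta = 1$ are assumptions for illustration.

```python
def mse_mean(sigma, n):
    """MSE of the sample mean: sigma^2 / n."""
    return sigma**2 / n

def mse_max(theta, n):
    """MSE of the sample maximum for Unif(0, theta)."""
    return 2 * theta**2 / ((n + 1) * (n + 2))

for n in (10, 100, 1000):
    print(n, mse_mean(1.0, n), mse_max(1.0, n))
```

Multiplying $n$ by ten cuts the first MSE by about a factor of ten and the second by about a factor of one hundred. Both estimators are consistent, but they converge at different rates.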
