Optimal Tests
So far we’ve covered several aspects of hypothesis testing. We’ve seen some examples and discussed how to construct tests from CIs, along with concepts like p-values. Now we move on to the theoretical side.
Is there a “best” test among all the possible tests that we can have? To understand the notion of optimality, recall that we constrain the type I error to be $\leq$ the level $\alpha$. Subject to that constraint, our goal is to minimize the type II error, or equivalently to maximize the power. Finding an optimal test is therefore a constrained optimization problem.
Motivating example
Let’s consider the following example. Suppose we have a single sample $Y \sim Unif(0, \theta)$. The hypotheses are
$$ H_0: \theta = 1 \quad vs. \quad H_a: \theta > 1 $$
Compare the rejection rules
 $Y > 1 - \alpha$, and
 $Y < \alpha$.
Intuitively, the first one is a better choice since it’s trying to reflect the hypotheses and reject $H_0$ when the sample is large. However, both rejection rules have level $\alpha$ since
$$ \begin{gather*} P_1(Y > 1 - \alpha) = 1 - (1 - \alpha) = \alpha \\ P_1(Y < \alpha) = \alpha \end{gather*} $$
using the CDF of the uniform distribution. If we compare the power, since $\frac{Y}{\theta} \sim Unif(0, 1)$, for $\theta > 1$,
$$ \begin{gather*} pw_1(\theta) = P_\theta(Y > 1 - \alpha) = P_\theta \left(\frac{Y}{\theta} > \frac{1 - \alpha}{\theta} \right) = 1 - \frac{1 - \alpha}{\theta} \\ pw_2(\theta) = P_\theta(Y < \alpha) = P_\theta \left(\frac{Y}{\theta} < \frac{\alpha}{\theta} \right) = \frac{\alpha}{\theta} \end{gather*} $$
So for all $\theta > 1$,
$$ pw_1(\theta) - pw_2(\theta) = 1 - \frac{1 - \alpha}{\theta} - \frac{\alpha}{\theta} = 1 - \frac{1}{\theta} > 0 $$
Hence the first rejection rule is preferred. If we think about the problem in general, the challenges are
 The power $pw(\theta)$ is a function of $\theta \in \Theta_a$, so different $\theta$ values are involved at once.
 Optimization is with respect to all possible decision rules.
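The power comparison in the motivating example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`power_rule1`, `power_rule2`, `simulated_power`), the level $\alpha = 0.05$ and the $\theta$ values are illustrative choices, not part of the original notes.

```python
import random


def power_rule1(theta, alpha):
    """Exact power of 'reject if Y > 1 - alpha' for Y ~ Unif(0, theta), theta >= 1."""
    return 1 - (1 - alpha) / theta


def power_rule2(theta, alpha):
    """Exact power of 'reject if Y < alpha' for Y ~ Unif(0, theta), theta >= 1."""
    return alpha / theta


def simulated_power(reject, theta, n_sim=100_000, seed=0):
    """Monte Carlo estimate of P_theta(reject H0) from n_sim single observations."""
    rng = random.Random(seed)
    rejections = sum(reject(rng.uniform(0, theta)) for _ in range(n_sim))
    return rejections / n_sim


alpha = 0.05  # illustrative level
for theta in (1.0, 1.5, 2.0, 4.0):
    exact1 = power_rule1(theta, alpha)
    exact2 = power_rule2(theta, alpha)
    sim1 = simulated_power(lambda y: y > 1 - alpha, theta)
    sim2 = simulated_power(lambda y: y < alpha, theta)
    print(
        f"theta={theta}: rule 1 exact={exact1:.3f} sim={sim1:.3f} | "
        f"rule 2 exact={exact2:.3f} sim={sim2:.3f}"
    )
```

At $\theta = 1$ both rules reject with probability $\alpha$; as $\theta$ grows, the power of the first rule increases while that of the second shrinks, in line with the formulas above.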
Neyman-Pearson lemma
If we eliminate the first challenge above and assume that both hypotheses are simple, then the following theorem can be applied to this simplest scenario.
Assumptions: We have hypotheses $H_0$: $\theta = \theta_0$ vs. $H_a: \theta = \theta_a$. The significance level is $\alpha$, and $L(\theta)$ is the likelihood function where $\theta \in \{\theta_0, \theta_a\}$.
Conclusion: The test (decision rule) which maximizes $pw(\theta_a)$ is given by
$$ T = \frac{L(\theta_0)}{L(\theta_a)}, \quad RR = \{t: t < k_\alpha\} $$
where $k_\alpha$ is chosen to make^{1} $P_{\theta_0} (T < k_\alpha) = \alpha$. In other words, we reject $H_0$ when the likelihood at $\theta_0$ is relatively small compared with the likelihood at $\theta_a$.
Proof
Here we only consider the continuous case. Suppose the sample is $Y = (Y_1, \cdots, Y_n) \in \mathbb{R}^n$. Any rejection rule can be formulated using a region $D \subset \mathbb{R}^n$ where we reject $H_0$ if $(Y_1, \cdots, Y_n) \in D$. For a general test statistic $T = t(Y_1, \cdots, Y_n)$, we have
$$ D = \{(y_1, \cdots, y_n): t(y_1, \cdots, y_n) \in RR\} $$
Suppose $D_1, D_2 \subset \mathbb{R}^n$ and
 Test 1 is $D_1$, the test given in the theorem.
 Test 2 is $D_2$, an arbitrary test with level $\leq \alpha$.
Our goal is to show that the powers satisfy $pw_1(\theta_a) \geq pw_2(\theta_a)$. Let
$$ \begin{gathered} S_1 = \{y \in \mathbb{R}^n: 1_{D_1}(y) > 1_{D_2}(y)\} \\ S_2 = \{y \in \mathbb{R}^n: 1_{D_1}(y) < 1_{D_2}(y)\} \\ \end{gathered} $$
Namely, in $S_1$ test 1 rejects $H_0$ but test 2 doesn’t; in $S_2$ test 1 doesn’t reject $H_0$ but test 2 rejects $H_0$.
Let $L_0 = L(\theta_0)$ and $L_a = L(\theta_a)$. Note that
$$ \begin{aligned} &\quad \int_{\mathbb{R}^n} \left[ 1_{D_1}(y) - 1_{D_2}(y) \right] \left[ k_\alpha L_a(y) - L_0(y) \right] dy \\ &= \int_{S_1} (1_{D_1} - 1_{D_2})(k_\alpha L_a - L_0)dy + \int_{S_2} (1_{D_1} - 1_{D_2})(k_\alpha L_a - L_0)dy \end{aligned} $$
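The equality holds because outside $S_1 \cup S_2$ the two indicators agree, so the integrand vanishes there. By the rule of test 1, in $S_1$ we have $1_{D_1} - 1_{D_2} = 1$ and $k_\alpha L_a > L_0$, which means the first term above is $\geq 0$. Similarly, in $S_2$ we have $1_{D_1} - 1_{D_2} = -1$ and $k_\alpha L_a \leq L_0$, so the second term is also $\geq 0$. By rearranging terms, we have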
$$ \begin{equation} \label{eq:neymanpearson} k_\alpha \int_{\mathbb{R}^n} (1_{D_1} - 1_{D_2})L_a dy \geq \int_{\mathbb{R}^n} (1_{D_1} - 1_{D_2})L_0 dy \end{equation} $$
Note that
$$ \int_{\mathbb{R}^n} 1_{D_i} L_0(y) dy = \int_{D_i}L_0(y)dy = P_{\theta_0}(Y \in D_i) $$
is the type I error for test $i$ where $i = 1, 2$. A similar relation holds if $L_0$ is replaced by $L_a$ which leads to powers. So Equation $\eqref{eq:neymanpearson}$ translates to
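$$ k_\alpha \left(pw_1(\theta_a) - pw_2(\theta_a) \right) \geq P_{\theta_0}(Y \in D_1) - P_{\theta_0}(Y \in D_2) \geq \alpha - \alpha = 0 $$
since test 1 has type I error exactly $\alpha$ and test 2 has type I error at most $\alpha$. Because $k_\alpha > 0$ (as $T \geq 0$ and $\alpha > 0$), it follows that $pw_1(\theta_a) \geq pw_2(\theta_a)$.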
Example
Consider $Y_1, \cdots, Y_n \overset{i.i.d.}{\sim} N(\theta, 1)$ and we want to test
$$ H_0: \theta = \theta_0 \quad vs. \quad H_a: \theta = \theta_a, \quad \theta_0 < \theta_a $$
The likelihood is given by
$$ L(\theta) = \frac{1}{(2\pi)^\frac{n}{2}} \exp \left \{ -\frac{\sum_{i=1}^n (Y_i - \theta)^2}{2} \right\} $$
Therefore,
$$ \begin{aligned} T &= \frac{L(\theta_0)}{L(\theta_a)} = \exp \left\{ -\frac{\sum_{i=1}^n (Y_i - \theta_0)^2 - \sum_{i=1}^n (Y_i - \theta_a)^2}{2} \right\} \\ &= \exp \left\{ (\theta_0 - \theta_a)\sum_{i=1}^n Y_i - \frac{n}{2}\left(\theta_0^2 - \theta_a^2 \right) \right\} \end{aligned} $$
and we reject $H_0$ if $T < k_\alpha$ for some suitable $k_\alpha$. We may observe that $T$ is a decreasing function of $\bar{Y}_n$ since we assumed $\theta_0 < \theta_a$. Hence, we reject $H_0$ if $\bar{Y}_n > c_\alpha$ for some suitable $c_\alpha$.
To determine $c_\alpha$, set
$$ \begin{aligned} P_{\theta_0}(\bar{Y}_n > c_\alpha) &= P_{\theta_0} \left( \frac{1}{\sqrt{n}}\sum_{i=1}^n (Y_i - \theta_0) > \sqrt{n}(c_\alpha - \theta_0) \right) \\ &= 1 - \Phi\left( \sqrt{n}(c_\alpha - \theta_0) \right) = \alpha \end{aligned} $$
which leads to
$$ \sqrt{n}(c_\alpha - \theta_0) = z_\alpha \Rightarrow c_\alpha = \frac{z_\alpha}{\sqrt{n}} + \theta_0 $$
since $\bar{Y}_n$ follows $N\left(\theta, \frac{1}{n} \right)$.
The rejection rule can be equivalently formulated using the log-likelihood, where we reject $H_0$ if
$$ \ln L(\theta_0) - \ln L(\theta_a) < u_\alpha = \ln k_\alpha $$
which often simplifies calculations.
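As a rough numerical check of this example, the sketch below computes $c_\alpha$, the exact power, and a simulated rejection rate, and counts how often the sample-mean rule and the log-likelihood-ratio rule disagree. It relies only on the Python standard library (`statistics.NormalDist`, `random`); the helper names and the values $\theta_0 = 0$, $\theta_a = 0.5$, $n = 25$, $\alpha = 0.05$ are assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def critical_value(theta0, n, alpha):
    """c_alpha = z_alpha / sqrt(n) + theta0, with z_alpha the upper-alpha quantile."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return theta0 + z_alpha / n ** 0.5


def power(theta, theta0, n, alpha):
    """P_theta(Ybar_n > c_alpha), using Ybar_n ~ N(theta, 1/n)."""
    c = critical_value(theta0, n, alpha)
    return 1 - STD_NORMAL.cdf(n ** 0.5 * (c - theta))


def log_lr(ys, theta0, theta_a):
    """ln L(theta0) - ln L(theta_a) for i.i.d. N(theta, 1) observations."""
    n = len(ys)
    return (theta0 - theta_a) * sum(ys) - n / 2 * (theta0 ** 2 - theta_a ** 2)


def simulate(theta, theta0, theta_a, n, alpha, n_sim=20_000, seed=0):
    """Return (rejection rate of the mean rule, number of disagreements with the LR rule)."""
    rng = random.Random(seed)
    c = critical_value(theta0, n, alpha)
    # u_alpha is ln T evaluated at Ybar_n = c_alpha, i.e. sum(Y_i) = n * c_alpha
    u = (theta0 - theta_a) * n * c - n / 2 * (theta0 ** 2 - theta_a ** 2)
    rejections = disagreements = 0
    for _ in range(n_sim):
        ys = [rng.gauss(theta, 1) for _ in range(n)]
        by_mean = sum(ys) / n > c
        by_lr = log_lr(ys, theta0, theta_a) < u
        rejections += by_mean
        disagreements += by_mean != by_lr
    return rejections / n_sim, disagreements


theta0, theta_a, n, alpha = 0.0, 0.5, 25, 0.05  # illustrative values
print("c_alpha:", critical_value(theta0, n, alpha))
print("exact power at theta_a:", power(theta_a, theta0, n, alpha))
print("simulated (rate, disagreements) under H0:", simulate(theta0, theta0, theta_a, n, alpha))
print("simulated (rate, disagreements) under Ha:", simulate(theta_a, theta0, theta_a, n, alpha))
```

Under $H_0$ the simulated rejection rate should sit near $\alpha$, and the two formulations of the rule should agree on essentially every sample, up to floating-point ties at the threshold.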
Composite alternative
The theorem tells us about the most powerful test in the situation of simple $H_0$ vs. simple alternative $H_a$. In general, we will deal with composite $H_0$ and $H_a$. How do we formulate optimality in this more general setting?
Def: Suppose we are given a sample and want to test
$$ H_0: \theta \in \Theta_0 \quad vs. \quad H_a: \theta \in \Theta_a $$
We say a test is uniformly most powerful (UMP) if, for every $\theta \in \Theta_a$, its power $pw(\theta)$ is the largest among all possible choices of decision rules^{2}.
Corollary: In the case of a simple $H_0: \theta = \theta_0$ and a composite $H_a: \theta \in \Theta_a$, if the decision rule specified in the Neyman-Pearson lemma
$$ \text{reject } H_0 \text{ when } \frac{L(\theta_0)}{L(\theta)} < k_\alpha, \quad \theta \in \Theta_a $$
does not depend on $\theta$, then it yields a UMP test^{3}.
To understand this, consider the example above where we replace the alternative with $H_a: \theta > \theta_0$, and still use the likelihood ratio statistic. The derivation shows that no matter which $\theta_a > \theta_0$ we focus on, we get the same rejection rule:
$$ \bar{Y}_n > c_\alpha = \frac{z_\alpha}{\sqrt{n}} + \theta_0 $$
which doesn’t have $\theta_a$ involved, hence the test is UMP. This is actually how the Neyman-Pearson lemma typically gets applied. Unfortunately, the applicability of this corollary is very limited. Whenever the alternative is two-sided, a UMP test often does not exist.
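For the one-sided normal example, the claim that the rejection rule does not depend on $\theta_a$ can be probed by simulation. The sketch below calibrates the threshold on $\ln T$ empirically under $H_0$ for several alternatives and converts it back to a threshold on $\bar{Y}_n$. The function name `implied_mean_threshold` and the values $\theta_0 = 0$, $n = 10$, $\alpha = 0.05$ are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist


def implied_mean_threshold(theta0, theta_a, n, alpha, n_sim=20_000, seed=0):
    """Calibrate the cutoff on ln T by Monte Carlo under H0, then solve for Ybar_n.

    ln T = (theta0 - theta_a) * sum(Y_i) - n/2 * (theta0^2 - theta_a^2).
    """
    rng = random.Random(seed)
    offset = n / 2 * (theta0 ** 2 - theta_a ** 2)
    log_ts = []
    for _ in range(n_sim):
        total = sum(rng.gauss(theta0, 1) for _ in range(n))
        log_ts.append((theta0 - theta_a) * total - offset)
    log_ts.sort()
    u = log_ts[int(alpha * n_sim)]  # empirical alpha-quantile of ln T under H0
    # Rejecting when ln T < u is the same as Ybar_n > this value (theta_a > theta0)
    return (u + offset) / (n * (theta0 - theta_a))


theta0, n, alpha = 0.0, 10, 0.05  # illustrative values
theory = theta0 + NormalDist().inv_cdf(1 - alpha) / n ** 0.5
print("theoretical c_alpha:", theory)
for theta_a in (0.5, 1.0, 3.0):
    print("theta_a =", theta_a, "-> implied threshold:", implied_mean_threshold(theta0, theta_a, n, alpha))
```

Because the same seed is reused, every $\theta_a$ sees the same simulated samples, and the implied thresholds on $\bar{Y}_n$ coincide up to rounding while staying close to the theoretical $c_\alpha$.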
Now suppose the alternative is $H_a: \theta \neq \theta_0$. We know the most powerful (MP) test for any $\theta > \theta_0$ is to reject $H_0$ when
$$ \bar{Y}_n > \frac{z_\alpha}{\sqrt{n}} + \theta_0 $$
Similarly, we can show that the MP test for any $\theta < \theta_0$ is to reject $H_0$ when
$$ \bar{Y}_n < -\frac{z_\alpha}{\sqrt{n}} + \theta_0 $$
The two rejection regions are disjoint, so no single test can be most powerful on both sides of $\theta_0$. Hence there’s no UMP test for the two-sided alternative.
To have a well-defined optimality problem for two-sided tests, one often introduces an additional constraint called unbiasedness^{4}, which says that $pw(\theta) \geq \alpha$ for all $\theta \in \Theta_a$, and then one can often obtain the UMP unbiased test. This topic is left for more advanced courses.
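To see the two-sided difficulty in numbers, the sketch below evaluates the exact power of the two one-sided rules from the normal example on both sides of $\theta_0$. An equal-tailed two-sided z-test, which is not derived in these notes, is included purely for comparison. The function names and the values $\theta_0 = 0$, $n = 25$, $\alpha = 0.05$ are assumptions for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_right(theta, theta0, n, alpha):
    """Power of 'reject if Ybar_n > theta0 + z_alpha / sqrt(n)'."""
    z = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z - n ** 0.5 * (theta - theta0))


def power_left(theta, theta0, n, alpha):
    """Power of 'reject if Ybar_n < theta0 - z_alpha / sqrt(n)'."""
    z = Z.inv_cdf(1 - alpha)
    return Z.cdf(-z - n ** 0.5 * (theta - theta0))


def power_two_sided(theta, theta0, n, alpha):
    """Power of the equal-tailed rule 'reject if sqrt(n) * |Ybar_n - theta0| > z_{alpha/2}'.

    This rule is an assumption added for comparison, not taken from the text.
    """
    z = Z.inv_cdf(1 - alpha / 2)
    shift = n ** 0.5 * (theta - theta0)
    return 1 - Z.cdf(z - shift) + Z.cdf(-z - shift)


theta0, n, alpha = 0.0, 25, 0.05  # illustrative values
for theta in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(
        f"theta={theta:+.1f}: right={power_right(theta, theta0, n, alpha):.4f} "
        f"left={power_left(theta, theta0, n, alpha):.4f} "
        f"two-sided={power_two_sided(theta, theta0, n, alpha):.4f}"
    )
```

Each one-sided rule has the highest power on its own side of $\theta_0$, but its power falls below $\alpha$ on the other side, so it violates the unbiasedness condition there. The two-sided rule stays at or above $\alpha$ everywhere in this setup, at the cost of being beaten on each side by the matching one-sided rule.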

^{1} Here, for simplicity, we assume any choice of $\alpha$ can be exactly achieved, which is the case if the distributions are continuous. The theorem also holds for the discrete case with some technical modifications.

^{2} There is a hidden constraint here: the test must have level $\alpha$, i.e. its type I error cannot exceed $\alpha$.

^{3} The UMP test is not always unique. For details, see Chapter 3 of Lehmann, E. L., & Romano, J. P. (2005). Testing Statistical Hypotheses.

^{4} This is different from the meaning of unbiasedness for estimators.