A two-sided test is a test of a hypothesis with a two-sided alternative. Two-sided tests include simple symmetric tests and more complicated asymmetric tests that might have distinct lower and upper alternative references.
For a symmetric two-sided test with the null hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta \neq \theta_0$, an equivalent null hypothesis is $H_0: \theta' = 0$ with a two-sided alternative $H_1: \theta' \neq 0$, where $\theta' = \theta - \theta_0$. A fixed-sample test rejects $H_0$ if $|\hat{\theta}'| \sqrt{I} \geq C_\alpha$, where $\hat{\theta}'$ is a sample estimate of $\theta'$ with information level $I$ and $C_\alpha = \Phi^{-1}(1 - \alpha/2)$ is the critical value, with $\Phi$ denoting the standard normal cumulative distribution function.
A common two-sided test is the test for the response difference between a treatment group and a control group. The null and alternative hypotheses are $H_0: \theta = 0$ and $H_1: \theta \neq 0$, respectively, where $\theta$ is the response difference between the two groups. If a greater value indicates a beneficial effect, then there are three possible results:
The test rejects the null hypothesis $H_0$ of equality and indicates that the treatment is significantly better if the standardized statistic $Z \geq C_\alpha = \Phi^{-1}(1 - \alpha/2)$.
The test rejects the null hypothesis $H_0$ and indicates that the treatment is significantly worse if the standardized statistic $Z \leq -C_\alpha = \Phi^{-1}(\alpha/2)$.
The test indicates no significant difference between the two responses if $-C_\alpha < Z < C_\alpha$.
The p-value of the test is $2\,(1 - \Phi(z))$ if $z > 0$ and $2\,\Phi(z)$ if $z \leq 0$, where $z$ is the observed value of the standardized statistic $Z$. The null hypothesis $H_0$ is rejected if the p-value of the test is less than $\alpha$; that is, if $z \geq \Phi^{-1}(1 - \alpha/2)$ or $z \leq \Phi^{-1}(\alpha/2)$. A symmetric $100(1-\alpha)\%$ confidence interval for $\theta$ has lower and upper limits

\[ \hat{\theta} \mp C_\alpha \sqrt{1/I} \]

which is

\[ \hat{\theta} \mp \Phi^{-1}(1 - \alpha/2)\, \sqrt{1/I} \]

The null hypothesis $H_0$ is rejected if the confidence interval for the parameter $\theta$ does not contain zero; that is, the lower limit is greater than zero or the upper limit is less than zero.
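As a numerical illustration of the decision rule, p-value, and confidence interval above, the following Python sketch (not part of the SEQDESIGN documentation) evaluates the formulas with scipy; the estimate, information level, and $\alpha$ below are hypothetical values chosen only for illustration.

```python
# Minimal sketch of the fixed-sample symmetric two-sided test described above.
# The numbers (thetahat, info, alpha) are illustrative assumptions, not SAS defaults.
from scipy.stats import norm

alpha    = 0.05          # two-sided Type I error probability
thetahat = 0.28          # sample estimate of the response difference theta
info     = 50.0          # information level I (for a known-variance mean, I = n / sigma**2)

z       = thetahat * info ** 0.5           # standardized statistic Z = thetahat * sqrt(I)
c_alpha = norm.ppf(1 - alpha / 2)          # critical value C_alpha = Phi^{-1}(1 - alpha/2)

# Three possible results of the two-sided test
if z >= c_alpha:
    decision = "reject H0: treatment significantly better"
elif z <= -c_alpha:
    decision = "reject H0: treatment significantly worse"
else:
    decision = "no significant difference"

# Two-sided p-value: 2*(1 - Phi(z)) if z > 0, 2*Phi(z) otherwise
p_value = 2 * (1 - norm.cdf(z)) if z > 0 else 2 * norm.cdf(z)

# Symmetric confidence interval: thetahat -/+ C_alpha * sqrt(1/I)
half_width = c_alpha * (1 / info) ** 0.5
ci = (thetahat - half_width, thetahat + half_width)

print(decision, p_value, ci)
```

With these hypothetical inputs, $z \approx 1.98 > 1.96$, the p-value is about 0.048, and the confidence interval excludes zero, so the three criteria agree, as they must.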
With an alternative reference $\theta_1$, a Type II error probability $\beta$ is defined as

\[ \beta = P_{\theta_1}\left( -C_\alpha < Z < C_\alpha \right) \]

which is

\[ \beta = \Phi\left( C_\alpha - \theta_1 \sqrt{I} \right) - \Phi\left( -C_\alpha - \theta_1 \sqrt{I} \right) \]

Thus

\[ 1 - \beta = \left[ 1 - \Phi\left( C_\alpha - \theta_1 \sqrt{I} \right) \right] + \Phi\left( -C_\alpha - \theta_1 \sqrt{I} \right) \]
The resulting power $1 - \beta$ is the probability of correctly rejecting the null hypothesis, which includes the probability $\Phi(-C_\alpha - \theta_1 \sqrt{I})$ for the lower alternative and the probability $1 - \Phi(C_\alpha - \theta_1 \sqrt{I})$ for the upper alternative. The SEQDESIGN procedure uses only the probability of rejecting the null hypothesis in the direction of the correct alternative in the power computation.
Thus, under the upper alternative hypothesis $\theta_1 > 0$, the power in the SEQDESIGN procedure is computed as the probability of rejecting the null hypothesis for the upper alternative, $P_{\theta_1}(Z \geq C_\alpha)$; the very small probability of rejecting the null hypothesis for the lower alternative, $P_{\theta_1}(Z \leq -C_\alpha)$, is ignored. This power computation is more appropriate than a power based on the total probability of rejecting the null hypothesis in either direction (Whitehead, 1997, p. 75).
That is,

\[ 1 - \beta = P_{\theta_1}\left( Z \geq C_\alpha \right) = 1 - \Phi\left( C_\alpha - \theta_1 \sqrt{I} \right) \]

Then with $C_\alpha = \Phi^{-1}(1 - \alpha/2)$,

\[ \theta_1 \sqrt{I} = \Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta) \]
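The following sketch (hypothetical $\alpha$, $\beta$, and alternative reference) evaluates the two rejection probabilities under the upper alternative at the information level that satisfies the relation above, showing that the lower-tail term ignored by the SEQDESIGN procedure is indeed negligible.

```python
# Sketch: under the upper alternative theta1 > 0, the probability of rejecting H0
# in the wrong (lower) direction is negligible, which is why it is ignored.
from scipy.stats import norm

alpha, beta = 0.05, 0.10
theta1      = 0.25                       # upper alternative reference (illustrative)
c_alpha     = norm.ppf(1 - alpha / 2)

# Information level satisfying theta1*sqrt(I) = Phi^{-1}(1-alpha/2) + Phi^{-1}(1-beta)
drift = c_alpha + norm.ppf(1 - beta)
info  = (drift / theta1) ** 2

upper_tail = 1 - norm.cdf(c_alpha - theta1 * info ** 0.5)   # P(Z >= C_alpha), equals 0.90
lower_tail = norm.cdf(-c_alpha - theta1 * info ** 0.5)      # P(Z <= -C_alpha), about 1e-7

print(upper_tail, lower_tail)   # power used in the computation vs. the ignored term
```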
The drift parameter $\theta_1 \sqrt{I_X}$ can be derived for specified $\alpha$ and $\beta$, and the maximum information is given by

\[ I_X = \frac{\left( \Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta) \right)^2}{\theta_1^{\,2}} \]

If the maximum information is available, then the required sample size can be derived. For example, in a one-sample test for the mean with information $I = n / \sigma^2$, if the standard deviation $\sigma$ is known, the sample size n required for the test is

\[ n = \sigma^2 I_X = \frac{\sigma^2 \left( \Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta) \right)^2}{\theta_1^{\,2}} \]
On the other hand, if the alternative reference $\theta_1$, standard deviation $\sigma$, and sample size n are all known, then $\alpha$ can be derived with a given $\beta$ and, similarly, $\beta$ can be derived with a given $\alpha$.
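Continuing the hypothetical values from the previous sketch, the next sketch computes the maximum information and sample size for a one-sample test for the mean, and then works in the reverse direction, deriving $\beta$ when $\theta_1$, $\sigma$, n, and $\alpha$ are fixed.

```python
# Sketch: maximum information and sample size for the symmetric two-sided test
# of a one-sample mean (I = n / sigma**2).  All numeric inputs are illustrative.
from math import ceil
from scipy.stats import norm

alpha, beta = 0.05, 0.10
theta1      = 0.25         # alternative reference
sigma       = 1.0          # known standard deviation

drift  = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
info_x = (drift / theta1) ** 2             # maximum information I_X
n      = ceil(sigma ** 2 * info_x)         # required sample size n = sigma^2 * I_X

# Reverse direction: with theta1, sigma, and n fixed, derive beta for a given alpha
info_n     = n / sigma ** 2
beta_given = norm.cdf(norm.ppf(1 - alpha / 2) - theta1 * info_n ** 0.5)

print(info_x, n, 1 - beta_given)   # about 168.12, 169, power slightly above 0.90
```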
For a generalized two-sided test with the null hypothesis $H_0: \theta = \theta_0$ against the alternative $H_1: \theta \neq \theta_0$, an equivalent null hypothesis is $H_0: \theta' = 0$ with a two-sided alternative $H_1: \theta' \neq 0$, where $\theta' = \theta - \theta_0$. A fixed-sample test rejects $H_0$ if the standardized statistic $Z \leq -C_{\alpha_l}$ or $Z \geq C_{\alpha_u}$, where the critical values $C_{\alpha_l} = \Phi^{-1}(1 - \alpha_l)$ and $C_{\alpha_u} = \Phi^{-1}(1 - \alpha_u)$, and $\alpha_l$ and $\alpha_u$ are the lower and upper Type I error probabilities.
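For example, the asymmetric critical values can be computed directly from the lower and upper Type I error probabilities; the values of $\alpha_l$ and $\alpha_u$ in the following sketch are hypothetical.

```python
# Sketch: critical values for a generalized (asymmetric) two-sided test with
# separate lower and upper Type I error probabilities (illustrative values).
from scipy.stats import norm

alpha_l, alpha_u = 0.01, 0.04
c_l = norm.ppf(1 - alpha_l)     # C_{alpha_l} = Phi^{-1}(1 - alpha_l), about 2.326
c_u = norm.ppf(1 - alpha_u)     # C_{alpha_u} = Phi^{-1}(1 - alpha_u), about 1.751

# Reject H0 if Z <= -C_{alpha_l} or Z >= C_{alpha_u}
print(-c_l, c_u)
```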
With the lower alternative reference $\theta_{1l} < 0$, a lower Type II error probability $\beta_l$ is defined as

\[ \beta_l = P_{\theta_{1l}}\left( -C_{\alpha_l} < Z < C_{\alpha_u} \right) \]

This implies

\[ 1 - \beta_l = \left[ 1 - \Phi\left( C_{\alpha_u} - \theta_{1l} \sqrt{I} \right) \right] + \Phi\left( -C_{\alpha_l} - \theta_{1l} \sqrt{I} \right) \]

and, ignoring the negligible first term (the probability of rejecting $H_0$ for the upper alternative), the power is the probability of correctly rejecting the null hypothesis for the lower alternative,

\[ 1 - \beta_l = P_{\theta_{1l}}\left( Z \leq -C_{\alpha_l} \right) = \Phi\left( -C_{\alpha_l} - \theta_{1l} \sqrt{I} \right) \]
The lower drift parameter $\theta_{1l} \sqrt{I_X}$ is derived as

\[ \theta_{1l} \sqrt{I_X} = -\left( \Phi^{-1}(1 - \alpha_l) + \Phi^{-1}(1 - \beta_l) \right) \]

Then, with specified $\alpha_l$ and $\beta_l$, if the maximum information is known, the lower alternative reference $\theta_{1l}$ can be derived. If the maximum information is unknown, then with the specified lower alternative reference $\theta_{1l}$, the maximum information required for the lower alternative is

\[ I_{Xl} = \left( \frac{\Phi^{-1}(1 - \alpha_l) + \Phi^{-1}(1 - \beta_l)}{\theta_{1l}} \right)^{2} \]
Similarly, the upper drift parameter $\theta_{1u} \sqrt{I_X}$ is derived as

\[ \theta_{1u} \sqrt{I_X} = \Phi^{-1}(1 - \alpha_u) + \Phi^{-1}(1 - \beta_u) \]

For a given $\alpha_u$, $\beta_u$, and upper alternative reference $\theta_{1u}$, the maximum information required for the upper alternative is

\[ I_{Xu} = \left( \frac{\Phi^{-1}(1 - \alpha_u) + \Phi^{-1}(1 - \beta_u)}{\theta_{1u}} \right)^{2} \]
Thus, the maximum information required for the design is given by
\[ I_X = \max\left( I_{Xl},\ I_{Xu} \right) \]
Note that with the maximum information level $I_X = \max(I_{Xl}, I_{Xu})$, if $I_X > I_{Xl}$, then the derived power from the lower alternative is larger than the specified $1 - \beta_l$. Similarly, if $I_X > I_{Xu}$, then the derived power from the upper alternative is larger than the specified $1 - \beta_u$.
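The following sketch (hypothetical design values) computes $I_{Xl}$, $I_{Xu}$, and $I_X = \max(I_{Xl}, I_{Xu})$, and confirms that the side requiring less information attains more than its specified power at $I_X$.

```python
# Sketch: maximum information for a generalized two-sided test with distinct
# lower and upper alternative references and error probabilities (illustrative).
from scipy.stats import norm

alpha_l, beta_l, theta_1l = 0.01, 0.10, -0.30   # lower side
alpha_u, beta_u, theta_1u = 0.04, 0.10, 0.25    # upper side

info_l = ((norm.ppf(1 - alpha_l) + norm.ppf(1 - beta_l)) / theta_1l) ** 2   # I_Xl ~ 144.6
info_u = ((norm.ppf(1 - alpha_u) + norm.ppf(1 - beta_u)) / theta_1u) ** 2   # I_Xu ~ 147.1
info_x = max(info_l, info_u)                                                # I_X  ~ 147.1

# Here I_X > I_Xl, so the derived power for the lower alternative exceeds 1 - beta_l
power_l = norm.cdf(-norm.ppf(1 - alpha_l) - theta_1l * info_x ** 0.5)

print(info_l, info_u, info_x, power_l)   # power_l ~ 0.905 > 0.90
```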
If the maximum information $I_X$ is available, the required sample size can be derived. For example, in a one-sample test for the mean, if the standard deviation $\sigma$ is known, the sample size n required for the test is $n = \sigma^2 I_X$.
On the other hand, if the alternative references $\theta_{1l}$ and $\theta_{1u}$, Type I error probabilities $\alpha_l$ and $\alpha_u$, standard deviation $\sigma$, and sample size n are all specified, then the Type II error probabilities $\beta_l$ and $\beta_u$ and the corresponding powers can be derived.
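Finally, a sketch of this reverse direction for a one-sample test for the mean: with hypothetical alternative references, $\alpha_l$, $\alpha_u$, $\sigma$, and n fixed, the Type II error probabilities $\beta_l$ and $\beta_u$ follow from the power expressions above.

```python
# Sketch: reverse direction for the generalized test in a one-sample mean setting.
# With the alternative references, alpha_l, alpha_u, sigma, and n all fixed
# (illustrative values), derive beta_l, beta_u and the corresponding powers.
from scipy.stats import norm

alpha_l, alpha_u   = 0.01, 0.04
theta_1l, theta_1u = -0.30, 0.25
sigma, n           = 1.0, 150

info    = n / sigma ** 2                                              # I = n / sigma^2
power_l = norm.cdf(-norm.ppf(1 - alpha_l) - theta_1l * info ** 0.5)   # lower-side power
power_u = 1 - norm.cdf(norm.ppf(1 - alpha_u) - theta_1u * info ** 0.5)  # upper-side power

print(1 - power_l, 1 - power_u)   # beta_l ~ 0.089 and beta_u ~ 0.095
```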