The
purpose of the statistical methods that have been discussed so far
is to estimate a population parameter by means of a sample statistic.
Another class of statistical methods is used for testing hypotheses
about population parameters or for measuring the amount of evidence
against a hypothesis.
Consider the universe
of students in a college. Let the variable X be the number of pounds
by which a student's weight deviates from the ideal weight for a person
of the same sex, height, and build. You want to find out whether
the population of students is, on the average, underweight or overweight.
To this end, you have taken a random sample of X values from nine
students, with results as given in the following DATA step:
title 'Deviations from Normal Weight';

data x;
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;
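Before setting up any hypotheses, you might want to look at the sample mean itself. The following step is a minimal sketch that reports it with PROC MEANS; for these nine values the mean is 77/9, or about 8.56.

proc means data=x mean;
   var X;
run;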
You can define several hypotheses of interest. One hypothesis is that, on the average, the students are of exactly ideal weight. If μ represents the population mean of the X values, then you can write this hypothesis, called the null hypothesis, as H₀: μ = 0. The other two hypotheses, called alternative hypotheses, are that the students are underweight on the average, H₁: μ < 0, and that the students are overweight on the average, H₂: μ > 0.
The null hypothesis
is so called because in many situations it corresponds to the assumption
of “no effect” or “no difference.” However,
this interpretation is not appropriate for all testing problems. The
null hypothesis is like a straw man that can be toppled by statistical
evidence. You decide between the alternative hypotheses according
to which way the straw man falls.
A naive way to approach this problem would be to look at the sample mean x̄ and decide among the three hypotheses according to the following rule:
- If x̄ < 0, then decide on H₁.
- If x̄ = 0, then decide on H₀.
- If x̄ > 0, then decide on H₂.
The trouble with this approach is that there might be a high probability of making an incorrect decision. If H₀ is true, then you are nearly certain to make a wrong decision because the chances of x̄ being exactly zero are almost nil. If μ is slightly less than zero, so that H₁ is true, then there might be nearly a 50% chance that x̄ will be greater than zero in repeated sampling, so the chances of incorrectly choosing H₂ would also be nearly 50%. Thus, you have a high probability of making an error if x̄ is near zero. In such cases, there is not enough evidence to make a confident decision, so the best response might be to reserve judgment until you can obtain more evidence.
The question is, how far from zero must x̄ be for you to be able to make a confident decision? The answer can be obtained by considering the sampling distribution of x̄. If X has an approximately normal distribution, then x̄ has an approximately normal sampling distribution. The mean of the sampling distribution of x̄ is μ. Assume temporarily that σ, the standard deviation of X, is known to be 12. Then the standard error of x̄ for samples of nine observations is σ/√n = 12/√9 = 4.
You know that about 95% of the values from a normal distribution are within two standard deviations of the mean, so about 95% of the possible samples of nine X values have a sample mean x̄ between μ − 2(4) and μ + 2(4). If the null hypothesis is true, so that μ = 0, this interval runs from −8 to 8. Consider the chances of making an error with the following decision rule:
- If x̄ < −8, then decide on H₁.
- If −8 ≤ x̄ ≤ 8, then reserve judgment.
- If x̄ > 8, then decide on H₂.
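As a sketch of how this rule could be applied in SAS, the following steps compute the sample mean and compare it with the critical values −8 and 8. The data set name meanout and the variable name xbar are illustrative choices, not part of the example above.

proc means data=x noprint;
   var X;
   output out=meanout mean=xbar;   /* store the sample mean as xbar */
run;

data _null_;
   set meanout;
   length decision $ 40;
   /* decision rule with critical values -8 and 8 */
   if xbar < -8 then decision = 'decide on H1 (underweight)';
   else if xbar > 8 then decision = 'decide on H2 (overweight)';
   else decision = 'reserve judgment';
   put 'Sample mean = ' xbar ' Decision: ' decision;
run;

With the data above, x̄ ≈ 8.56 > 8, so under the assumed σ = 12 this rule would choose H₂.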
If H₀ is true, then in about 95% of the possible samples x̄ will be between the critical values −8 and 8, so you will reserve judgment. In these cases the statistical evidence is not strong enough to fell the straw man. In the other 5% of the samples you will make an error; in 2.5% of the samples you will incorrectly choose H₁, and in 2.5% you will incorrectly choose H₂.
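The "about 5%" figure reflects the two-standard-deviation rule of thumb. Under the assumed σ = 12, so that the standard error is 4, you can compute the exact normal-theory error rate for the critical values ±8; the following DATA step is a sketch using the PROBNORM function:

data _null_;
   /* two-tailed probability that xbar falls beyond -8 or 8 when  */
   /* mu = 0 and the standard error is 4: both tails beyond z = 2 */
   alpha = 2*(1 - probnorm(8/4));
   put alpha=;   /* approximately 0.0455 */
run;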
The price that you pay
for controlling the chances of making an error is the necessity of
reserving judgment when there is not sufficient statistical evidence
to reject the null hypothesis.