Construction and Interpretation of P-P Plots

A P-P plot compares the empirical cumulative distribution function (ecdf) of a variable with a specified theoretical cumulative distribution function $F(\cdot)$. The ecdf, denoted by $F_n(x)$, is defined as the proportion of nonmissing observations less than or equal to $x$, so that $F_n(x_{(i)}) = \frac{i}{n}$.

To construct a P-P plot, the $n$ nonmissing values are first sorted in increasing order:

$ x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} $

Then the $i$th ordered value $x_{(i)}$ is represented on the plot by the point whose x-coordinate is $F(x_{(i)})$ and whose y-coordinate is $\frac{i}{n}$.

Like Q-Q plots and probability plots, P-P plots can be used to determine how well a theoretical distribution models a data distribution. If the theoretical cdf reasonably models the ecdf in all respects, including location and scale, the point pattern on the P-P plot is linear through the origin and has unit slope.
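
The unit-slope pattern follows from the probability integral transform. As a brief sketch, assume the data are an independent sample from a continuous distribution whose true cdf is exactly $F$. The symbols $X_{(i)}$ (the $i$th order statistic viewed as a random variable) and $U_{(i)}$ (the $i$th order statistic of $n$ independent uniform variates) are introduced here for illustration only:

$ F(X) \sim \mathrm{Uniform}(0,1) \quad \Longrightarrow \quad E\left[ F(X_{(i)}) \right] = E\left[ U_{(i)} \right] = \frac{i}{n+1} \approx \frac{i}{n} $

Under that assumption, each point $\left( F(x_{(i)}), \frac{i}{n} \right)$ is expected to fall close to the diagonal from $(0,0)$ to $(1,1)$, and systematic departures from the diagonal suggest that $F$ does not describe the data.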

Note: See Interpreting P-P Plots in the SAS/QC Sample Library.

Unlike Q-Q and probability plots, P-P plots are not invariant to changes in location and scale. For example, the data in the section Getting Started: PPPLOT Statement are reasonably described by a normal distribution with mean 10 and standard deviation 0.3. It is instructive to display these data on two normal P-P plots, one with a misspecified mean and one with a misspecified standard deviation, as created by the following statements:

data Sheets;
   input Distance @@;
   label Distance='Hole Distance in cm';
   datalines;
 9.80 10.20 10.27  9.70  9.76
10.11 10.24 10.20 10.24  9.63
 9.99  9.78 10.10 10.21 10.00
 9.96  9.79 10.08  9.79 10.06
10.10  9.95  9.84 10.11  9.93
10.56 10.47  9.42 10.44 10.16
10.11 10.36  9.94  9.77  9.36
 9.89  9.62 10.05  9.72  9.82
 9.99 10.16 10.58 10.70  9.54
10.31 10.07 10.33  9.98 10.15
;
ods graphics on;
proc capability data=Sheets noprint;
   ppplot Distance / normal(mu=9.5 sigma=0.3) square;
   ppplot Distance / normal(mu=10 sigma=0.5) square;
run;

The ODS GRAPHICS ON statement specified before the PROC CAPABILITY statement enables ODS Graphics, so the P-P plots are created using ODS Graphics instead of traditional graphics. The resulting plots are shown in Figure 5.32 and Figure 5.33.

Figure 5.32: Normal P-P Plot with Mean Specified Incorrectly


Figure 5.33: Normal P-P Plot with Standard Deviation Specified Incorrectly


Specifying a mean of 9.5 instead of 10 results in the plot shown in Figure 5.32, while specifying a standard deviation of 0.5 instead of 0.3 results in the plot shown in Figure 5.33. Both plots clearly reveal the model misspecification.
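
The construction described at the start of this section can also be reproduced by hand, which may help in seeing where the points in the two figures come from. The following statements are a minimal sketch and not the method used internally by the PPPLOT statement. The data set names Sorted and PPCoords and the variables Empirical, F_Ref, F_BadMean, and F_BadSigma are hypothetical names chosen for illustration, and the sketch assumes the Sheets data set created above.

```sas
/* Sketch only: compute P-P plot coordinates for the Sheets data.   */
/* Names other than Sheets and Distance are hypothetical.           */

/* Keep the nonmissing values and sort them in increasing order.    */
proc sort data=Sheets(where=(Distance is not missing)) out=Sorted;
   by Distance;
run;

data PPCoords;
   set Sorted nobs=n;
   /* y-coordinate: i/n for the ith ordered value                   */
   Empirical  = _n_ / n;
   /* x-coordinate: theoretical normal cdf under three parameter sets */
   F_Ref      = cdf('NORMAL', Distance, 10,  0.3);  /* mean 10, sd 0.3 */
   F_BadMean  = cdf('NORMAL', Distance, 9.5, 0.3);  /* as in Figure 5.32 */
   F_BadSigma = cdf('NORMAL', Distance, 10,  0.5);  /* as in Figure 5.33 */
run;

/* Overlay the three point patterns with the unit-slope diagonal.   */
proc sgplot data=PPCoords;
   scatter x=F_Ref      y=Empirical;
   scatter x=F_BadMean  y=Empirical;
   scatter x=F_BadSigma y=Empirical;
   lineparm x=0 y=0 slope=1;
   xaxis min=0 max=1 label='Theoretical cdf';
   yaxis min=0 max=1 label='Empirical cdf (i/n)';
run;
```

For instance, the observation 10.00 is the 24th of the 50 ordered values, so its y-coordinate is 24/50 = 0.48. With mean 10 and standard deviation 0.3 its x-coordinate is 0.5, close to the diagonal, whereas with mean 9.5 its x-coordinate is about 0.95, far to the right of the diagonal.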