Input Data Sets :: SAS/QC(R) 13.1 User's Guide

Input Data Sets

Subsections:

DATA= Data Set
HISTORY= Data Set
LOADINGS= Data Set
TABLE= Data Set

The MVPMONITOR procedure accepts a single primary input data set of one of three types.

A DATA= data set contains new process data to be analyzed by using an existing PCA model (Phase II analysis).
A HISTORY= data set contains process data and the accompanying scores, residuals, and statistics produced by applying a PCA model. The process data can be the original data that was used to create the model (Phase I analysis) or subsequent data that was analyzed by using a previously created model (Phase II analysis).
A TABLE= data set contains a summary of score charts, SPE charts, or $T^2$ charts, which consists of the statistics, control limits, and other information.

The DATA=, HISTORY=, and TABLE= options are mutually exclusive. If you specify none of them, PROC MVPMONITOR treats the most recently created SAS data set as a DATA= data set.

When you specify a DATA= data set, you must also specify a LOADINGS= data set that contains loadings and other information describing the PCA model. When you specify a HISTORY= data set, you must also specify a LOADINGS= data set if you specify the CONTRIBUTIONS option in a TSQUARECHART statement.

DATA= Data Set

A DATA= data set provides the process measurement data for a Phase II analysis. In addition to the process variables, a DATA= data set can include the following:

BY variables
ID variables
a SERIES variable
a TIME variable

When you specify a DATA= data set, you must also specify a LOADINGS= data set that contains the loadings for the principal component model that describes the variation of the process. These loadings are used to score the new data from the DATA= data set. The process variables in the LOADINGS= data set must have the same names as those in the DATA= data set.
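A Phase II run might be set up as in the following sketch. The data set names (Work.Phase1, Work.Loadings, Work.NewData), the process variables x1 through x5, the TIME variable Hour, and the choice of three components are hypothetical placeholders. Treat it as an illustration of how the DATA= and LOADINGS= options fit together rather than as a complete program.

```sas
/* Phase I: build the PCA model and save its loadings.             */
/* Work.Phase1, x1-x5, and NCOMP=3 are hypothetical placeholders.  */
proc mvpmodel data=work.phase1 ncomp=3 outloadings=work.loadings;
   var x1-x5;
run;

/* Phase II: score new data by using the saved loadings.           */
/* Work.NewData must contain process variables that have the same  */
/* names (x1-x5) as the process variables in Work.Loadings.        */
proc mvpmonitor data=work.newdata loadings=work.loadings;
   time hour;
   tsquarechart;
   spechart;
run;
```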

HISTORY= Data Set

A HISTORY= data set provides the input data set for a Phase I or Phase II analysis. In addition to the original process variables, it contains principal component scores, residuals, SPE and $T^2$ statistics, and a count of the observations that are used to construct the principal component model, as summarized in Table 13.5.

Table 13.5: Variables in the HISTORY= Data Set

Variable	Description
Prin1–Prinj	Principal component scores
R_ $var1$ –R_ $varp$	Residuals
_NOBS_	Number of observations used to build the principal component model
_SPE_	Squared prediction error (SPE)
_TSQUARE_	$T^2$ statistic computed from principal component scores

A HISTORY= data set must include variables that contain principal component scores. The score variable names must consist of a common prefix followed by the numbers 1, 2, …, j, where j is the number of principal components. By default, the common prefix is Prin. You can use the PREFIX= option to specify another prefix for score variables.

If the number of principal components is less than the total number of process variables, the HISTORY= data set should also contain residual variables. A residual variable name consists of a common prefix followed by the corresponding process variable name. The default residual variable prefix is R_. For example, if the process variables are A, B, and C, the default residual variable names are R_A, R_B, and R_C. You can use the RPREFIX= option to specify a different residual variable prefix.

Note: Usually you create a HISTORY= data set by specifying the PROC MVPMODEL OUT= option or the PROC MVPMONITOR OUTHISTORY= option. If the PREFIX= or RPREFIX= option is used when such an output data set is created, you must specify the same prefixes to identify the score and residual variables when you read it as a HISTORY= data set.
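The following sketch shows one way the prefixes might be kept consistent between the step that creates the data set and the step that reads it. The data set names, the variables x1 through x5 and Hour, and the prefixes PC and Res_ are hypothetical. The sketch assumes that the PREFIX= and RPREFIX= options are specified in both procedure statements.

```sas
/* Create the scores data set with nondefault prefixes.            */
/* Score variables are then named PC1-PC3 and residual variables   */
/* Res_x1-Res_x5 (all names here are hypothetical).                */
proc mvpmodel data=work.phase1 ncomp=3
              out=work.scores outloadings=work.loadings
              prefix=PC rprefix=Res_;
   var x1-x5;
run;

/* Read it back as a HISTORY= data set with the same prefixes.     */
/* LOADINGS= is specified because CONTRIBUTIONS is requested in    */
/* the TSQUARECHART statement.                                     */
proc mvpmonitor history=work.scores loadings=work.loadings
                prefix=PC rprefix=Res_;
   time hour;
   tsquarechart / contributions;
run;
```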

LOADINGS= Data Set

The LOADINGS= data set contains the following information about the principal component model:

eigenvalues of the correlation or covariance matrix used to construct the model
principal component loadings
process variable means used to center the variable values
process variable standard deviations used to scale the variable values

You can produce a LOADINGS= data set by using the PROC MVPMODEL OUTLOADINGS= option. Table 13.6 lists the variables that are required in a LOADINGS= data set.

Table 13.6: Variables in the LOADINGS= Data Set

Variable	Description
_VALUE_	Type of value that the process variables contain in a given observation (EIGEN, LOADING, MEAN, or STD)
_NOBS_	Number of observations used to build the principal component model
_PC_	Principal component number; 0 for the observation that contains eigenvalues
process variables	Values associated with the process variables

Valid values for the _VALUE_ variable are as follows:

EIGEN: eigenvalues from the principal component analysis
LOADING: principal component loadings
MEAN: process variable means
STD: process variable standard deviations

The LOADINGS= data set contains one EIGEN observation and j LOADING observations, where j is the number of principal components in the model. The presence of a MEAN observation indicates that the process variables were centered when the principal component model was constructed, and the presence of a STD observation indicates that the process variables were scaled when the principal component model was constructed. The means and standard deviations are used to center and scale new data in a Phase II analysis.
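To see which kinds of observations a particular LOADINGS= data set holds, you could inspect it as in the following sketch. The data set name Work.Loadings is hypothetical, and the WHERE expression assumes that _VALUE_ is a character variable that holds the values listed above.

```sas
/* Count the observations of each type. Expect one EIGEN and       */
/* j LOADING observations, plus MEAN and STD if the process        */
/* variables were centered and scaled.                             */
proc freq data=work.loadings;
   tables _VALUE_;
run;

/* List the centering and scaling values that a Phase II analysis  */
/* would apply to new data.                                        */
proc print data=work.loadings;
   where upcase(_VALUE_) in ('MEAN', 'STD');
run;
```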

TABLE= Data Set

A TABLE= data set contains a summary of one or more score charts, SPE charts, or $T^2$ control charts. Usually, you create a TABLE= data set by specifying the OUTTABLE= option in a SCORECHART, SPECHART, or TSQUARECHART statement. Each type of TABLE= data set contains different variables, and when you specify a TABLE= data set you can specify only chart statements of the corresponding type. For example, if you use a TABLE= data set that contains SPE chart summary data, you cannot specify a SCORECHART or TSQUARECHART statement.

You can use a TABLE= data set to display previously created control charts or to specify custom control limits by computing your own _LCL_ and _UCL_ values.
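A custom-limit workflow might look like the following sketch. The data set names, the TIME variable Hour, and the limit value 25 are hypothetical, and Work.History is assumed to have been created with the default score and residual prefixes. Whether the TIME statement is needed when a TABLE= data set is read, and how the _EXLIM_ flag is treated after the limits are edited, are not stated here, so check those points before you rely on this pattern.

```sas
/* Step 1: save the SPE chart summary in an OUTTABLE= data set.    */
proc mvpmonitor history=work.history;
   time hour;
   spechart / outtable=work.speTable;
run;

/* Step 2: overwrite the upper control limit with a custom value.  */
/* The value 25 is arbitrary and serves only as an illustration.   */
data work.speCustom;
   set work.speTable;
   _UCL_ = 25;
run;

/* Step 3: display the chart from the modified summary. Because    */
/* the table holds SPE chart data, only SPECHART can be specified. */
proc mvpmonitor table=work.speCustom;
   time hour;
   spechart;
run;
```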

Table 13.7, Table 13.8, and Table 13.9 list the variables that are contained in the three types of TABLE= data sets.

Note:

SPE chart and $T^2$ chart TABLE= data sets contain one observation per time value. Score chart TABLE= data sets contain one observation for each principal component per time value.
SPE chart and $T^2$ chart TABLE= data sets contain residual variables corresponding to the process variables. Each residual variable has the same name as the corresponding process variable.

Table 13.7: Score Chart TABLE= Data Set Variables

Variable	Description
`_COMP_`	Principal component number
`_EXLIM_`	Flag that indicates that a control limit was exceeded
`_LCL_`	Lower control limit
`_MEAN_`	Center line
`_SCORE_`	Principal component score
series	Optional SERIES variable
`_SIGMAS_`	Multiple of score standard deviation used to compute control limits
time	Optional TIME variable
`_UCL_`	Upper control limit

Table 13.8: SPE Chart TABLE= Data Set Variables

Variable	Description
`_ALPHA_`	Probability ( $\alpha$ ) of exceeding control limits
`_EXLIM_`	Flag that indicates that a control limit was exceeded
`_LCL_`	Lower control limit
`_MEDIAN_`	Center line
residuals	Residual variables
series	Optional SERIES variable
`_SPE_`	Squared prediction error (SPE) statistic
time	Optional TIME variable
`_UCL_`	Upper control limit

Table 13.9: $T^2$ Chart TABLE= Data Set Variables

Variable	Description
`_ALPHA_`	Probability ( $\alpha$ ) of exceeding control limits
`_EXLIM_`	Flag that indicates that a control limit was exceeded
`_LCL_`	Lower control limit
`_MEDIAN_`	Center line
residuals	Residual variables
series	Optional SERIES variable
time	Optional TIME variable
`_TSQUARE_`	$T^2$ statistic (TSQUARECHART statement only)
`_UCL_`	Upper control limit
