MM Estimation :: SAS/STAT(R) 12.1 User's Guide

MM Estimation

Algorithm
Convergence Criteria
Bias Test
Asymptotic Covariance and Confidence Intervals
R Square and Deviance
Linear Tests
Model Selection

MM estimation is a combination of high-breakdown-value estimation and efficient estimation, which was introduced by Yohai (1987). It has the following three steps:

Compute an initial (consistent) high-breakdown-value estimate ${\hat\btheta }’$ . The ROBUSTREG procedure provides two kinds of estimates as the initial estimate: the LTS estimate and the S estimate. By default, the LTS estimate is used because of its speed and high breakdown value. The breakdown value of the final MM estimate is decided by the breakdown value of the initial LTS estimate and the constant in the $\chi$ function. To use the S estimate as the initial estimate, you specify the INITEST=S option in the PROC ROBUSTREG statement. In this case, the breakdown value of the final MM estimate is decided only by the constant . Instead of computing the LTS estimate or the S estimate as the initial estimate, you can also specify the initial estimate explicitly by using the INEST= option in the PROC ROBUSTREG statement. See the section INEST= Data Set for details.

Find ${\hat\sigma }’$ such that

${1\over n-p} \sum _{i=1}^ n \chi \left( {y_ i - \mb {x}_ i’ {\hat\btheta }’ \over {\hat\sigma }’ }\right) = \beta$

where $\beta = \int \chi (s) d\Phi (s)$ .

The ROBUSTREG procedure provides two choices for $\chi$ : Tukey’s bisquare function and Yohai’s optimal function.

Tukey’s bisquare function, which you can specify with the option CHIF=TUKEY, is

$\chi _{k_0}(s) = \left\{ \begin{array}{ll} 3({s\over k_0})^2 - 3({s\over k_0})^4 + ({s\over k_0})^6& {\mbox{if }} |s| \leq k_0 \\ 1 & {\mbox{ otherwise }} \end{array} \right.$

where can be specified with the K0= option. The default is 2.9366 such that the asymptotically consistent scale estimate ${\hat\sigma }’$ has the breakdown value of 25%.

Yohai’s optimal function, which you can specify with the option CHIF=YOHAI, is

$\chi _{k_0}(s) = \left\{ \begin{array}{lll} {s^2 \over 2} & {\mbox{if }} |s| \leq 2 k_0 \\ k_0^2 [ b_0 + b_1({s\over k_0})^2 + b_2({s\over k_0})^4 & \\ + b_3({s\over k_0})^6 + b_4({s\over k_0})^8] & {\mbox{if }} 2 k_0 < |s| \leq 3 k_0\\ 3.25 k_0^2 & {\mbox{if }} |s| >3 k_0 \end{array} \right.$

where , , , , and . You can specify with the K0= option. The default is 0.7405 such that the asymptotically consistent scale estimate ${\hat\sigma }’$ has the breakdown value of 25%.

Find a local minimum ${\hat\btheta }_{\mathit{MM}}$ of

$Q_{\mathit{MM}} = \sum _{i=1}^ n \rho \left( {y_ i - \mb {x}_ i’ \btheta \over {\hat\sigma }’} \right)$

such that $Q_{\mathit{MM}}({\hat\btheta }_{\mathit{MM}}) \leq Q_{\mathit{MM}}({\hat\btheta }’)$ . The algorithm for M estimation is used here.

The ROBUSTREG procedure provides two choices for $\rho$ : Tukey’s bisquare function and Yohai’s optimal function.

Tukey’s bisquare function, which you can specify with the option CHIF=TUKEY, is

$\rho (s) = \chi _{k_1}(s) = \left\{ \begin{array}{ll} 3({s\over k_1})^2 - 3({s\over k_1})^4 + ({s\over k_1})^6 & {\mbox{if }} |s| \leq k_1 \\ 1 & {\mbox{ otherwise }} \end{array} \right.$

where can be specified with the K1= option. The default is 3.440 such that the MM estimate has 85% asymptotic efficiency with the Gaussian distribution.

Yohai’s optimal function, which you can specify with the option CHIF=YOHAI, is

$\rho (s) = \chi _{k_1}(s) = \left\{ \begin{array}{lll} {s^2 \over 2} & {\mbox{if }} |s| \leq 2 k_1 \\ k_1^2 [ b_0 + b_1({s\over k_1})^2 + b_2({s\over k_1})^4 & \\ + b_3({s\over k_1})^6 + b_4({s\over k_1})^8] & {\mbox{if }} 2 k_1 < |s| \leq 3 k_1\\ 3.25 k_1^2 & {\mbox{if }} |s| >3 k_1 \end{array} \right.$

where can be specified with the K1= option. The default is 0.868 such that the MM estimate has 85% asymptotic efficiency with the Gaussian distribution.

Algorithm

The initial LTS estimate is computed using the algorithm described in the section LTS Estimate. You can control the quantile of the LTS estimate with the option INITH=h, where h is an integer between $[{n\over 2}] + 1$ and $[{3n+p+1\over 4}]$ . By default, $h=[{3n+p+1\over 4}]$ , which corresponds to a breakdown value of around 25%.

The initial S estimate is computed using the algorithm described in the section S Estimate. You can control the breakdown value and efficiency of this initial S estimate by the constant , which can be specified with the K0 option.

The scale parameter $\sigma$ is solved by an iterative algorithm

$(\sigma ^{(m+1)})^2 = {1\over (n-p)\beta }\sum _{i=1}^ n \chi _{k_0} \left( {r_ i\over \sigma ^{(m)}} \right) \left(\sigma ^{(m)}\right)^2$

where $\beta = \int \chi _{k_0}(s) d\Phi (s)$ .

Once the scale parameter is computed, the iteratively reweighted least squares (IRLS) algorithm with fixed scale parameter is used to compute the final MM estimate.

Convergence Criteria

In the iterative algorithm for the scale parameter, the relative change of the scale parameter controls the convergence.

In the iteratively reweighted least squares algorithm, the same convergence criteria for the M estimate used before are used here.

Bias Test

Although the final MM estimate inherits the high-breakdown-value property, its bias due to the distortion of the outliers can be high. Yohai, Stahel, and Zamar (1991) introduced a bias test. The ROBUSTREG procedure implements this test when you specify the BIASTEST= option in the PROC ROBUSTREG statement. This test is based on the initial scale estimate ${\hat\sigma }’$ and the final scale estimate ${\hat\sigma }_{1}’$ , which is the solution of

${1\over n-p} \sum _{i=1}^ n \chi \left( {y_ i - \mb {x}_ i’ {\hat\btheta }_{\mathit{MM}} \over {\hat\sigma }_{1}’ }\right) = \beta$

Let $\psi _{k_0}(z) = {\frac{\partial \chi _{k_0}(z)}{\partial z}}$ and $\psi _{k_1}(z) = {\frac{\partial \chi _{k_1}(z)}{\partial z}}$ . Compute

$\displaystyle {\tilde r}_ i$	$\displaystyle =$	$\displaystyle (y_ i - \mb {x}_ i’{\hat\btheta }’) / {\hat\sigma }’ \ \ \ {\mbox{ for }} i=1,\ldots ,n$
$\displaystyle v_0$	$\displaystyle =$	$\displaystyle {(1 / n) \sum \psi _{k_0}’({\tilde r}_ i) \over ({\hat\sigma }_{1}’ / n) \sum \psi _{k_0}({\tilde r}_ i) {\tilde r}_ i}$

$\displaystyle p_ i^{(0)}$	$\displaystyle =$	$\displaystyle {\psi _{k_0}({\tilde r}_ i) \over (1 / n) \sum \psi _{k_0}’({\tilde r}_ i)} \ {\mbox{ for }} i=1,\ldots ,n$
$\displaystyle p_ i^{(1)}$	$\displaystyle =$	$\displaystyle {\psi _{k_1}({\tilde r}_ i) \over (1 / n) \sum \psi _{k_1}’({\tilde r}_ i)} \ {\mbox{ for }} i=1,\ldots ,n$
$\displaystyle d^2$	$\displaystyle =$	$\displaystyle {1\over n} \sum (p_ i^{(1)} - p_ i^{(0)})^2$

Let

$T = {2n( {\hat\sigma }_{1}’ - {\hat\sigma }’) \over v_0 d^2 ({\hat\sigma }’)^2}$

Standard asymptotic theory shows that T approximately follows a $\chi ^2$ distribution with p degrees of freedom. If T exceeds the $\alpha$ quantile $\chi _\alpha ^2$ of the $\chi ^2$ distribution with p degrees of freedom, then the ROBUSTREG procedure gives a warning and recommends that you use other methods. Otherwise, the final MM estimate and the initial scale estimate are reported. You can specify $\alpha$ with the ALPHA= option following the BIASTEST= option. By default, ALPHA=0.99.

Asymptotic Covariance and Confidence Intervals

Since the MM estimate is computed as a M estimate with a known scale in the last step, the asymptotic covariance for the M estimate can be used here for the asymptotic covariance of the MM estimate. Besides the three estimators H1, H2, and H3 as described in the section Asymptotic Covariance and Confidence Intervals, a weighted covariance estimator H4 is available. H4 is calculated as

$K^2 {[1 / (n-p)] \sum (\psi (r_ i))^2 \over [(1 / n) \sum (\psi ’(r_ i))]^2 } \bW ^{-1}$

where $K=1 + {p\over n} {{\mbox{Var}}(\psi ’)\over (E \psi ’)^2}$ is the correction factor and $W_{jk} = {1\over {\bar{w}}} \sum w_ i x_{ij}x_{ik}$ , ${\bar{w}} = {1\over n} \sum w_ i$ .

You can specify these estimators with the option ASYMPCOV= [H1 | H2 | H3 | H4]. The ROBUSTREG procedure uses H4 as the default. Confidence intervals for estimated parameters are computed from the diagonal elements of the estimated asymptotic covariance matrix.

R Square and Deviance

The robust version of R-square for the MM estimate is defined as

$R^2 = {{\sum \rho \left({y_ i-{\hat\mu } \over {\hat s}}\right) - \sum \rho \left({y_ i-\mb {x}_ i’ {\hat\btheta } \over {\hat s}}\right) \over \sum \rho \left({y_ i-{\hat\mu } \over {\hat s}}\right)}}$

and the robust deviance is defined as the optimal value of the objective function on the $\sigma ^2$ scale,

$D = 2 {\hat s}^2 \sum \rho \left({y_ i-\mb {x}_ i’ {\hat\btheta } \over {\hat s}}\right)$

where $\rho ’ = \psi$ , ${\hat\btheta }$ is the MM estimator of $\btheta$ , ${\hat\mu }$ is the MM estimator of location, and $\hat s$ is the MM estimator of the scale parameter in the full model.