Minimization Methods

PROC MODEL currently supports two methods for minimizing the objective function. These methods are described in the following sections.

GAUSS

The Gauss-Newton parameter-change vector for a system with g equations, n nonmissing observations, and p unknown parameters is

\[  {\bDelta} = (\mb{X}'\mb{X})^{-1}\mb{X}'\mb{r}  \]

where $\bDelta $ is the change vector, X is the stacked ${ng \times p}$ Jacobian matrix of partial derivatives of the residuals with respect to the parameters, and r is the ${ng \times 1}$ vector of stacked residuals. The components of X and r are weighted by the $\mb{S}^{-1}$ matrix. When instrumental methods are used, X and r are the projections of the Jacobian matrix and residuals vector into the instrument space, not the Jacobian and residuals themselves. In the preceding formula, S and W are suppressed. If instrumental variables are used, the change vector becomes:

\[  {\bDelta} = (\mb{X}'(\mb{S}^{-1} \otimes \mb{W})\mb{X})^{-1}\mb{X}'(\mb{S}^{-1} \otimes \mb{W})\mb{r}  \]
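The following NumPy sketch illustrates both forms of the change vector. It is a minimal illustration under the assumption that X, r, S_inv, and W are already assembled as dense arrays with the stacked dimensions given above; it is not PROC MODEL's internal implementation.

```python
import numpy as np

def gauss_newton_delta(X, r):
    """Delta = (X'X)^{-1} X'r, computed via least squares for numerical stability."""
    delta, *_ = np.linalg.lstsq(X, r, rcond=None)
    return delta

def gauss_newton_delta_iv(X, r, S_inv, W):
    """Instrumental form: Delta = (X'(S^{-1} kron W) X)^{-1} X'(S^{-1} kron W) r."""
    V = np.kron(S_inv, W)        # S^{-1} is g x g, W is n x n, so V is ng x ng
    XtV = X.T @ V
    return np.linalg.solve(XtV @ X, XtV @ r)
```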

This vector is computed at the end of each iteration. The objective function is then computed at the changed parameter values at the start of the next iteration. If the objective function is not improved by the change, the $\bDelta $ vector is reduced by one-half and the objective function is reevaluated. The change vector can be halved up to MAXSUBITER= times until the objective function is improved. If the objective function cannot be improved after MAXSUBITER= halvings, the procedure switches to the MARQUARDT method, described in the next section, to further improve the objective function.
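A compact sketch of this step-halving rule follows; the objective callable and the maxsubiter bound are stand-ins for the procedure's objective function and the MAXSUBITER= option, not the PROC MODEL internals.

```python
def try_step(theta, delta, objective, f_old, maxsubiter=30):
    """Halve delta up to maxsubiter times until the objective improves.

    A False flag signals that the step failed and the caller should
    switch to the Marquardt method described in the next section.
    """
    for _ in range(maxsubiter):
        f_new = objective(theta + delta)
        if f_new < f_old:            # improvement: accept the step
            return theta + delta, True
        delta = delta / 2.0          # no improvement: halve the change vector
    return theta, False
```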

For FIML, the ${\mb{X}'\mb{X}}$ matrix is replaced with one of three approximations to the Hessian. See the section Full Information Maximum Likelihood Estimation (FIML) in this chapter.

MARQUARDT

The Marquardt-Levenberg parameter change vector is

\[  {\bDelta} = (\mb{X}'\mb{X} + \lambda \, \mr{diag}(\mb{X}'\mb{X}))^{-1}\mb{X}'\mb{r}  \]

where $\bDelta $ is the change vector, and X and r are the same as for the Gauss-Newton method described in the preceding section. Before the iterations start, ${\lambda }$ is set to a small value (1E-6). At each iteration, the objective function is evaluated at the parameters changed by $\bDelta $. If the objective function is not improved, ${\lambda }$ is increased to 10${\lambda }$ and the step is tried again. ${\lambda }$ can be increased up to MAXSUBITER= times, to a maximum of 1E15, whichever limit is reached first, until the objective function is improved. At the start of the next iteration, ${\lambda }$ is reduced to max(${\lambda }$/10, 1E-10).
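The following sketch implements this ${\lambda }$ schedule, reusing the X, r, and objective stand-ins from the Gauss-Newton sketch above; the function name and the maxsubiter default are assumptions for illustration, not PROC MODEL's internals.

```python
import numpy as np

def marquardt_step(theta, X, r, objective, f_old, lam, maxsubiter=30):
    """Inflate lambda by factors of 10 (up to 1E15) until the step improves."""
    XtX = X.T @ X
    Xtr = X.T @ r
    D = np.diag(np.diag(XtX))            # the diag(X'X) damping term
    for _ in range(maxsubiter):
        delta = np.linalg.solve(XtX + lam * D, Xtr)
        f_new = objective(theta + delta)
        if f_new < f_old:
            # improved: accept the step and shrink lambda for the next iteration
            return theta + delta, max(lam / 10.0, 1e-10)
        if lam >= 1e15:
            break                        # lambda ceiling reached
        lam = min(lam * 10.0, 1e15)      # no improvement: increase the damping
    return theta, lam
```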