The NLMIXED Procedure

Restricting the Step Length

Almost all line-search algorithms use iterative extrapolation techniques that can easily lead them to (feasible) points where the objective function f is no longer defined or is difficult to compute. Therefore, PROC NLMIXED provides options restricting the step length $\alpha $ or trust region radius $\Delta $, especially during the first main iterations.

The inner product $\mb{g}^\prime \mb{s}$ of the gradient $\mb{g}$ and the search direction $\mb{s}$ is the slope of $f(\alpha ) = f(\btheta + \alpha \mb{s})$ along the search direction $\mb{s}$ at $\alpha = 0$. The default starting value $\alpha ^{(0)} = \alpha ^{(k,0)}$ in each line-search algorithm ($ \min _{\alpha > 0} f(\btheta + \alpha \mb{s}) $) during main iteration $k$ is computed in three steps:
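By the chain rule, this slope is the directional derivative of $f$ at $\btheta $ along $\mb{s}$:

\[  \left. \frac{d}{d\alpha } f(\btheta + \alpha \mb{s}) \right|_{\alpha = 0} = \mb{g}^\prime \mb{s}  \]

A negative value of $\mb{g}^\prime \mb{s}$ means that $\mb{s}$ is a descent direction, so a sufficiently small $\alpha > 0$ decreases $f$.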

  1. The first step uses either the difference $|\Delta f|=|f^{(k)} - f^{(k-1)}|$ of the function values during the last two consecutive iterations or the final step-size value $\alpha ^\_ $ of the last iteration $k-1$ to compute a first value of $\alpha _1^{(0)}$.

    • If the DAMPSTEP option is not used,

      \[  \alpha _1^{(0)} = \left\{  \begin{array}{ll} \mathit{step} &  \mbox{ if } 0.1 \le \mathit{step} \le 10 \\ 10 &  \mbox{ if } \mathit{step} > 10 \\ 0.1 &  \mbox{ if } \mathit{step} < 0.1 \end{array} \right.  \]

      with

      \[  \mathit{step} = \left\{  \begin{array}{ll} |\Delta f| / |\mb{g}^\prime \mb{s}| &  \mbox{if } |\mb{g}^\prime \mb{s}| \ge \epsilon \max (100 \times |\Delta f|,1) \\ 1 &  \mbox{otherwise} \end{array} \right.  \]

      This value of $\alpha _1^{(0)} $ can be too large and can lead to a difficult or impossible function evaluation, especially for highly nonlinear functions such as the EXP function.

    • If the DAMPSTEP= r option is used,

      \[  \alpha _1^{(0)} = \min (1,r \alpha ^\_ )  \]

      The initial value for the new step length can be no larger than r times the final step length $\alpha ^\_ $ of the former iteration. The default value is r = 2.

  2. During the first five iterations, the second step enables you to reduce $\alpha _1^{(0)}$ to a smaller starting value $\alpha _2^{(0)}$ by using the INSTEP= r option:

    \[  \alpha _2^{(0)} = \min (\alpha _1^{(0)},r)  \]

    After more than five iterations, $\alpha _2^{(0)}$ is set to $\alpha _1^{(0)}$.

  3. The third step can further reduce the step length by

    \[  \alpha _3^{(0)} = \min (\alpha _2^{(0)},\min (10,u))  \]

    where u is the maximum length of a step inside the feasible region.
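The three steps above can be sketched in Python as follows. The function name, its arguments, and the tolerance `eps` are illustrative choices for this sketch, not PROC NLMIXED internals:

```python
import math

def initial_step_length(df, gs, alpha_prev, k, r_dampstep=None,
                        r_instep=None, u=math.inf, eps=1e-8):
    """Sketch of the three-step computation of the starting step length.

    df         -- |Delta f|, change in f over the last two iterations
    gs         -- inner product g's (slope along the search direction)
    alpha_prev -- final step length of the previous iteration
    k          -- current main iteration number
    u          -- maximum step length inside the feasible region
    eps        -- illustrative stand-in for the internal tolerance
    """
    # Step 1: first value alpha_1 from |Delta f| and g's, or from DAMPSTEP=r.
    if r_dampstep is None:
        if abs(gs) >= eps * max(100.0 * abs(df), 1.0):
            step = abs(df) / abs(gs)
        else:
            step = 1.0
        alpha1 = min(max(step, 0.1), 10.0)  # clamp step into [0.1, 10]
    else:
        alpha1 = min(1.0, r_dampstep * alpha_prev)

    # Step 2: during the first five iterations, INSTEP=r caps the value.
    if r_instep is not None and k <= 5:
        alpha2 = min(alpha1, r_instep)
    else:
        alpha2 = alpha1

    # Step 3: respect the maximum feasible step length u.
    return min(alpha2, min(10.0, u))
```

For example, with $|\Delta f| = 100$ and $|\mb{g}^\prime \mb{s}| = 1$, the raw ratio $\mathit{step} = 100$ is clamped to the upper bound 10, illustrating how step 1 guards against overly long first trial steps.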

The INSTEP= r option enables you to specify a smaller or larger radius $\Delta $ of the trust region used in the first iteration of the trust region and double-dogleg algorithms. The default initial trust region radius $\Delta ^{(0)}$ is the length of the scaled gradient (Moré, 1978). This corresponds to the default radius factor of r = 1. In most practical applications of the TRUREG and DBLDOG algorithms, this choice is successful.

However, for bad initial values and highly nonlinear objective functions (such as the EXP function), the default start radius can result in arithmetic overflows. If this happens, you can try decreasing values of INSTEP= r, 0 < r < 1, until the iteration starts successfully. A small factor r also affects the trust region radius $\Delta ^{(k+1)}$ of subsequent steps, because the radius is changed in each iteration by a factor $0 < c \le 4$ that depends on the ratio $\rho $ expressing the goodness of the quadratic function approximation. Reducing the radius $\Delta $ corresponds to increasing the ridge parameter $\lambda $, producing smaller steps directed more closely toward the (negative) gradient direction.
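A minimal sketch of such a radius update: the factor $c$ with $0 < c \le 4$ depends on the ratio $\rho $ of actual to predicted reduction. The thresholds below follow a common trust-region variant in the spirit of Moré (1978) and are illustrative assumptions, not necessarily PROC NLMIXED's exact internal values:

```python
def update_radius(delta, rho, step_hit_boundary):
    """Illustrative trust-region radius update.

    delta             -- current radius Delta^(k)
    rho               -- actual reduction / predicted reduction
    step_hit_boundary -- whether the step reached the region boundary
    """
    if rho < 0.25:
        c = 0.25   # poor quadratic model: shrink the region
    elif rho > 0.75 and step_hit_boundary:
        c = 2.0    # good model and a boundary-limited step: expand
    else:
        c = 1.0    # acceptable model: keep the radius
    return c * delta  # Delta^(k+1) = c * Delta^(k), 0 < c <= 4
```

Starting from a small INSTEP= r thus keeps the early radii (and hence the early steps) small until the quadratic model proves reliable and the radius is allowed to grow.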