Journal Articles

The Moe Nonlinear Optimization Library



P. Labute
Chemical Computing Group Inc.
1010 Sherbrooke Street W, Suite 910; Montreal, Quebec; Canada H3A 2R7


July 15, 1995


Abstract. The nonlinear optimization methods provided in MOE are described along with performance data. Among the methods supported by the library are Steepest Descent, Conjugate Gradient, and Truncated Newton. In particular, the Truncated Newton method delivers superlinear convergence rates yet requires little memory.

An important SVL program library provided with MOE is the unconstrained nonlinear optimization library. Among the methods contained in the library are Steepest Descent, Conjugate Gradient, and Truncated Newton. Each of the methods is suitable for large-scale problems; that is, each requires memory linear in the number of free variables. In this article we present the methods as well as the results of an experiment comparing them.

Each of the optimization methods has the same basic structure. Let x[k] be the estimate of the minimizer of a twice continuously differentiable function f at iteration k. At each iteration, the following steps are performed:

  1. Convergence Test. If convergence tests pass then terminate with x[k] as the solution. For pre-set convergence parameters a and b the algorithm terminates if all three of the following conditions are satisfied (for k > 1):
    1. | f (x[k]) - f (x[k] + t[k] p[k]) | < a { 1 + |f (x[k])| }
    2. | t[k] p[k] | < sqrt a { 1 + |f (x[k] + t[k] p[k] )| }
    3. | grad f (x[k] + t[k] p[k]) | < cbrt a |f (x[k])|

    or if (for k > 0)

    | grad f (x[k] + t[k] p[k]) | < b { 1 + |f (x[k] + t[k] p[k])| }

    where t[k] and p[k] are described below.

  2. Search Direction. Determine a descent direction p[k]; i.e., a direction such that
    g(t) = f (x[k] + t p[k])

    is locally decreasing. (The determination of the search direction is the only step in which the various methods differ.)

  3. Line Search. Determine a step size t[k] that minimizes the function g(t) = f (x[k] + t p[k]). A cubic spline interpolation search procedure is used, safeguarded with a step-size limit and interval bisection.
  4. Update. Set x[k+1] = x[k] + t[k] p[k] and go to Step 1.
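
The loop above can be written down compactly. The following Python fragment is only an illustrative sketch of this generic structure, not the SVL implementation shipped with MOE; the backtracking Armijo line search and the simple gradient-norm test are placeholders for the safeguarded cubic interpolation search and the full convergence tests described here, and the constants (1e-4, 0.5, gtol) are arbitrary.

    import numpy as np

    def minimize_sketch(f, grad, x0, direction, max_iter=1000, gtol=1e-6):
        """Generic descent loop: test, choose direction, line search, update.

        f         : callable returning the objective value
        grad      : callable returning the gradient
        direction : callable(x, g) returning a descent direction p[k]
        """
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad(x)
            # Step 1: convergence test (simplified here to a gradient-norm check)
            if np.linalg.norm(g) < gtol:
                break
            # Step 2: search direction (the only step in which the methods differ)
            p = direction(x, g)
            # Step 3: line search (backtracking on the Armijo condition, standing in
            # for the safeguarded cubic interpolation search used by the library)
            t, fx = 1.0, f(x)
            while f(x + t * p) > fx + 1e-4 * t * np.dot(g, p):
                t *= 0.5
            # Step 4: update
            x = x + t * p
        return x

    def steepest_descent(x, g):
        # Steepest Descent simply takes p[k] = -grad f(x[k]).
        return -g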

Termination Criteria. Most optimization methods converge only in the limit (even those that, with exact arithmetic, would terminate in a finite number of steps). Termination criteria are therefore needed to detect a number of conditions:

  • The solution is acceptable;
  • Progress has become unreasonably slow;
  • Unreasonable computing resources have been used.
The Optimization Library includes tests for these conditions; however, it must be noted that, in practice, only a few optimality conditions can be tested, and only in an approximate way. The optimality tests are:

  • | f (x[k]) - f (x[k] + t[k] p[k]) | < a { 1 + |f (x[k])| } and | t[k] p[k] | < sqrt a { 1 + |f (x[k] + t[k] p[k] )| } are designed to test whether the sequence of minimizer estimates is converging.
  • | grad f (x[k] + t[k] p[k]) | < cbrt a |f (x[k])| is designed to test the necessary optimality condition of a vanishing gradient.

These conditions are intended mainly for ill-conditioned problems; in well-conditioned problems the usual gradient test applies.
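
Expressed in code, the tests read roughly as follows. This is a direct Python transcription of the formulas above; the default tolerances a and b are arbitrary placeholders, not the values used by the SVL library.

    import numpy as np

    def converged(f_old, f_new, step, g_new, a=1e-8, b=1e-6):
        """Transcription of the termination tests described above.

        f_old : f(x[k])                f_new : f(x[k] + t[k] p[k])
        step  : t[k] p[k] (vector)     g_new : grad f(x[k] + t[k] p[k])
        """
        # Tests 1 and 2: is the sequence of minimizer estimates converging?
        seq_ok = (abs(f_old - f_new) < a * (1.0 + abs(f_old)) and
                  np.linalg.norm(step) < np.sqrt(a) * (1.0 + abs(f_new)))
        # Test 3: necessary optimality condition of a vanishing gradient.
        grad_ok = np.linalg.norm(g_new) < a ** (1.0 / 3.0) * abs(f_old)
        # Alternative gradient test (applies for k > 0).
        alt_ok = np.linalg.norm(g_new) < b * (1.0 + abs(f_new))
        return (seq_ok and grad_ok) or alt_ok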

Search Direction. Each of the methods uses a different procedure to determine the search direction p[k]. The method of Steepest Descent is the simplest of the methods and is most often run for a few iterations before using one of the more powerful methods. The method of steepest descent takes p[k] = - grad f (x[k]) which is guaranteed to be a descent direction. The method of Conjugate Gradients is well known and described elsewhere.
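
For completeness, one common nonlinear Conjugate Gradient direction update (the Polak-Ribière form) is sketched below. The article does not specify which variant the library uses, so this is indicative only.

    import numpy as np

    def cg_direction(g_new, g_old, p_old):
        """Polak-Ribiere update: p[k+1] = -g[k+1] + beta * p[k].

        The first iteration (no previous direction) falls back to steepest
        descent, as does any iteration where beta would be negative.
        """
        if p_old is None:
            return -g_new
        beta = np.dot(g_new, g_new - g_old) / np.dot(g_old, g_old)
        beta = max(beta, 0.0)   # restart with steepest descent if needed
        return -g_new + beta * p_old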

The Truncated Newton method uses an approximate solution to the Newton equations to determine a search direction. If, on iteration k, we expand f in a Taylor series along a direction p and truncate it after the second-order term:

f (x[k] + t p) = f (x[k]) + t g[k]^T p + (1/2) t^2 p^T G[k] p

we obtain a local quadratic model, where g[k] and G[k] denote the gradient and Hessian of f at x[k]. Setting the derivatives of the model with respect to t and p to zero, the critical points of the model occur when

  1. t = - g[k]^T p / ( p^T G[k] p )
  2. G[k] p = - g[k]
under the assumption that G[k] is positive definite. The equations in (2) are called the Newton Equations. Newton's Method attempts to solve these equations directly for the search direction (e.g., by matrix inversion); however, there are a number of problems with this approach:
  • Far away from the local minimum, G[k] may not be positive definite. This case must be detected and special techniques must be used to determine the descent direction (the simplest being the use of the Steepest Descent direction).
  • The memory requirements are large (since G[k] must be stored). For a problem of n variables, G[k] requires n^2 words of memory.
  • The inversion of G[k] at each step is computationally expensive. For a problem of n variables, inversion of G[k] requires O(n^3) operations.

The MOE Truncated Newton method uses two techniques to overcome these problems:

  1. An iterative equation solver is used to solve the Newton Equations at each step. The iterative method has the advantage that it can be stopped at any iteration and a search direction can be reported at any time. This means that, far away from the minimum, a less accurate solution can be used. Furthermore, since only matrix-vector multiplies are required of G[k], it need not be stored explicitly: a finite difference technique is used to evaluate Hessian-vector products.
  2. The Hessian, G[k], is replaced with an approximation matrix that is guaranteed to be positive definite. This approximation will approach the Hessian as the minimum is reached.
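
A minimal sketch of the direction computation is given below, assuming the linear Conjugate Gradient method as the iterative equation solver and a forward-difference approximation for the Hessian-vector products, as described in point 1. The positive definite approximation of point 2 is not reproduced here; this sketch simply truncates when negative curvature is detected, and the tolerances (h, cg_tol, max_cg) are placeholders rather than the library's settings.

    import numpy as np

    def hessian_vector(grad, x, v, g_x, h=1e-6):
        """Finite-difference Hessian-vector product: G v ~ (grad(x + h v) - grad(x)) / h."""
        return (grad(x + h * v) - g_x) / h

    def truncated_newton_direction(grad, x, cg_tol=1e-2, max_cg=50):
        """Approximately solve the Newton Equations G p = -g by linear CG.

        The inner iteration may be stopped early ("truncated"), so far from
        the minimum a cheap, rough direction is returned; negative curvature
        is handled by returning the best direction found so far (or the
        steepest descent direction on the first inner iteration).
        """
        g = grad(x)
        p = np.zeros_like(g)        # current approximate solution of G p = -g
        r = -g.copy()               # residual of the Newton Equations
        d = r.copy()                # inner CG search direction
        for _ in range(max_cg):
            Gd = hessian_vector(grad, x, d, g)
            curv = np.dot(d, Gd)
            if curv <= 0.0:         # negative curvature: truncate
                return -g if not p.any() else p
            alpha = np.dot(r, r) / curv
            p = p + alpha * d
            r_new = r - alpha * Gd
            if np.linalg.norm(r_new) < cg_tol * np.linalg.norm(g):
                break               # inner system solved accurately enough
            beta = np.dot(r_new, r_new) / np.dot(r, r)
            d = r_new + beta * d
            r = r_new
        return p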

As an example, we ran the optimization methods described above on the protein 6RXN from the Brookhaven Protein Data Bank. The AMBER 1989 parameter set was used with a cutoff of 7.5 Å and a distance-based dielectric. Each method began with 5 steps of Steepest Descent and was terminated when the RMS gradient fell below 0.0002. The line search methodology was held fixed for all of the tests; that is, only the selection of the descent direction differed.

The following table summarizes the results of the experiments. For each method, we report the total number of iterations required, the total number of function evaluations, and the normalized elapsed time required.

Method               Iterations   Evaluations   Elapsed Time
Truncated Newton             77          2154           1
Conjugate Gradient         1661          9096           6.62
Steepest Descent        >>30000       >>200000       >>100

Since the library is written entirely in SVL, it can be used to optimize functions written in SVL. Any objective function that calculates the function value and gradient can be minimized with the Optimization Library. The MOE Optimization Library is a powerful tool that has been used in a number of applications including energy minimization, Implicit Euler dynamics, partial charge methods and nonlinear fitting.
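
To illustrate the value-and-gradient calling convention in a generally available setting (the SVL interface itself is not shown here), the following Python fragment minimizes the Rosenbrock test function with SciPy's Newton-CG method, which is itself a truncated Newton variant. The objective returns both the function value and its gradient; neither the test function nor SciPy is part of the MOE library.

    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(x):
        """Return both the value and the gradient of the Rosenbrock function."""
        f = 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
        g = np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                      200.0 * (x[1] - x[0] ** 2)])
        return f, g

    # jac=True tells SciPy that the objective returns (value, gradient) together.
    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]),
                      jac=True, method='Newton-CG')
    print(result.x)   # approaches the minimizer (1, 1)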