Journal Articles

Electrostatic Fields and Surfaces in MOE


M. Santavy, P. Labute
Chemical Computing Group Inc.


The 1998.10 version of MOE contains powerful new functionality for manipulating scalar and vector fields. New SVL primitives use sophisticated algorithms for operating on high dimensional fields stored as grids; i.e., rectangular arrays that contain samples of the field at (possibly irregular) lattice coordinates. MOE contains facilities for:

  • Creating and manipulating multi-dimensional grids
  • Sampling Gaussian densities on grids
  • Interpolation of values at off-grid points
  • Interpolation of color values on grids
  • Calculating derivatives on- and off-grids
  • Generating iso-surface points, lines and triangles
  • Rendering solid surfaces

These new components in MOE have wide applications in molecular visualization and simulation; for example, molecular surfaces, docking and Poisson-Boltzmann equation solving.

In this article, we will introduce the SVL grid functions, examine how they can be applied to the generation of molecular surfaces, and look at ways to solve the Poisson-Boltzmann equation (used to calculate the electrostatic field).

 

Grid-Based Fields

SVL grids provide storage and manipulation of scalar fields in multidimensional spaces. Each grid is stored as a pair composed of a data vector and a shape vector. The data vector is a flat numeric vector that contains values of the field at the gridpoints. The shape vector stores the positions (coordinates) of those gridpoints.

The grid is always rectangular, with either uniform or non-uniform spacing. The shape vector store the gridpoint positions along each dimension. Each element of the shape vector is a flat vector of ascending grid positions along the corresponding dimension. The values of the field at the gridpoints are stored in the row-major order, i.e., with the index of the last dimension changing most rapidly.

For example, consider a 1-dimensional field sampled at 3 points, F(1.2)=12, F(2.2)=22, and F(3.7)=37:

    coordinate 1.2 2.2 3.7
    value 12 22 37
The grid of the field is stored as a pair:
  • data = [12,22,37];
  • shape = [[1.2,2.2,3.7]];

As another example, consider a 2-dimensional field sampled at 6 points arranged in a 2×3 matrix: F(xi,yj) = fij, where i=1..2, j=1..3.

    value Y coordinate
    y1 y2 y3
    X coordinate x1 f11 f12 f13
    x2 f21 f22 f23
The grid of the field is stored as a pair:
  • data = [f11,f12,f13,f21,f22,f23];
  • shape = [[x1,x2], [y1,y2,y3]];

Indexing. Each gridpoint can be represented either by single index or multiple index (multi-index). The single index represents the position of the field value at a gridpoint in the flat data vector. The multiple index represents the positions of the gridpoint coordinate in the nested shape vector.

Single indices are used to manipulate the grid value in the data vector, while multi-indices manipulate the positions of the gridpoints in the shape vector. If the pair of vectors [data,shape] represents the grid, and sidx and midx are the corresponding single- and multi-index representations of the gridpoints, then the values of the grid at the gridpoints are get[data,sidx], while the coordinates of the gridpoints themselves are apt get[shape,midx].

For example, to access the gridpoint that stores value f23 in our previous 2x3 grid:

    y1 y2 y3
    x1 f11 f12 f13
    x2 f21 f22 f23
we can use:
  • either a single index sidx == 6, such that get[data,6] == f23
  • or a multi-index midx == [2,3] such that apt get[shape,6] == [x2, y3]

Data Calculations. The following functions calculate the value of the field and its derivatives either at specified points in space or at the gridpoints. In general, we use quadratic interpolation formulas at the gridpoints combined with multi-linear interpolation for point locations between the gridpoints.

Given K-dimensional grid, we use an interpolating polynomial of K variables of maximum combined degree 2. For example, for a 2-dimensional grid, we use an interpolating polynomial of general form:

f(x,y) = Cxx x2 + Cyy y2 + Cxy x y + Cx x + Cy y + C0

The interpolating formulas assume that the interpolating polynomial coincides with the field values at a given gridpoint and at each of the gridpoint's 2K neighbors, 2 along each axis. The derivatives of the field at the gridpoints are calculated as the derivatives of the interpolating polynomial. The values of the field (or its derivatives) at sample points off the gridpoints are calculated by K-linear interpolation of the values (or the derivatives) of the quadratic interpolation function calculated at each of the 2K gridpoints around the sample point.

  • values = grid_value_at [data, shape, [x, y, ... ]]
    
    Returns the field values at given points in space, inside or outside the grid boundaries.

  • [dx, dy, ...] = grid_grad_at [data, shape, [x, y, ... ]]
    
    Returns the values of the field gradient at given points in space.

  • values = grid_laplace_at [data, shape, [x, y, ... ]]
    
    Returns the values of the field Laplacian at given points in space.

  • values = grid_deriv_at [data, shape, [x, y, ... ], dims]
    
    Returns the values of the partial derivatives in the specified dimensions at given points in space.

    Note: Since the degree of the interpolating polynomial is 2, any derivative of degree higher that 2 is always 0. A derivative of degree 0 is calculated as a linear interpolation of the neighboring gridpoint, which is faster than and often numerically just as acceptable as the quadratic interpolation used by the function grid_value_at.

  • [dx, dy, ...] = grid_grad [data, shape]
    values = grid_laplace [data, shape]
    values = grid_deriv [data, shape, dim]
    
    Equivalent to the functions above, except that the values are calculated for each gridpoint.

  • RGB_values = grid_color_at [RGB_data, shape, [x, y, ... ]]
    
    A special form of function grid_value_at used for interpolating RGB colors. The data are integers composed of 4 color channels, each occupying 8 bits. The function interpolates each channel separately. Out-of-band interpolated color values are clamped, not masked.

  • new_data = grid_reshape [data, shape, new_shape]
    
    Creates a data vector of a new grid by interpolating a given grid at the gridpoints of a new grid. The result is identical to that of:
    new_size = app length new_shape;
    new_coord = grid_coord [new_shape, igen mul new_size];
    new_data = grid_value_at [data, shape, new_coord];
    

 

Gaussian Molecular Surfaces

A 3D Gaussian density has the form:

f(x) = (2 PI s)../sup> exp { - 0.5 (x - u)T (x - u) / s2 }

where u is the mean and s2 is the variance. A sum of such functions with different means and variances is a sum of Gaussians. Such a function can be used to approximate the accessible and Connolly surfaces.

SVL has a number of functions to manipulate Gaussian densities on grids. One of these functions is the grid_addgauss function, which adds samples of a Gaussian density to a grid. The form of this function is:

    grid_addgauss [data, shape, weight, sigma, pos, radius]

where

  • data is a grid data vector to which the Gaussian sample will be added
  • shape is the grid shape vector (dimensions and spacing)
  • weight is a scale factor on the density
  • sigma is the square root of the variance
  • pos is the location of the mean (center)
  • radius defines the radius density sampling (grid modification)

This function is vectorized so that if weight, sigma, pos, and radius are vectors defining n Gaussian densities, then all n densities will be added to the grid data. By manipulating the weight and sigma parameters it is possible to create a density such that the level 1 surface approximates a water accessible surface or a Connolly surface:

    // mol_surface returns a molecular surface.  The ridge
    // and probe parameters control the nature of the surface; for
    // example,
    //
    //      probe=1.4 ridge=2.5     accessible
    //      probe=0.0 ridge=1.2     connolly
    //
    // the level=1 isosurface is the boundary.

    function mol_surface [shape, atoms, probe, ridge]

	local p = aPos atoms;                   // atom positions
	local r = probe + aRadius atoms;        // VDW radii

	local w = cube (sqrt(2*PI)*r/ridge) * exp (0.5 * sqr ridge);
	return grid_addgauss [0, shape, w, r/ridge, p, 4*r];
    endfunction

For example, the following is a Gauss-Connolly surface of insulin that is color-coded by partial charge on a grid with spacing 0.75 Angstroms.

 

Electrostatic Potential Fields

One important application of both the SVL grid functions and molecular surfaces is the calculation of electrostatic potentials. In particular, the Poisson-Boltzmann (PB) equation. The PB equation is used to calculate the electrostatic potential of a charge distribution in a salt solution. In this formulation, the solvent and the counter-ions are time-averaged or, in other words, treated implicitly. The PB equation is:

grad · eps(x) grad u(x) + (e2/eps0) [ g(x) + f(x) ] = 0

where:

  • eps(x) is the dielectric field (unitless)
  • u(x) is the electrostatic potential energy (kcal/mol)
  • e is the charge of an electron (C/mol)
  • eps0 is the vacuum permittivity (F/m)
  • g(x) is the counter-ion partial charge density
  • f(x) is the solute partial charge density

Now, g(x) is given by:

g(x) = -c(x) I sinh (u(x)/kT)

where I is the counter-ion concentration (mol/L), and c(x) is an indicator field that is 0 in the regions inaccessible to the counter-ions and a unit conversion factor elsewhere. By setting eps(x) and I one can define vacuum potentials or salt-solution potentials. Since sinh x is approximately x for small x, one often solves the linearized PB equation (we now drop the x argument in the functions and fold ../kT into c):

grad · eps grad u + (e2/eps0) [ -c u + f ] = 0

To solve the linearized PB equation (LPB) we must calculate the three fields that do not change with time. We will assume that the grid shape vector shape has been calculated.

Partial Charge Density. The solute partial charge density, f, when taken to be a sum of Gaussian densities, is constructed with the following code fragment:

    local q = ...               // partial charges of atoms
    local rad = ...             // radius of each atom
    local pos = ...             // position of each atom

	// calculate density of each atom as a gaussian
	// with standard deviation radius/2.

    local f = grid_addgauss [0, shape, q, rad/2, pos, 2.5*rad];

Dielectric Field. The dielectric field, eps, is generally constructed from three separate regions: the solvent region (e.g., water with dielectric 80), the solute region (e.g., a protein with dielectric 4) and a transition region. To construct eps, one can use the following SVL code fragment:

    const EPS_SOLVENT = 80;
    const EPS_SOLUTE = 4;
    const SOLVENT_RADIUS = 1.4;

	// calculate the accessible surface of a 1/2 radius
	// solvent sphere this will make a grid of values
	// either EPS_SOLVENT or EPS_SOLUTE

    local eps = mol_surface [shape,atoms,SOLVENT_RADIUS/2,2.5] < 1;
    eps = select [EPS_SOLVENT, EPS_SOLUTE, eps];

	// form the real eps (and the transition region) by
	// a 6 neighbor averaging scheme

    eps = eps + (1/7) * grid_laplace [eps, shape] * sqr (0.5);

In this fragment, mol_surface is as described above. The grid_laplace function calculates the Laplacian of the grid (the second argument is the shape vector: the coordinates along each axis).

The Ion Accessible Field. The ion accessibility field, c, defines both the counter-ion concentration and the accessibility region. It is calculated similarly to the dielectric field:

    const T = 300;
    const I = 0.5;              // ion concentration (mol/L)
    const ION_RADIUS = 2.0;     // radius of a counter ion

	// calculate fundamental accessibility region

    local c = mol_surface [shape, atoms, ION_RADIUS, 2.5] < 1;
    c = (UNIT_CONVERSION * I / (T*KBOLTZ)) * c;

	// smooth with neighbor scheme

    c = c + (1/7) * grid_laplace [c, shape] * sqr (0.5);

Solution of the LPB Equation. Once set up, the LPB equation can be solved by one of several methods that are used to solve partial differential equations. In a future article, we will describe a new method called Exponential Time Propagation, which has numerous advantages over conventional methods. However, for the purposes of this article, we will describe the Jacobi diffusion method.

The LPB equation can be expressed in short-hand as Lu + f = 0 where L is a linear operator. From the electrostatics theory we have:

d/dt L u(t) + f = 0

where t is time. This observation leads to an iterative method of solving the LPB equation. If we expand u in a Taylor series we obtain:

u(t + h) = u(t) + h (d/dt) u(t) + O(h2)
= u(t) + h (L u(t) + f) + O(h2)

for some time increment h. For example, if the counter-ion concentration is 0 and there is a unit dielectric everywhere, then the LPB reduces to the Poisson equation. The following SVL code fragment could then be used to effect the iteration given a starting vector u:

    local u = ...		// initial vector
    local h = ...		// some time increment
    local f = ...		// density

    for i = 1, MAXIT loop
	res = grid_laplace [u, shape] + f;
	if max abs res < 0.001 then break; endif
	u = u + h * res;
    endloop

The iteration will stop after MAXIT iterations or until the residual falls below 0.001. When the iteration terminates, the vector u will contain the solution of the equation. This field can then be visualized with isosurfaces, for example:

The above depicts the +2/-2 kcal/mol isosurfaces of oxytocin in a 0.5 mol/L salt solution. The calculation was performed at 0.75 grid spacing and run for 120 units of time (relative error of 0.1 kcal/mol).

 

Summary

The grid functions of SVL and solid surface rendering capabilities of MOE are an important advance both for visualization and for modeling. The fundamental grid operators allow:

  • Manipulation of scalar and vector fields
  • Manipulation of Gaussian scalar fields
  • Calculation of derivatives
  • Calculation of isosurfaces
  • Interpolation of values

Such scalar fields have many applications. We have seen some in this article, however, many more are possible:

  • Multi-dimensional color coding and color interpolation
  • Ligand-protein and protein-protein docking
  • Representation of multi-dimensional probability distributions

In future articles of the JCCG, we will explore some of these applications in greater detail.