Electrostatic Fields and Surfaces in MOE
M. Santavy, P. Labute
Chemical Computing Group Inc.
The 1998.10 version of MOE contains powerful new functionality for manipulating scalar and vector fields. New SVL primitives use sophisticated algorithms for operating on high dimensional fields stored as grids; i.e., rectangular arrays that contain samples of the field at (possibly irregular) lattice coordinates. MOE contains facilities for:
These new components in MOE have wide applications in molecular visualization and simulation; for example, molecular surfaces, docking and Poisson-Boltzmann equation solving.
In this article, we will introduce the SVL grid functions, examine how they can be applied to the generation of molecular surfaces, and look at ways to solve the Poisson-Boltzmann equation (used to calculate the electrostatic field).
SVL grids provide storage and manipulation of scalar fields in multidimensional spaces. Each grid is stored as a pair composed of a data vector and a shape vector. The data vector is a flat numeric vector that contains values of the field at the gridpoints. The shape vector stores the positions (coordinates) of those gridpoints.
The grid is always rectangular, with either uniform or non-uniform spacing. The shape vector store the gridpoint positions along each dimension. Each element of the shape vector is a flat vector of ascending grid positions along the corresponding dimension. The values of the field at the gridpoints are stored in the row-major order, i.e., with the index of the last dimension changing most rapidly.
For example, consider a 1-dimensional field sampled at 3 points, F(1.2)=12, F(2.2)=22, and F(3.7)=37:
As another example, consider a 2-dimensional field sampled at 6 points
arranged in a 2×3 matrix:
Indexing. Each gridpoint can be represented either by single index or multiple index (multi-index). The single index represents the position of the field value at a gridpoint in the flat data vector. The multiple index represents the positions of the gridpoint coordinate in the nested shape vector.
Single indices are used to manipulate the grid value
in the data vector, while multi-indices
manipulate the positions of the gridpoints in the shape vector.
If the pair of vectors [data,shape] represents the grid, and
sidx and midx are the corresponding single- and multi-index
representations of the gridpoints,
then the values of the grid at the gridpoints are
while the coordinates of the gridpoints themselves are
For example, to access the gridpoint that stores value f23 in our previous 2x3 grid:
Data Calculations. The following functions calculate the value of the field and its derivatives either at specified points in space or at the gridpoints. In general, we use quadratic interpolation formulas at the gridpoints combined with multi-linear interpolation for point locations between the gridpoints.
Given K-dimensional grid, we use an interpolating polynomial of K variables of maximum combined degree 2. For example, for a 2-dimensional grid, we use an interpolating polynomial of general form:
The interpolating formulas assume that
the interpolating polynomial coincides with the field
values at a given gridpoint and at each of the gridpoint's
Gaussian Molecular Surfaces
A 3D Gaussian density has the form:
where u is the mean and s2 is the variance. A sum of such functions with different means and variances is a sum of Gaussians. Such a function can be used to approximate the accessible and Connolly surfaces.
SVL has a number of functions to manipulate Gaussian densities on grids. One of these functions is the grid_addgauss function, which adds samples of a Gaussian density to a grid. The form of this function is:
grid_addgauss [data, shape, weight, sigma, pos, radius]
This function is vectorized so that if weight, sigma, pos, and radius are vectors defining n Gaussian densities, then all n densities will be added to the grid data. By manipulating the weight and sigma parameters it is possible to create a density such that the level 1 surface approximates a water accessible surface or a Connolly surface:
// mol_surface returns a molecular surface. The ridge // and probe parameters control the nature of the surface; for // example, // // probe=1.4 ridge=2.5 accessible // probe=0.0 ridge=1.2 connolly // // the level=1 isosurface is the boundary. function mol_surface [shape, atoms, probe, ridge] local p = aPos atoms; // atom positions local r = probe + aRadius atoms; // VDW radii local w = cube (sqrt(2*PI)*r/ridge) * exp (0.5 * sqr ridge); return grid_addgauss [0, shape, w, r/ridge, p, 4*r]; endfunction
For example, the following is a Gauss-Connolly surface of insulin that is color-coded by partial charge on a grid with spacing 0.75 Angstroms.
Electrostatic Potential Fields
One important application of both the SVL grid functions and molecular surfaces is the calculation of electrostatic potentials. In particular, the Poisson-Boltzmann (PB) equation. The PB equation is used to calculate the electrostatic potential of a charge distribution in a salt solution. In this formulation, the solvent and the counter-ions are time-averaged or, in other words, treated implicitly. The PB equation is:
Now, g(x) is given by:
where I is the counter-ion concentration (mol/L), and c(x) is an indicator field that is 0 in the regions inaccessible to the counter-ions and a unit conversion factor elsewhere. By setting eps(x) and I one can define vacuum potentials or salt-solution potentials. Since sinh x is approximately x for small x, one often solves the linearized PB equation (we now drop the x argument in the functions and fold ../kT into c):
To solve the linearized PB equation (LPB) we must calculate the three fields that do not change with time. We will assume that the grid shape vector shape has been calculated.
Partial Charge Density. The solute partial charge density, f, when taken to be a sum of Gaussian densities, is constructed with the following code fragment:
local q = ... // partial charges of atoms local rad = ... // radius of each atom local pos = ... // position of each atom // calculate density of each atom as a gaussian // with standard deviation radius/2. local f = grid_addgauss [0, shape, q, rad/2, pos, 2.5*rad];
Dielectric Field. The dielectric field, eps, is generally constructed from three separate regions: the solvent region (e.g., water with dielectric 80), the solute region (e.g., a protein with dielectric 4) and a transition region. To construct eps, one can use the following SVL code fragment:
const EPS_SOLVENT = 80; const EPS_SOLUTE = 4; const SOLVENT_RADIUS = 1.4; // calculate the accessible surface of a 1/2 radius // solvent sphere this will make a grid of values // either EPS_SOLVENT or EPS_SOLUTE local eps = mol_surface [shape,atoms,SOLVENT_RADIUS/2,2.5] < 1; eps = select [EPS_SOLVENT, EPS_SOLUTE, eps]; // form the real eps (and the transition region) by // a 6 neighbor averaging scheme eps = eps + (1/7) * grid_laplace [eps, shape] * sqr (0.5);
In this fragment, mol_surface is as described above. The grid_laplace function calculates the Laplacian of the grid (the second argument is the shape vector: the coordinates along each axis).
The Ion Accessible Field. The ion accessibility field, c, defines both the counter-ion concentration and the accessibility region. It is calculated similarly to the dielectric field:
const T = 300; const I = 0.5; // ion concentration (mol/L) const ION_RADIUS = 2.0; // radius of a counter ion // calculate fundamental accessibility region local c = mol_surface [shape, atoms, ION_RADIUS, 2.5] < 1; c = (UNIT_CONVERSION * I / (T*KBOLTZ)) * c; // smooth with neighbor scheme c = c + (1/7) * grid_laplace [c, shape] * sqr (0.5);
Solution of the LPB Equation. Once set up, the LPB equation can be solved by one of several methods that are used to solve partial differential equations. In a future article, we will describe a new method called Exponential Time Propagation, which has numerous advantages over conventional methods. However, for the purposes of this article, we will describe the Jacobi diffusion method.
The LPB equation can be expressed in short-hand as Lu + f = 0 where L is a linear operator. From the electrostatics theory we have:
where t is time. This observation leads to an iterative method of solving the LPB equation. If we expand u in a Taylor series we obtain:
for some time increment h. For example, if the counter-ion concentration is 0 and there is a unit dielectric everywhere, then the LPB reduces to the Poisson equation. The following SVL code fragment could then be used to effect the iteration given a starting vector u:
local u = ... // initial vector local h = ... // some time increment local f = ... // density for i = 1, MAXIT loop res = grid_laplace [u, shape] + f; if max abs res < 0.001 then break; endif u = u + h * res; endloop
The iteration will stop after MAXIT iterations or until the residual falls below 0.001. When the iteration terminates, the vector u will contain the solution of the equation. This field can then be visualized with isosurfaces, for example:
The above depicts the +2/-2 kcal/mol isosurfaces of oxytocin in a 0.5 mol/L salt solution. The calculation was performed at 0.75 grid spacing and run for 120 units of time (relative error of 0.1 kcal/mol).
The grid functions of SVL and solid surface rendering capabilities of MOE are an important advance both for visualization and for modeling. The fundamental grid operators allow:
Such scalar fields have many applications. We have seen some in this article, however, many more are possible:
In future articles of the JCCG, we will explore some of these applications in greater detail.