An Integrated Application in MOE for the Visualization and Analysis of Protein Active Sites with Molecular Surfaces, Contact Statistics and Electrostatic MapsPaul LabuteOctober 2006 Chemical Computing Group Inc.
IntroductionThe combination of widespread availability of xray crystal structure coordinates of proteins, possibly with bound ligands, and affordable 3D computer graphics hardware has made it possible for scientists to routinely visualize such structures in their efforts to design better inhibitors and drugs. Not so long ago, such visualizations used to be available only to computational chemistry specialists whose services would be required to operate the complex software that, more often than not, was difficult to use and ran only on expensive computer hardware. Moreover, such software was typically designed for molecular simulations and the primary intent of the visualization capabilities was to view the results of the calculations. Increasingly, noncomputational chemists are using visualization software to help in their scientific efforts. To such users, the primary use of visualization software is to "see what is going on" or "get ideas" rather than examine the output of a simulation or similar calculation. This is not to say that no calculations are performed for visualization purposes, but that such calculations are expected to be "under the hood" and not the focus of attention. Consequently, simple rotation of proteins and ligands as line/stick figures must be augmented with visualization capabilities that help with the interpretation of the atomic coordinates. A good example of this idea is the reentrant molecular surface originally proposed by [Lee 1971] and further developed in [Richards; 1977]. The surface consists of the boundary of the volume of space defined by removing all points inside the van der Waals surface of a spherical probe (that represents the solvent) located at all positions for which the probe does not contact the van der Waals volume of a molecule. This means that there is "fillin" of space between the van der Waals volume representing solvent inaccessible regions:
It is an unfortunate fact that the definition of the reentrant surface is such that it is computationally demanding to calculate. Since the first implementation described in [Connolly 1983] there have been ever more efficient methods that have appeared over the years (e.g., [Sanner 1996], [Liang 1998] and [Rocchia 2002]). What is important is that as computing power becomes cheaper what was once a complex calculation requiring special hardware now can be done in a few seconds on just about any modern cheap computer. Such a surface is a great help in visualizing ligands in protein active sites. Compare the two images below, one uses an atomic stick figure representation and the other uses the reentrant surface:
One can certainly see the advantages of using these surfaces and, indeed, such surface visualizations are commonly used to interpret crystal structcure coordinates. One disadvantage is that a surface obscures the underlying protein atoms; however, surface transparency and coloring are often used to convey important interactions possibly obscured by the surface. Another disadvantage of using the reentrant surfaces is that it provides little volumetric information  it can be difficult to estimate what modifications to a ligand would fit the active site. The figure on the right shows the same Factor Xa complex with an additional surface rendered using lines. This surface is the Interaction surface defined to be the 0 kcal/mol van der Waals energy contour of a carbon atom interacting with the active site. In other words, the surface contains all points at which the van der Waals energy becomes positive (repulsive). This surface is much closer to the nucleii of possible ligands and better illustrates how much space is available for ligand modifications. Generally speaking, the Interaction surface is more crowded in appearance than the reentrant surface. However, when used in combination, these two surfaces provide a quick way to see the extent and shape of an active site as well as an estimate of the steric limits to which substituents will be subject. What is still not clear are the interactions favored by the active site  what modifications would lead to better interactions. In this article we will examine a number of techniques that can be used to determine and display favorable interactions with active sites. These methods are a part of a single application in the Molecular Operating Environment (MOE) [CCG 1996]. By displaying interaction sites, scientists may be better able to optimize ligand features to either avoid unfavorable interactions or introduce new favorable interactions. Specifically, we will present techniques to:
The methodology to be described is sophisticated; however, in each case the calculation can be performed quickly, without user intervention, and the results are easily understood. This means that users can derive benefit without getting slowed down by arcane details and parameterizations. The remainder of this article is divided into sections that describe the methdology of each technique as well as an example of its use. We conclude with several observations in the final section. Electrostatic MapsElectrostatic Maps in MOE are based on solutions to the PoissonBoltzmann Equation (PBE). The PBE is a partial differential equation often used in the study of electrolites. Although there is a rich history of using the PBE for studying proteins (e.g., [Gilson 1987], [Sharp 1990] and [Grant 2001]) and calculating free energies of solvation we shall review its underlying theory and contrast it with the GRID methodology [Goodford 1985] which is also based upon electrostatic fields. PoissonBoltzmann Equation Theory. The starting point for the PBE is the Poisson equation. This is a fundamental equation of physics that relates the electrostatic potential φ and a charge density ρ at equilibrium:
Given an electrostatic potential φ one can calculate the underlying
charge density ρ that gives rise to φ with the Laplacian operator
If the charge density ρ is embedded in a polar medium then the electrostatic potential will be damped, or screened, in accordance with the polar character of the medium. Such strength is typically specified by a dielectric constant ε. This strength may be a function of spatial coordinates (e.g., ε = 1 in the interior of a protein and ε = 80 in the exterior to model a protein in water). A modification to the Poisson equation accounts for this dielectric "field": In the special case of point charges and a constant ε then the Poisson equation produces Coulomb's law divided by the dielectric constant. What is important about the Poisson equation is that an electrostatic potential φ completely and uniquely determines the underlying charge density ρ through the Laplacian operator. In the case of a dielectric ε we can write the modified Poisson equation as which shows that a solution to the modified Poisson equation can be thought of as the solution to the original Poisson equation but with a modified charge density. The difference between the modified charge density and the original density ρ is often referred to as the induced charge or the screening charge. This way of looking at the relationship between induced charges and the original charges is the key to understanding how the PoissonBoltzmann equation arises. Suppose that a charge density ρ is embedded in a polar medium specified by ε. Suppose, further, that in the medium there are a number of mobile ions each carring a formal charge q_{I}. At thermodynamic equilibrium these mobile ions will be distributed about ρ according to a charge density which we shall denote by ρ_{I}. The electrostatic potential, φ, from the combined charge densities must satisfy the Poisson equation:
For multiple types of mobile ions, the foregoing equation is easily extended
by replacing the lone ρ_{I} with a sum of similar densities
(one for each type). For the purposes of exposition, only one mobile ion type
will be considered. If we imagine that the mobile ions are spatially
restricted (e.g., the ions cannot be inside a protein and must remain in
solvent) then such a restriction can be specified with a spatial potential
u_{I} that has high energy in the forbidden regions of space
and low energy in the allowed regions of space. Taken with the electrostatic
potential φ the potential energy of a mobile ion at a particular point
in space will be, at equilibrium,
where c is normalization constant, k is Boltzmann's constant and T is the temperature of the system. Assuming that φ and u_{I} are 0 sufficiently far from ρ then at such distances the exponential term is equal to 1 which means that c may be taken as the bulk concentration of the mobile ions (say in mol/L). The combination of the previous two equations results in the PoissonBoltzmann Equation; namely that, which specifies not only the resulting electrostatic potential, but also the ionic charge densities both of which result from an original charge density ρ in a polar medium ε. The PBE can be solved analytically only in special cases (e.g., in Debye theory for electrolites). For more complex cases, such as proteins, the PBE must be solved numerically. This is most commonly done by approximating each spatial quantity (such as ε, φ or ρ) by a discrete lattice (3D grid) and using finite differences to approximate the differentiation operators. Calculation of Free Energies of Solvation. The calculation of the free energy of solvation (in water) of small molecules can be performed by solving the PBE. Such calculations are used to calibrate the parameters used in a PBE solution in the hope that the results are transferrable to proteins. Typically, the nonlinear Boltzmann term is not used; rather, the dielectric region ε is used alone to model the polar solvent. In the solute interior (the molecule in question) a small value of ε is used (1 or 2) while in the solute exterior a high value of ε is used (e.g., 80 for water). The electrostatic energy of the system is calculated and the gasphase energy is subtracted (as calculated with ε = 1 everywhere) to obtain the free energy of solvation. Commonly, an exposed surface area term is added to model nonclassical effects (such as the hydrophobicity of halogens). The entire calculation is dependent on how ε is discretized and modeled in the calculation. Not only are the results sensitive to the choice of partial charges, they are extremely senstive to the curvature of the transition region in ε between high dielectric and low dielectric. Although, much effort has been spent to define this region appropriately [Grant 2001] little attention has been paid to the nonlinear Boltzmann term upon which MOE Electrostatic Maps are fundamentally based. It is not commonly known that free energies of solvation can be calculated without the definition or use of the ε dielectric region at all! The correlation plot to the right shows the results of calculating the free energy of solvation of 495 small molecules using the PBE and AM1BCC charges [Jakalian 2002] without a dielectric region. The r^{2} is 0.88. In this calculation, the nonlinear PBE was solved with two pseudoion classes: an oxygen "ion" and a hydrogen "ion" along with the commonly used surface area term. TIP3P charges were for the {q_{I}} formal ion charges at 1 mol/L concentrations and AMBER [Weiner 1986] van der Waals parameters were used to determine the accessible region of these "ions". Such a definition is not as arbitrary as the ε regions commonly used  it was not fitted to the solvation data  and has a more direct physical interpretation. As can be seen readily, the implicit ion Boltzman term can effectively model free energies of solvation of small molecules  arguably to within the same accuracy as εbased calculations. This evidence suggests that the resuting Boltzmann densities of the implicit ions (resulting from the solution of the PBE equation) would carry meaning. Moreover, these densities are 3D quantities and are amenable to vizualization via energy contour levels. Electrostatic Maps Methodology. We will use the same ideas as presented above in the calculation of free energies of solvation using the nonlinear Boltzmann term of the PBE. That is, we will calculate the Boltzmann probability density of two classes of particle  an oxygen particle, O and a hydrogen particle, H when given the 3D coordinates of a protein along with their partial charges. Specifically, we solve the following equation for the electrostatic potential φ: which is the PBE with constant ε = 1, 1 mol/L ionic concentrations and two ion classes; ρ is the protein charge density approximated using a sumofGaussians definitions (the width proprotional to the OPLSAA [Jorgensen 1996] van der Waals radii); the spatial potentials u_{O} and u_{H} are van der Waals potentials of an O atom and the protein and an H atom and the protein using OPLSAA parameters. The formal charge paramteres q_{O} and q_{H} are TIP3P water partial charges. MOE users a Preconditioned Conjugate Residual solver with a MultiGrid preconditioner along with Newton's Method to solve the nonlinear PBE on a lattice. The calculation requires several seconds to converge on a modern PC.
The result of the calculation is a lattice with values of φ, the
equilibrium electrostatic potential of the protein charge density φ
and the Boltzmann distributed ion densities for O and H species. Commonly,
φ is displayed graphically as a molecular surface or as isocontours
in space. However, we are interested in the resulting O and H densities
in the nonlinear Boltzmann terms of the PBE. These are the quantities
that we will plot.
One could simply plot the nonlinear terms directly which would be a plot
of the ionic charge densities at equilibrium. However, it is more convenient
to plot the potential to which the ionic species are subject. Specifically,
we plot
q_{O} φ + u_{O}
and
q_{H} φ + u_{H}
which is a sum of the van der Waals and electrostatic potentials.
Note the similarity to molecular field type methods; the difference, however,
is that these methods use a Coulombtype electrostatic potential, not
an equilibrated PBE electrostatic potential.
The hydrophobic map is derived from the O and H maps by calculating
and plotting
Results and Applications. Fundamentaly, Electrostatic Maps are plots of van der Waals and electrostatic energies and, as such, they are similar in spirit to the GRID methodology [Goodford 1985]. However, there are important differences not only with the original GRID but with subsequent enhancements made to model solvation effects. These differences are related to the nonlinearity and selfconsistency of the Electrostatic Map fields  only one calculation is done so that each implicit ionic particle interacts (in a mean field sort of way) with the other implicit ionic particles; that is, each particle's final Boltzmann density depends on the density of the others and all depend on the solute charge density. This nonlinearity and self consistency utimately manifests itself in a more localized distribution of implicit ion densities. Approximate, or even exact ε = 80 solvation models can fail to adequately screen chemical groups leading to electrostatic domination of a region. The reason for this is that there is a nonlinear screening effect in buffered solution that cannot be modeled by linear ε dielectric models. A simple example will serve to illustrate this point:
In ε = 80 calculations (or approximations thereto), charged groups are not sufficiently screened by solvent effects. This is due to the linear screening of the dielectric region. Charged groups in aqueous environments are neutralized by moblie counterions that provide nonlinear screening effects and must be modeled using nonlinear Boltzmann term of the PBE. Moreover, charged groups in systems of biological importance exist in buffered solution. Consequently, implicit solvent models must take these into account. The two images above show clearly the difference between linear and nonlinear screening. The MOE Electrostatic Map methodology (on the right) inherently will neutralize the carboxylate in the upper right by modeling implicit counterions thus allowing the negative preference in the lower left to appear (in contrast to the image on the left). Nonlinear screening produces more localized Boltzman densities than simple VDW+Coulomb or VDW+Coulomb+ε maps (such as produced by GRID or other traditional molecular field calculations). In addition, traditional calculations are not selfconsistent  each contour, whether positive or negative, is calculated separately so that neutralization effects are not taken into account. The Electrostatic Map methodology can be applied, of course, to structures of biological interest. The following images are contours of Electrostatic Maps of several protein amino acid sidechains using AMBER charges at ca. 2 kcal/mol contour levels for positive and negative preferences. Aside from detailing the expected polar hydrogen bonding preferences one can see the preference for positive charge above and below aromatic groups (blue surface). This is due to the concentration of negative charges around the heavy atoms that make up the aromatic rings. This negative region will generally be in contact with the mildly positive hydrogen of an otherwise hydrophobic group (such as CH2). These images should be compared to the probabilistic potentials described below which support this assertion. The Electrostatic Maps can be calculated for receptors (without ligands). Consider the Factor Xa complex in PDB entry 1FAX. The following is a MOE calculated ProteinLigand schematic of key interactions. In the diagram below hydrophilic residues are purple, blue rings indicate basic groups, red rings indicate acidic groups, hydrophobic, residues are green, arrows are hydrogen bonds, blue discs on ligand atoms indicate solvent exposure when bound, light blue shadows indicate contact surface area):
The key interactions are S1 pocket interactions (on the left) consisting of the salt bridge between the ligand amidinium group on the napthalene ring and ASP 189 and the hydrophobic interactions between the napthalene ring and the sandwich of the backbone atoms of TRP 215 / GLY 216 on one side and GLN 192 on the other; and in the P pocket (on the right) consisting of the salt bridge between GLU 97 and the CNN+ structure on the five membered ring of the ligand and the hydrophobic interactions with the sidechains of TYR 99 / TRP 215 / PHE 174 and the said five membered ring. An Electrostatic Map for 1FAX with positive, negative and neutral contours was calculated without the ligand. AMBER charges were used for all of the atoms. This map is depicted below along with a reentrant surface and pocket labels.
Notice the localized features especially in the P pocket. The S1 pocket is somewhat buried. In the above image, positive preference is indicated in blue, negative in red and neutral in white. The contour levels are ca. 2 kcal/mol. The following image is that of the Electrostatic Map superimposed with the ligand of 1FAX and the receptor structure and surface hidden to help visualize the correspondence between the prediction and the ligand features.
The top left inset provides an alternative view of the P pocket map and the bottom right inset provides an alternative view of the S1 pocket map. This image shows the detail provided by the Electrostatic Map and the very good agreement between the map features and the key interactions. Notice in particular that the carboxylates of ASP 189 and GLU 97 are screened by the strong positive implicit ions (the blue contour). This positive ionic density effectively neutralizes the carboxylate groups and makes it possible for large neutral regions to exist in the P and especially the S1 pockets. The connecting linker group has no strong electrostatic interactions with the receptor and is largely exposed to solvent. The Electrostatic Map can be a useful tool for predicting electrostatic features in a protein active site. The inputs are the 3D atomic coordinates of the receptor (hydrogen atoms included) as well as the partial charges of the protein. These maps are especially useful in strongly ionic environments where screening effects play an important role. The selfconsistent ionic density calculation reflects an implied "ideal neutralizing ligand" from the electrostatic perspective. The contour levels are on a transferrable and realistic scale making protein comparisons possible. While the basis is a rather complex electrostatic model, the user need not be concerned with the details. What little parameteriztion there is was independently developed, validated with free energy of solvation experimental data and successfully applied to predicting proteinligand interactions. There is no need for the user to modify the parameters save for setting the energy levels for the contour plots. Contact PreferencesThe amount of available protein crystallographic data in the Protein Data Bank (PDB) is large enough that it can be subjected to a purely statistical analysis of nonbonded contacts. Such an analysis could then be used in a "predictive" manner by displaying known contact statistics from the PDB in the reference frame of novel structures. In this way, a scientist could visualize what contacts are common in the PDB and what contacts are rare. Thus, the "prediction" is merely a summary of known data rather than the results of a simulation. This is an important distinction from predictions that result from simulations. MOE contains a unique methodology [Labute 2000] for the determinination and display of the statistics of nonbonded contacts. The method is similar in spirit to the XSite program [Laskowski 1998], the SuperStar program [Nissink 2000] and the methodology of [Rantanen 2001] all of which are knowledgebased techniques (fitted distributions from crystallographic data). The purpose of the MOE Contact Preferences is to calculate, from the 3D atomic coordinates of a receptor, preferred locations for hydrophobic and hydrophilic ligand atoms. In particular, directional preferences of receptor atoms are taken into account and, moreover, hydrogen atoms and partial charges are not required for the calculation. Fundamentally, Contact Preference maps are contours of probability densities indicating a percentage likelihood of a protein nonbonded contact with a particular ligand atom type. Suppose that we are given a three dimensional molecular structure of a receptor made up of individual atoms each with a receptor atom type (to be defined later). We are interested in predicting the preferred positions of atoms in hypothetical bound partners in contact with the given structure. We assume that bound partners are made up of atoms each of a ligand atom type (to be defined later). Let S denote a random variable over molecular structures (assumed to be in a common frame of reference), X denote a random point over three dimensional space, T a random variable over the receptor atom types, and L a random variable over the ligand atom types. In this work we will consider methods to determine and apply the joint probability density
to the problem of the prediction of preferred ligand type locations. One can think of Eqn.(1) as the likelihood of observing a ligand atom of type l at location x in contact with a structure s. Note that this probability density includes the receptor atom types implicitly (through S). In particular, the conditional probability density
is the probability of observing a ligand atom of type l assuming that a ligand atom is located at position x in contact with structure s. If s is a protein structure then Eqn.(2) provides the means to compare the structure's preference for different ligand types at particular point in space; for example, with two ligand atom types "hydrophobic" and "hydrophilic", Eqn.(2) can be used to map the high probability hydrophobic and hydrophilic contact regions in a protein's active site since it is a probability distribution over a discrete collection of types (in this case, two). The values are percentages of likelihood of contact (e.g., 65% of the time there is a hydrophobic atom at that point in space). Since such an assessment is made for every point, it is natural to display the maps in the form of a 3D graphical contour surface at a given probability. Our intention is to approximate Eqn.(1) and Eqn.(2) for each atom in the receptor using onedimensional analytical functions whose parameters are estimated from a collection of Xray crystallographic structures (the PDB). Bayes Theorem is then used to put these one dimensional distributions into the form of Eqn.(2). We will choose atom types and functional forms intended to be useful for predicting hydrophobic and hydrophilic atom types. Contact Distributions. The first approximation is to assume that each structure s is composed of atoms each having a receptor atom type and that, further, the statistics gathered on each of the receptor atom types can be composed to approximate the distribution over the entire structure. More precisely, we assume that
where c is a normalization constant, i ranges over the atoms
of s, y_{i} are the coordinates of atom i in the
structure, and Out(x,l,s) denotes the condition that the atom type
l at location x is exterior to the structure s. The
Pr(L = l  T = t_{i}) term is estimated straightforwardly
using relative frequencies of contacts in a training set. Eqn.(3) is the
fundamental approximation that reduces the estimation of densities about
entire structures to the estimation of densities about particular primary
atom types t. In other words, we compose probability densities over
structures s from probability densities of the form
For a given primary atom located at x_{0} and an interacting atom located at x different from x_{0}, we take
where u and v are local coordinate directions (described below). The quantities can be thought of as the interaction distance r (Å), the lonepair interaction angle a (degrees) and the outofplane angle p (degrees). The vectors u and v in Eqn.(5) depend only on the hybridization (HYB) and heavy atom coordination (HAV) of the primary atom or its neighbors. The following is a schematic show how the u and v directions are defined (the blue arrows); the green atom is the receptor atom for to the directions the atoms apply and the white atoms are nearby bonded atoms. With sufficient training data, the threedimensional probability density of Eqn.(4) can be estimated with a histogram; however, for many type combinations there is insufficient structural data available to accurately estimate the required histogram. For this reason, we assume independence of r, a, and p and approximate Eqn.(4) with a product of three onedimensional densities:
where R denotes a random variable over the positive real numbers, A denotes a random variable over the interval [0,180] and P denotes a random variable over the interval [0,90]. Inspections of contact statistics revealed that the data were well described by a small number of independent analytical functional forms on r, a and p, the interaction distance, angle and outofplane angle, respectively. The collection of functional forms includes the gamma, log normal, Cauchy, beta and LennardJones densities. Each probability density has a small number of parameters which must be estimated from sampled structure data. The following procedure is used to estimate the parameters used in the approximation of Eqn.(3):
In this way, we have composed the joint density of Eqn.(1) with a collection of onedimensional probability densities of the form given in the righthand side of Eqn.(6) each of which has parameters estimated from Xray crystallographic data. Contact Criteria. When are two atoms in nonbonded contact? This is a surprisingly subtle question. Broadly speaking, two atoms are in contact if they are "close together" with no other atoms "in between" and are not in bonded contact. It is the quantification of these notions that is nontrivial, especially if no a priori radii are assumed for the atoms. After much experimentation and inspection of Xray crystal structures, we settled on the following definition of atomic nonbonded contact: Atom B is said to be in nonbonded contact with atom A if all of the following conditions are satisfied: a) the distance between A and B is less than 4.5 Å; b) the line joining A and B does not intersect any of the bonded neighbors of either A or B using a value of 1.0 for the radius of each neighbor; c) B is closer to A than to any other atom not related to B by a bond, angle or torsion; and d) A and B are not related by a bond, angle or torsion. The choices of 4.5 as distance cutoff and of 1.0 as radius in the line of sight test tend to work well together in that it is rare for a clear nonbonded contact to be excluded and for cases such as second layer carbonyl carbons to be included. Additionally, the derived distance histograms taper smoothly to zero within the cutoff. Atom Types. In order to estimate reasonably accurate analytical forms for the distribution of atom interactions, it was necessary to subdivide the atom pair interactions into classes. Both the primary atom and the ligand atom were classified into types, and for each combination of primary atom type and ligand atom type, separate statistics were gathered. In other words, we estimated a collection of probability densities each of the form Pr(R=r, A=a, P=p  T_{1}=I, T_{2}=J), the likelihood of observing (r,a,p) assuming that the primary atom is of type I and the ligand atom is of type J. After initial inspection of contact statistics in high resolution protein structures, we settled upon the collection of primary atom types defined in the following table (u indicates whether the atom has a u direction and v indicates whether the atom has a v direction in the local coordinate system):
Ligand atoms were divided into one of two classes: LPA and HYD. The LPA type consisted of nitrogen and oxygen atoms capable of participating in hydrogen bonds or metal interactions and the HYD type consisted of carbon and sulfur (hydrophobic atoms). The ligand atom types are largely invariant to protonation state and tautomer state. Training Data and Results. Nonbonded contact pairs of atoms were extracted from the Protein Data Bank as of December 31, 2004. Approximately 4,000 xray structures with resolution 2.0 Angstroms or better were used. Nonbonded contact pairs between backbone atoms and beta carbons were omitted since it was felt they would introduce a strong bias in the statistics. The majority of contact pairs involved sidechain to sidechain contacts. A typical set of statistics for an atom type is depicted below. The diagram shows the results for the type T_nQ1 which is a planar nitrogen with one heavy neighbor. The contact frequecies are displayed at the top in percent (i.e., 44% percent of the time a hydrophobic atom contacted the T_nQ1 atom and 56% of the time an LPA, or hydrophilic atom was in contact). The three plots show the probability densities for the (r,a,p) coordinates. The dotted lines show the raw histograms and the solid lines show the results of the analyitical fits. On the right there is a statement of the analytical functional forms and sketches of some chemical contexts of the T_nQ1 atom type. Also, there is pictorial representation of the Contact Preference map for the T_nQ1 atom at 95% preference level. Hydrophobic contacts are green (above the plane) and hydrophilic contacts are red (in the plane). Similar contact preferences are given below for a number of receptor atom types: One can clearly see that the methodology has captured the directional preferences of the atoms. Contact Preference maps of more complex molecules are composed from the individual atomic preferences. By way of example, below are depictions of the Contact Preferences (at ca. 95% probability) for a few protein side chains. As expected, in the above planar side chains, the hydrophobic contacts are above the pi system plane and the hydrophilic contacts are largely in the plane. Somewhat surprisingly, there is strong outofplane hydrophobic character for pi systems even on top of hetero atoms. The guanidinium group of ARG is a particulary nice example of this show the very strong hydrophobic character of the guandidinium group above the plane. The Contact Preference maps can be calculated for receptors (without ligands). Consider the Rolipram/PDE complex in the PDB 1RO6 entry. The following is a MOE calculated ProteinLigand schematic of key interactions. In the diagram below hydrophilic residues are purple, blue rings indicate basic groups, red rings indicate acidic groups, hydrophobic, residues are green, arrows are hydrogen bonds, blue discs on ligand atoms indicate solvent exposure when bound, light blue shadows indicate contact surface area):
The following two pictures are of the PDB 1RO6 proteinligand complex with the Rolipram ligand with Contact Preference maps displayed.
In the above right image, one can clearly see the successful prediction (from the receptor structure alone) the hydrophic preferences satisified by the benzene / ILE 410 and the cyclopentyl / PHE 446 interactions. Also, one can clearly see that Rolipram satisifies the important hydrophilic interactions with the carbonyl oxygen and HIS 234 in the upper right and the oxygens in the catechol group with GLN 443 in the lower left. The Contact Preference distributions are also useful for color coding molecular surfaces. Color coding a surface by underlying nearest receptor atom type often hides the directional preferences of those atoms. We can use the analytic distributions to smoothly color a molecular surface to encode these receptor directional preferences. On the right, one can see the 1RO6 complex with a reentrant molecular surface. The color coding of the surface was performed without using the rolipram ligand. Green regions are hydrophobic, purple are hydrophilic and the blue regions are mildly polar inpiplane contacts (e.g., with the hydrogens of benzene). A closer examination reveals the good agreement with the purple region close to the catechol group and the green hydrophobic regions above the plane of the benzene ring and around the cyclopentyl group (all in the lower left). One can also see the agreement with the T_OQ2 contact statistics in that this type has hydrophobic character out of its bonding plane. It is also not difficult to see that both the receptor and ligand's mildly polar regions (the inpiplane aromatic) are exposed to solvent and if one were to include the ligand in the surface coloring, the blue regions of both the ligand and receptor would be exposed to the solvent. It is important to note that since the same contact distributions are used, statistical contact information is consistently displayed whether encoded in surface coloring or rendered as volumetric regions in the Contact Preference maps. This is important since active sites are often crowded and difficult to visualize in 3D. The surface color coding captures many of the interaction types while leaving the active site volume available for further visualizations. It should be stressed that the Contact Prefence calculations, whether for surface encodings or for volumetric rendering, are fast. It requires only a few seconds to calculate and display the maps. Because of this, they need not be stored  they can be calculated on demand  whether for one receptor or a family of receptors (for selectivity studies). Finally, although the methodology is sophisticated, users need not be concerned with the precise details. Rather, the results can be summarized as a percent likelihood of a particular hydrophobic or hydrophilic region derived from PDB nonbonded contact statistics. There are no tuning parameters to set, protons need not be present and the results are on a consistent scale independent of the receptor. SummaryWe have presented a number of methods for predicting favorable locations of ligand features in protein active sites:
Each of the calculations can be effected in a few seconds and the results displayed with commonly available 3D graphics capabilities of most computers. Although the underlying methodology is complex, users need not be concerned with the precise details nor need they be concerned with the setting of parameters. In MOE these capabilities are integrated into a single application called Surfaces and Maps with three different pages, or views (allowing users to quickly select the surface mapping method of interest):
After specifying the receptor atoms and extent of the plot, users need only press OK and then manipulate the contour level sliders to visualize the results. Consequently, such maps can be routinely used by all scientists. References
