Journal Articles

Locating Binding Sites in Protein Structures

Paul Labute and Martin Santavy
Chemical Computing Group Inc.
 

Introduction | Methodology | Results | Summary | References
 

Introduction

Active sites in proteins are usually hydrophobic pockets that involve sidechain atoms. It is non-trivial to quantify this rule of thumb. Methods such as [Goodford 1985] or [Miranker 1991] use interaction energies between the receptor and different probes in an attempt to locate energetically favorable sites. Unfortunately, this procedure requires the assignment of proton locations and partial charges to the receptor atoms and this is not always easy. While van der Waals energies can indicate sterically available regions, the long-range nature of electrostatic potentials make the interpretation of energy levels difficult (e.g., a carboxylate in an active site will emphasize positively charged probes even though negatively charge probes like carbonyl oxygen may be part of the bound ligand). Alternatively, purely geometric methods seek to locate "pockets" without the use of energy models. This is advantageous since proton locations are not required. LigSite [Hendlich 1997] uses a grid representation of the molecular volume and computes exterior site scores by projecting rays from the receptor exterior to the surface. The deeper and more surrounded a site is, the higher it scores. [DelCarpio 1993] uses an analytical geometric algorithm to compute pocket sites.

The purpose of the Site Finder application in MOE is to calculate possible active sites in a receptor from the 3D atomic coordinates of the receptor. Such a calculation is useful for suggesting possible mutations for site-directed mutagenesis experiments or to help determine potential sites for ligand binding in docking calculations or MultiFragment Search.

The MOE Site Finder described in this document falls into the category of geometric methods since no energy models are used. Instead, the relative positions and accessibility of the receptor atoms are considered along with a rough classification of chemical type. The fundamental methodology principles are:

  1. Identify regions of tight atomic packing. This is not the same as locating pockets, since surface sites may still be regions of tight packing.

  2. Filter out sites that are "too exposed" to solvent. In other words, sites that are on protrusions are unlikely to be good active sites.

  3. Use hydrophobic/hydrophilic classifications. This coarse classification of chemical type is used to separate water sites from the more likely hydrophobic sites.

  4. Use a definition of hydrophilic that is invariant to protonation state and tautomer state (this means no distinction between donor and acceptor atoms). Use a definition of hydrophobic that is invariant to tautomer state (this means that aromaticity cannot be used).

  5. Avoid grid-based methods since grid methods are not invariant to rotation of the atomic coordinates and can consume large amounts of memory.

Methodology

Alpha Spheres. The MOE Site Finder methodology is based on alpha spheres, which are closely related to the alpha shape of a set of weighted vertices, a concept from computational geometry that has found its use in computational chemistry [Edelsbrunner 1995], [Liang 1998]. A sphere that contacts some atoms on its boundary and contains no internal atoms is called a contact sphere. An alpha sphere is a special case of a contact sphere:

An alpha sphere is a sphere that contacts 4 atoms on its boundary and contains no internal atoms.

Note that a sphere that contacts 4 atoms is unique: there is no other sphere that contacts the same four atoms. A special case of an alpha sphere is an empty half-space (a half of space separated by a plane from the other half) that contacts 3 atoms. We consider such a half-space to be an alpha sphere of infinite radius with infinity as the fourth contacting point.

A collection of all alpha spheres derived from atoms of a protein describes well the shape of the protein. Very small spheres identify the interior of the protein, larger spheres correspond to the exterior. Medium spheres are found in tightly packed pockets and cavities of the protein, which are potential candidates for the location of an active site.

Figure 1a shows an alpha sphere in 2D that contacts three atoms of different radii. (Alpha spheres in 2D will contact 3, not 4 atoms.) Figure 1b shows one small and two larger alpha spheres that belong to the interior and the exterior, respectively, of a molecule in 2D. The smaller exterior sphere also identifies the location of a tightly packed pocket.

Figure 1aFigure 1b

A Voronoi region of an atom is the set of all points that are closer to that atom than to any other atom. The Voronoi regions of all atoms form the Voronoi decomposition of space. In 3D, the point at which four Voronoi regions intersect is called a Voronoi vertex. The Voronoi region of an atom coincides with the region of centers of all possible contact spheres that contact the atom. The Voronoi vertices thus coincide with the centers of alpha spheres, and the problem of finding alpha spheres can therefore be transformed into the problem of finding the Voronoi decomposition of space, given a set of atoms.

The notion of "closeness" and "contact" and therefore of the shape of the Voronoi regions is subject to our definition of distance between a point and a sphere representing an atom. The most intuitive definition, that of the "closest point" distance, defines it as the distance to the closest point on the sphere:

distance(pointA, sphereB) = distance(pointA, centerB) - radiusB

Such a definition leads to Voronoi regions with curved boundaries that are computationaly difficult to obtain. We use less intuitive, but mathematically simpler "perpendicular" distance definition, which defines the distance to a sphere as the distance to the intersection of the sphere with a tangent to it:

distance2(pointA, sphereB) = distance2(pointA, centerB) - radius2B

This modification leads to computationally more tractable Voronoi regions with linear boundaries, while not significantly affecting the positions of the resulting alpha spheres.

Figure 2a shows the Voronoi decomposition of a set of zero-radii atoms in 2D. The atoms are shown as full green circles. The boundaries of the Voronoi regions are shown as purple lines. The Voronoi vertices are shown as small purple discs. (Voronoi vertices in 2D are intersections of just 3, not 4, Voronoi regions.) The large thin red circle shows one of the alpha spheres. Figure 2b shows the boundary of Voronoi regions of two atoms in 2D space, A and B. The curved purple line is the the Voronoi boundary under the closest-point distance metric. The straight purple line is the boundary under the perpendicular metric. The thick green and blue lines illustrate the property of a point on the boundary of being at the same distance from both A and B.

Figure 2aFigure 2b

The problem of finding the Voronoi decomposition (and Voronoi vertices) of a set of atoms is equivalent to the "dual" problem of finding the Delaunay triangulation, which is a graph of atoms in which atoms are connected by an edge whenever their Voronoi regions share a common boundary. If the distance metric used is the "perpendicular" distance, a Delaunay triangulation in 3D can be converted to the much simpler problem of finding a convex hull in 4D space. (This is accomplished by projecting the 3D points onto the surface of a 4D paraboloid.) A convex hull of a set of atoms is the intersection of all half-spaces that contain the atoms.

Figure 3a shows the Delaunay triangulation of a set of zero-radii atoms in 2D. Figure 3b shows the convex hull of the same set of atoms.

Figure 3aFigure 3b

Known techniques exist to find convex hulls in multidimensional spaces. These techniques hinge on the ability to determine on which side of a given plane lies a given point without incurring numerical rounding errors. Some implementations use special, infinite-precision numerical libraries that may require significant computational resources. MOE uses a much faster method, based on Clarkson's extension [Clarkson 1992] of the well-known Gram-Schmidt orthogonalization procedure, which allows us to calculate the collection of alpha spheres in seconds instead of minutes. The downside is an upper limit on the theoretical resolution of the input data -- this does not, however, present a problem when working at the resolutions needed for molecular data.

Site Finder. The MOE Site Finder first calculates a collection of all alpha spheres. The collection of alpha spheres is pruned by eliminating those that correspond to inaccessible regions of the receptor as well as those that are too exposed to solvent. Only the spheres that correspond to locations of tight atomic packing in the receptor are retained.

Next, each alpha sphere is classified as either "hydrophobic" or "hydrophilic" depending on whether the sphere is in a good hydrogen bonding spot in the receptor. Hydrophilic spheres not near a hydrophobic sphere are eliminated (since these generally correspond to water sites). Near duplicates, i.e. spheres with centers very close to that of another sphere, are also eliminated. The alpha spheres are then clustered using a single-linkage clustering algorithm to produce a collection of sites. Each site consists of several alpha spheres at least one of which is hydrophobic. Only clusters above a minimum size, both in terms of the number of spheres and the geometric extent, are retained.

Finally, the sites are ranked according to the number of hydrophobic contacts made with the receptor, i.e. the number of hydrophobic atoms within a contact distance of any of the retained alpha spheres. The following is a depiction of the graphical user interface of the MOE Site Finder:

The individual sites are reported in a scrolled list. For each site, the size (in receptor atoms), the number of hydrophobic contacts and the number of sidechain contacts are reported along with a list of the residues containing the contacting atoms. For example, site number 4 (selected in the panel) consists of 24 receptor atoms, 4 of which are hydrophobic and 9 of which are sidechain atoms. The atoms belong to the residues SER93, GLY94, etc. in chain 1. Individual sites may be highlighted and rendered in the 3D rendering area as alpha spheres optionally augmented with atom and backbone renderings.

Results

The following images are examples of using the MOE Site Finder on various complexes from the Protein Data Bank. Except where noted, the Site Finder was run on the receptor alone, and the site that scored highest (according to the hydrophobic contact count) is displayed and overlaid with the complexed ligand.


Left PDB 1AAQ (HIV-1 Protease) and the first site located by the MOE Site Finder. Middle 1AAQ with the complexed ligand (hydroxyethylene isostere). Right Hydroethylene isostere overlaid with calculated alpha spheres of the first site.


Left PDB 1BZM (carbonic anhydrase I) and the first site located by the MOE Site Finder. Middle 1BZM with the complexed sulfonamide ligand. Right Sulfonamide ligand overlaid with calculated alpha spheres of the first site.


Left PDB 1PPH (trypsin) and the fifth site located by the MOE Site Finder. Middle 1PPH with the complexed ligand (3-TAPAP). Right 3-TAPAP ligand overlaid with calculated alpha spheres of the fifth site.


Left PDB 3DFR (dihydrofolate reductase) and the first site located by the MOE Site Finder. Middle 3DFR with complexed ligands (NADPH and methotrexate). Right NADPH and methotrexate ligands overlaid with calculated alpha spheres of the first site.


Left PDB 3RTD (streptavidin) and the first site located by the MOE Site Finder. Middle 3RTD with complexed ligand (biotin). Right Biotin ligand overlaid with calculated alpha spheres of the first site.

Summary

We have presented the basic methodology of the MOE Site Finder along with several examples of its use on PDB complex data. In general, the Site Finder ranks the correct active site in the top few (when ranked by hydrophobic contacts). Because no grids are used the method is efficient in terms of both memory use and CPU use (typically only several seconds are needed to calculate the sites). The MOE Site Finder was written in the SVL programming language and is available in MOE 2001.01.

References

[Clarkson 1992] Clarkson, K.L. Safe and Effective Determinant Evaluation. In Proc. 31st Annual IEEE Symposium on Foundations of Computer Science. 387-395 (1992).
[DelCarpio 1993] Del Carpio, C.A., Takahashi, Y., Sasaki, S. A New Approach to the Automatic Identification of Candidates for Ligand Receptor Sites in Proteins: (I) Search for Pocket Regions. J. Mol. Graph. 11, 23-29 (1993).
[Edelsbrunner 1995] Edelsbrunner, H., Facello, M., Fu, R., Liang, J. Measuring Proteins and Voids in Proteins. Proceedings of the 28th Annual Hawaii International Conference on Systems Science. 256-264 (1995).
[Goodford 1985] Goodford, P.J. A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules. J. Med. Chem. 28, No. 7, 849-857 (1985).
[Hendlich 1997] Hendlich, M., Rippman, F., Barnickel, G. J. Mol. Graph. 15, 359-363 (1997).
[Liang 1998] Liang, J., Edelsbrunner, H., and Woodward, C. Anatomy of Protein Pockets and Cavities: Measurement of Binding Site Geometry and Implications for Ligand Design. Protein Science. 7, 1884-1897 (1998).
[Miranker 1991] Miranker, A., Karplus, M. Functionality Maps of Binding Sites: A Multiple Copy Simultaneous Search Method. Proteins: Structure, Function, and Genetics. 11, 29-34 (1991).

Further information on alpha shapes can be found, for example, on the "CastP" web server of the University of Illinois at Urbana-Champaign: http://sts.bioengr.uic.edu/castp/.