Journal Articles

Rotamer Exploration and Prediction

Mike Soss
Chemical Computing Group Inc.
 

Introduction  | Methodology  |  Results  |  Summary  |  References
 

Introduction

Accurate prediction of amino acid side-chain conformation is an important aspect of protein structure prediction and has application in comparative modeling of proteins, amino acid mutation studies and studies of side-chain re-arrangement in protein-ligand complexes. Contemporary methodologies such as those of Bower, Cohen, and Dunbrack [Bower 1997], Xiang and Honig [Xiang 2001], or Liang and Grishin [Liang 2002] employ a backbone-dependent rotamer library that serves as the set of candidate conformations for side-chains. For given phi-psi angles of the backbone of the residue, the rotamer library contains the rotamers which have been observed in some collection of crystal structures, along with the frequency at which they are observed. To predict the side-chain conformation of a residue, each rotamer is scored based on various characteristics (such as contact surface and volume, electrostatics, desolvation energy) as well as the a priori probability of seeing each rotamer as determined by its frequency in the library.

The central assumption for using such rotamer libraries is that there is enough data in the PDB for each side-chain conformation to attempt statistical determination of conformation preferences lessen the need for detailed energy-based methods (such as force fields). However,

  1. In rotamer libraries, the a priori probabilities (frequencies) for the occurrence of each rotamer is sampled from crystallographic data. Although these probabilities have been shown to aid scoring functions [Liang 2002], it is unclear whether these frequencies are applicable if the sidechain conformation is to be predicted during the presence of other molecules, such as solvent, a nearby complexed ligand, or the side chains of other amino acids.

  2. Xiang and Honig [Xiang 2001] have shown that conformation prediction accuracy increases with the size of the rotamer library. This is likely due to the fact that a side-chain's conformation is significantly influenced by factors other than the local backbone configuration.

These observations tend to undermine the fundamental assumptions for the use of rotamer libraries. With an infinite training set, every sterically reasonable conformation would be represented and it would be the particular environment of the side-chain that would determine its conformation (not a priori probabilities based upon the backbone alone). In this article we present a methodology for predicting side-chain conformation based upon systematic conformational search of side-chains followed by the application of an environment-based scoring function to rank the conformers. We outline the basic methodology and present some examples of its use.

Methodology

Rotamers are generated by a systematic conformational search. An estimate is obtained for the change in free energy of solvation between the state where the protein (without rotamer) and the rotamer are in isolation, and the state where they form a complex. This is done by a scoring function that takes into account the changes in the types of surface area contacts. (For example, hydrogen-bonding surfaces inside a hydrophobic area would be one type.) This change in the free energy term is added to the torsional strain energy of the rotamer to compute the deltaG term.

Generation of Rotamers. A systematic conformational search is used to generate side-chain conformations of a particular amino-acid. The systematic conformational search generates side-chain conformations by systematically rotating bonds in a side-chain by discrete increments. The purpose of the search is to generate a collection of reasonable side-chain conformations, which may or may not be at local minima. A generated conformation is rejected if it contains an atom that is involved in a steric clash with another atom of the side-chain or with any atom in its environment (the rest of the protein including solvent and complexed ligands). Two atoms clash if the distance between them is less than 0.6 times the sum of their contact radii (derived from analysis of non-bonded contacts in crystallographic data). Conformations generated by this method may be strained due to bonded interactions as well as some non-bonded strain.

Each rotatable bond is sampled at multiple dihedral angles, and the bond type determines the angles at which the bond is sampled. This is illustrated in figure 1:

Bond Type Sampled Dihedral Angles
O/S -../td> {90, -90}, plus or minus 5 degrees
sp3 - sp3 {60, 180, -60}, plus or minus 5 degrees
all others All multiples of 15 degrees

Figure 1. Sampled dihedral angles

The intent is to sample the main a priori minima along with some internally strained conformations. The sampled dihedral angles were determined by inspection of various rotamer libraries and side-chain conformations derived from crystal structures. Experimentation revealed that these angles (and the 0.6 multiplier on contact radii) produced conformations close to crystal structures in all cases examined.

The strain energy, U, of a conformation is modeled as the torsional strain of the sp3-sp3 rotatable bonds (only) using the AMBER force field parameters. For amino acid side chains this reduces to only two parameters C-C rotatable bonds and C-S rotatable bonds. It is this torsional strain term that is the only model of the a priori rotamer probability; i.e., we take this probability to be proportional to exp(-U/kT).

Classification of Atoms and Surfaces. The rotamers will ultimately be scored by deltaG, the change in free energy of solvation, between the states where the sidechain is in isolation, and when it is in a complex with the environment. We use the term environment to mean the surrounding atoms which interact with the sidechain. Often this will simply be the rest of the protein, but also may include nearby ligands and such.

Each atom in the rotamer and the environment is classified as either: hydrogen-bonding (donor), hydrogen-bonding (acceptor), hydrogen-bonding (both), hydrophobic, or polar. Our method is novel in the sense that our typing does not end at the atom level, but rather, we type each point inside and on the surface of the atom. Each point is said to have three parameters, Chb, Cphob, and Cpol, which may take on values between 0 and 1:

  • hydrogen-bonding characteristic (Chb)
  • hydrophobic characteristic (Cphob)
  • polar characteristic (Cpol)

For all points, these three characteristics sum to one, thus a point can be said to be, for example, 0% hydrogen-bonding, 20% hydrophobic, and 80% polar. The method of typing points depends on the atom type to which it belongs and whether or not the atom belongs to a pi-system.

  • An atom not in a pi-system, i.e., a spherically symmetric atom, is typed as either hydrogen-bonding or hydrophobic. A point in the atom gains only the characteristic of the same type as the atom to which it belongs. All points share identical characteristics.

    Hydrogen-bonding Hydrophobic
    hydrogen-bonding characteristic (Chb) 1 0
    hydrophobic characteristic (Cphob) 0 1
    polar characteristic (Cpol) 0 0

    Figure 2. Characteristics of points within non-pi-system atoms.

  • An atom contained in a pi-system is typed as either hydrogen-bonding or polar. Since pi-system atoms have directional preferences for different environments, the preferences of the points of the atom are modeled separately. For example, although one may classify the carbons in the pi-system of tyrosine as polar, it is more accurate to classify its in-plane region as having a high polar character and its out-of-plane region as having a high hydrophobic character. This behavior is illustrated below in figure 3. Examples of this phenomenon are shown in figure 4, where the green regions are areas with a high statistical preference for hydrophobic contacts, and the red regions have a high preference for hydrogen-bonding contacts.

    in-plane and out-of-plane preferences

    Figure 3. Directional preferences of sp2 atoms





    The pi-system of a tyrosine residue. The out-of-plane region prefers hydrophobic (green) contacts, whereas the in-plane region prefers hydrogen-bonding (red) contacts.


    The beta carbon of alanine (non-pi-system atom). The green region indicates the fairly symmetric preference for hydrophobes around the atom.  

    Figure 4. Examples of directional preferences

    It is possible to give a quantitative description of the distributional preferences of these atoms by recording the distributions of hydrophobic contacts, and that of hydrogen-bonding contacts, with respect to elevation from the sp2 plane. (The acquisition of such statistics is described by Labute [Labute 2001].) These distributions can be well-fit by the following simple distributions:

    • For a point in a pi-system atom with elevation r degrees from the plane of the pi-system, the a priori distribution of hydrophobic contacts are as follows:

      Plot of hydrophobic contact distribution versus elevation

      Figure 5. The white line is the distribution of hydrogen-bonding contacts. The green line is the fit Gamma distribution.

      This can be fit with a Gamma distribution with mean mu = 60 degrees and variance sigma = 21 degrees, leading to the following distribution dphob(r):

      Equations for d_phob(r)

    • For a point in a pi-system atom with elevation r degrees from the plane of the pi-system, the a priori distribution of hydrogen-bonding contacts are as follows:

      Plot of H-bonding contact distribution versus elevation

      Figure 6. The white line is the distribution of hydrogen-bonding contacts. The green line is the fit exponential distribution.

      This can be fit with an exponential distribution with a mean of 25 degrees, leading to the distribution dhb(r) = e-r/25 / 25.

    For any point, the two functions are compared and a preference ratio is computed for the two environments. Specifically, we assign preferences for points at elevation r of dhb(r) / [dhb(r) + dphob(r)] for hydrogen-bonding contacts, and dphob(r) / [dhb(r) + dphob(r)] for hydrophobic contacts. Note that the sum of the two preferences is one.

    We then assign characteristics to the points based on the type of the atom to which it belongs.

    Hydrogen-bonding atoms:

    hydrogen-bonding sp2 atom

    Chb(s) dhb(r) / [dhb(r) + dphob(r)]
    Cphob(s) dphob(r) / [dhb(r) + dphob(r)]
    Cpol(s) 0
    Polar atoms:

    polar sp2 atom

    Chb(s) 0
    Cphob(s) dphob(r) / [dhb(r) + dphob(r)]
    Cpol(s) dhb(r) / [dhb(r) + dphob(r)]

    Figure 7. Characteristics of points within pi-system atoms.

     

    dphob(r) / [dhb(r) + dphob(r)]  

     

     

    elevation in degrees (r)

     

    Figure 8. Plot of hydrophobic preference versus elevation

We have now classified all points inside atoms and are able to analyze the contacts between them.

Analyzing Surface Contacts. Two atoms are in contact when their water-accessible surfaces intersect. That is, the contact radii of each atom is expanded by 1.4 angstroms, and any intersection thereof is considered to be a contact, with the exception of 1-2 and 1-3 interactions. Each contact is analyzed by considering each atom surface point and the environment in which it is buried. For each such surface point, its characteristics are multiplied by the characteristics of the atom point with which it forms the contact. These contacts are classified into one of the following seven types based on the two characteristics involved:

  • hydrogen-bonding/hydrogen-bonding, favorable interaction (donor/acceptor, donor/both, acceptor/both, or both/both)
  • hydrogen-bonding/hydrogen-bonding, unfavorable interaction (donor/donor, or acceptor/acceptor)
  • hydrophobic/hydrophobic
  • polar/polar
  • hydrogen-bonding/hydrophobic
  • hydrogen-bonding/polar
  • hydrophobic/polar

This is most easily illustrated by examples. First let us consider the case of two non-pi-system atoms, one hydrogen-bonding and one hydrophobic, shown in figure 9.



Figure 9. Non-pi-system hydrogen-bonding atom in contact with a non-pi-system hydrophobic atom

With respect to the hydrogen-bonding atom, the point s has characteristics Chb = 1, and Cphob = Cpol = 0. With respect to the hydrophobe in which it is buried, the point s has characteristics Cphob = 1, and Chb = Cpol = 0. The contact is classified as the product of these characteristics:

  • hydrogen-bonding/hydrophobic:
          (Chb(s) w.r.t. hydrogen-bonding atom) (Cphob(s) w.r.t. hydrophobe) +
          (Cphob(s) w.r.t. hydrogen-bonding atom) (Chb(s) w.r.t. hydrophobe) =
          (1)(1) + (0)(0) = 1

As expected, the contact is classified as having characteristic of 1 for hydrogen-bonding/hydrophobe contact. Since neither atom is polar, we would expect that there is no polar/hydrophobe contact in figure 9.

  • hydrophobic/polar:
          (Cphob(s) w.r.t. hydrogen-bonding atom) (Cpol(s) w.r.t. hydrophobe) +
          (Cpol(s) w.r.t. hydrogen-bonding atom) (Cphob(s) w.r.t. hydrophobe) =
          (0)(0) + (0)(1) = 0

To illustrate how the system behaves with pi-system atoms, consider the contact at the point s in figure 10, where s is a point on the surface of a pi-system polar atom which is in contact with a non-pi-system hydrophobic atom.



Figure 10. Pi-system polar atom in contact with a non-pi-system hydrophobic atom

The point s, with respect to the polar atom to which it belongs, has a characteristics Cphob = dphob(r) / [dhb(r) + dphob(r)] and Cpol = dhb(r) / [dhb(r) + dphob(r)]. With respect to the polar atom to which it belongs, the point s has characteristics as described in figure 7. Let us assume that Chb(s) = 0, Cphob(s) = 0.7, and Cpol(s) = 0.3. Inside the non-pi-system hydrophobe, each point has characteristic Chb = 0, Cphob = 0.7, and Cphob = 1. We compute the characteristic of the contact as the product of the characteristics of s with respect to the polar atom to which it belongs, and the characteristics of s with respect to the atom in which it is buried. Therefore:

  • hydrophobic/hydrophobic:
          (Cphob(s) w.r.t. polar atom) (Cphob(s) w.r.t. hydrophobe) = (0.7)(1) = 0.7

  • hydrophobic/polar:
          (Cpol(s) w.r.t. polar atom) (Cphob(s) w.r.t. hydrophobe) +
          (Cphob(s) w.r.t. polar atom) (Cpol(s) w.r.t. hydrophobe) =
          (0.3)(1) + (0.7)(0) = 0.3

The surface contact types are computed for each point s on the pi-system atom surface inside the spherical atom. The point s has a characteristic based on its latitude with respect to the center of the pi-system atom. Since the hydrophobic atom in our example is spherical, all points inside it have characteristic Chb = 0, Cphobic = 1, Cpolar = 0.

The weighted surface area for each type of contact is computed by integrating the contact characteristics over the surfaces. For example, the total amount of polar/polar contact is computed by

(Cpol (s) w.r.t. atom whose surface contains s) (Cpol (s) w.r.t. atom in which s is buried) ds.

Note that because the sum of all characteristics sum to 1, the total contact surface area is the sum of the amounts of all contact types. Thus, this scheme is simply a weighting of the various types of contacts.

Two special cases require additional comment. Firstly, for the case of two hydrogen-bonding characters interacting, the resulting area is classified based on whether the interaction is attractive or repulsive as decided by the atom types. Secondly, any exposed surface, i.e., not in contact with another atom, is assumed to be in contact with solvent, which has fully spherical hydrogen-bonding (both donor and acceptor) character.

Estimating deltaG. The Rotamer Explorer estimates deltaG, the change in free energy of solvation between two states: protein without sidechain (receptor) and sidechain each in isolation, and the complex of the two.

For each state, the seven contact types for all contacts between atoms, and between atoms and solvent, are computed. This provides seven descriptors, each one being the change in contact surface area between protein and sidechain each in isolation, and the complex.

These descriptors were used to model deltaG with a regression fit using a database of small molecules and their solvation energies. Not all of the descriptors are well-represented by the data therein; some coefficients were determined by hand and trial-and-error. The fit (in kcal/mol per square angstrom of weighted surface area) was found to be the sum of the following terms:

  • -0.049 hydrogen-bonding/polar
  • -0.154 hydrogen-bonding/hydrogen-bonding (favorable)
  • -0.052 hydrogen-bonding/hydrogen-bonding (unfavorable)
  • +0.004 hydrogen-bonding/hydrophobic
In addition to the change in free energy of solvation, the term for the torsional strain energy of the rotamer discussed above in the Generation of Rotamers section was added.

Although the Rotamer Explorer also reports the change in hydrophobic surface area (the integral of hydrophobic character over all exposed surface) and explicit hydrogen bonds, these do not factor into the deltaG scoring function. They are provided for information purposes only.

Results

The following images are examples of this method of predicting conformations of a few long sidechains of PDB protein 1IC6.A, chosen for its size and high resolution. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown.

Figure 11. ASP_187

Figure 12. GLN_149

Figure 13. HIS_229

Figure 14. ILE_252

Figure 15. LYS_94

Summary

We have presented the basic methodology of the MOE Rotamer Explorer along with examples of its use on PDB protein structures. In general, the Rotamer Explorer ranks a rotamer close to the native conformation (when ranked by deltaG) in the top few.

Not only is the generation of rotamers unbiased by a library, but the scoring method employed by the MOE Rotamer Explorer is also intuitive. Our scoring function produces a deltaG term from a regression fit of few descriptors, computed from the surface contacts and the torsional strain of the rotamer. Its advantage over several other methods is that instead of classifying each atom as being either hydrophobic or hydrophilic, it also considers the directional preferences.

Even with the simplicity of the method, the preliminary results indicate that our system performs well for the case of a single residue. Our next step for research will be predicting several side-chain conformations simultaneously. The MOE Rotamer Explorer was written in the SVL programming language and is available in MOE 2002.03.

References

[Bower 2001] Bower, M., Cohen, F., and Dunbrack, R. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. Journal of Molecular Biology, 267:1268-1282, 1997.
[Labute 2001] Labute, P. Receptor Contact Preferences. Journal of the Chemical Computing Group, http://www.chemcomp.com/feature/cstat.htm.
[Liang 2002] Liang, S. and Grishin, N. Side-chain modeling with an optimized scoring function. Protein Science, 11:322-331, 2002.
[Xiang 2001] Xiang, Z. and Honig, B. Extending the accuracy limits of prediction for sidechain conformations. Journal of Molecular Biology, 311:421-430, 2001.