Journal Articles

Flexible Alignment Of Small Molecules

Paul Labute
Chemical Computing Group Inc.
1010 Sherbrooke Street W, Suite 910; Montreal, Quebec; Canada H3A 2R7

Abstract. A method is presented for flexibly aligning small molecules. The method accepts, as input, a collection of small molecules with 3D coordinates and computes a collection of alignments. Each alignment is given a score which quantifies the quality of the alignment both in terms of internal strain and overlap of molecular features. The results of several computational experiments on collections of compounds are reported suggesting the method's utility for the elucidation of pharmacophores, comparative field analysis and template forcing.



Given n molecules known to bind with a particular receptor it is often of interest to determine in what ways they are similar. The notion of "molecular similarity" is difficult to quantify. In many methods, a set of features of interest of a conformation in a molecule is compared to the feature set of another molecule in some quantitative manner. For example, matrices of inter-feature distances can be compared in some matrix metric. Another possibility is to combine comparison of feature type and feature distance into some ad hoc similarity measure. Such methods typically require empirical fine-tuning of various constants in order to assign relative importance to particular feature comparisons.

In this paper, we shall consider a general framework for feature comparison based on the notion of a feature probability density of a given conformation of a molecule. The advantage of probability densities is that they must integrate to unity and are, therefore, inherently balanced with respect to each other (i.e., probability is an absolute measure). As an example of the general method, we defined a quantitative measure of the quality of a 3D alignment (or configuration) of molecules inspired by the following qualitative notion of a good alignment:

An alignment is good if a) the strain energy of each molecule is small; b) they have a similar shape; c) their aromatic atoms overlap; d) their donors overlap and e) their acceptors overlap.

The derived quantitative assessment was used in a simultaneous conformational search procedure. Details of the methods used are presented in the section entitled Methods and the results of several computational experiments are presented in the Results and Discussions section.


Similarity Measure. An isotropic (spherically symmetric) Gaussian, or Normal, probability density has the functional form

where s2 is the inverse variance along each axis and x0 is the center (and mean) of the density. An atom with nucleus located at x0 with van der Waals radius r is represented with

where a determines how broad the density will be. For example, if a = 2 then approximately 90% of the density will be contained within the van der Waals radius. In general, the value of a can be used to approximate molecular volumes; i.e., to fill in the gaps between atoms.1

Let x1,...,xn denote the 3D positions of n atoms of a molecule in a given conformation. Let the non-negative wi denote the degree to which atom i has some property P (e.g., "is aromatic" or "is donor"). We can then assign, to each point in space, x, the likelihood that x has the property with sum-of-Gaussians density

This density is called the P-density of the conformation. For example, if P is the property "is aromatic" so that wi is 1 when atom i is aromatic and 0 otherwise then the function f is the aromatic density of the conformation and represents the likelihood that a given point in space is (in some sense) aromatic.

Given two P-densities for two given molecules or conformations, the overlap of the two densities is, itself, a sum-of-Gaussians density in the interatomic distances:

Using these notions, we can define a feature similarity measure, F, to be

where each term is a sum of overlaps of P-densities between all pairs of molecules under consideration for the properties "aromatic", "donor", "acceptor" and "volume".

Feature Assignment. The P-density similarity measure requires the assignment of atomic "features". In this study, the following features are used:

  1. Volume. A volume feature is just the presence of an atom. In the P-density, this corresponds to using a value of 1 for each wi.

  2. Aromatic. The aromatic P-density is constructed by assigning wi = 1 if atom i is aromatic and 0 otherwise. In this study, the Huckel 4n+2 rule was used to assign aromaticity.

  3. Donor. The donor P-density was assigned by setting wi = 1 if atom i was of type "Donor", "Polar" or "Basic" under a pharmacophore atom typing scheme (see below).

  4. Acceptor. The acceptor P-density was assigned by setting wi = 1 if atom i was of type "Acceptor", "Polar" or "Acidic" under a pharmacophore atom typing scheme (see below).

A pharmacophore-based scheme was used to assign atom types to each atom. The atom type set consisted of "Donor", "Acceptor", "Polar", "Acidic", "Basic", "Hydrophobe", and "Other" and a molecular pattern matching algorithm was used to make the assignments and the rule set was based on the PATTY rules.2

Search Procedure. With a quantitative measure of goodness for an alignment, we can "sample" an alignment using a RIPS3 style procedure:

  1. [Perturb] Randomly rotate all rotatable bonds. Add a random number to all atomic coordinates. Superpose all molecules by randomly choosing three atoms from each.

  2. [Optimize] Minimize the similarity function -kT log F + U with respect to the coordinates of all of the atoms. Here, U is the average potential energy of the molecules.

  3. [Compare] If the new configuration has not been seen before (RMSD greater than some threshold) then set k = 0 otherwise set k = k + 1. If k is greater than some predefined amount (20 in this present study) then terminate the search.

The termination criteria can be interpreted as follows. Upon termination there have been k consecutive attempts to generate a new configuration and each has failed. This is similar to tossing a coin k times and observing "heads" on each toss. The Bayes estimator for the bias on the coin is (k+1)/(k+2); thus the probability of observing a "tail" (or a new configuration) is 1/(k+2). At k = 20 this probability is less than 5%.

Upon termination of the search the collection of configurations is pruned by removing all configurations in which the average potential energy is greater than the minimum observed average potential energy plus some predefined threshold (in this study, 7 kcal/mol).

Materials and Software. The foregoing techniques were implemented in the SVL programming language of Chemical Computing Group Inc.'s Molecular Operating Environment4 (MOE) version 1998.10. All calculations were performed on a Silicon Graphics Octane with a 250 MHz MIPS chip running IRIX 6.2. Nonlinear optimization was carried out with the MOE Truncated Newton optimizer preceded by two steps of Steepest Descent5 and terminated when the RMS gradient fell below 0.001. The MOE implementation of MMFF946 was used to measure the internal strain of each molecule. Chirality was preserved using signed volume restraints on all chiral centers. Pharmacophore atom type assignment was performed using the MOE implementation of the Daylight SMARTS pattern matching language. The determination of the RMSD between structures used the MOE Superpose functionality which calculates an optimal global superposition by minimizing a weighted least squares error function.


In this section, we report the results of several computation experiments. In each of the experiments the following algorithm parameters were used:

  1. The failure limit in the Search Procedure, k, was set to 20; that is, the search was terminated after 20 successive attempts to generate a new alignment failed.

  2. The cartesian perturbation was conducted by adding a uniform random number in the range [-1,1] to each atom after first randomly rotating all non-ring bonds.
  3. Two configurations were considered the same if the root mean square optimal superposition distance (RMSD) was under 0.1 Angstroms.

Many of the alignment experiments are the same as those presented in Jones et al.7 for purposes of comparison.

DHF and MTX. Kearsley et al. use the alignment of DHF and an MTX analog as an example of the SEAL8 method. These structures are shown in Figure 1. These structures are very similar; however, the obvious alignment is not the correct alignment from the point of view of pharmacophore overlap. Furthermore, the tautomers of DHF place a certain burden on the pharmacophore typing system. We applied the present method to these structures with an iteration limit of 30 and 30 distinct configurations were output. The 6 top scoring configurations were very similar both in terms of relative geometry, potential energy and similarity scores (less than 1% relative difference). These minor differences in scoring were well within the differences expected due to the potential energy model. The obvious (pure steric) alignment was the seventh top scoring alignment. The top scoring alignment, presented in Figure 1, also had the smallest potential energy and each structure was within 1 kcal/mol of the nearest local minimum. The lower fused ring system is correctly aligned with respect to pharmacophore functionality. The alignment also suggests a common nitrogen acceptor in the atoms between the rings as well as a common NHCO group.


Figure 1. Top: Structures for DHF and MTX Kearsley et al. (1990). Bottom: Highest scoring alignment of structures DHF (blue) and MTX (yellow). The lower fused ring system is correctly aligned with respect to pharmacophore functionality. The alignment also suggests a common Nitrogen acceptor in the atoms between the rings as well as a common NHCO group.

Three benzodiazepine receptor binders. Codding and Muir studied10 three compounds that bind to the benzodiazepine receptor. These compounds were also analyzed with the Jones et al. method and are thus included in the present. The structures are shown in Figure 2. Two of the structures promote convulsions on binding while Ro15-1788 is an antagonist that does not promote convulsions. Codding and Muir's work predicted a binding site that recognized an aromatic ring, a donor oxygen, a hydrophobic side chain and a nitrogen donor group. It was hypothesized that the absence of the donor nitrogen is required for antagonism (e.g., in Ro15-1788). This case is interesting in that one of the structures is significantly different and doesn't seem to admit a clear-cut superposition of features. We applied our method to these structures with an iteration limit of 100 and produced 95 distinct alignments. The top scoring configurations differed in average potential energy by about 1.3% and in similarity score by 0.5% while the worst alignment had an average 5 kcal/mol more strain energy and a 30% worse similarity score.

Figure 2. Three benzodiazepine receptor binders (Ro15-1788 does not promote convulsions while the others do).

Figure 3 shows two representatives of the top scoring alignments; each has several common features as well as some unique features.

Figure 3. The top scoring alignment (top) and the next least strained alignment (bottom). Ro15-1788 is colored blue while the other molecules are colored yellow.

Referring to specific points of interest (indicated by letters in the figure), we note the following:

  • (A) Both alignments bring the donor nitrogens together in contrast with the results of Jones et al. but agreeing with Codding and Muir.
  • (B) Both alignments suggest a common aromatic ring; however, the first alignment (top) suggests the 6-ring near position E agreeing with Jones et al. whereas the second alignment (bottom) suggests the 5-ring.
  • (C) Both alignments suggest an acceptor - an sp2 oxygen for Ro15-1788 and CGS-8216 but an aromatic nitrogen for the other.
  • (D) The lower alignment suggests a possible hydrophobic right side while the upper alignment only fails to suggest it and also has a problematic protrusion at point C.

On the whole the lower alignment seems best in spite of the fact that it scored the third best in similarity score (but second best in potential energy). The chosen two alignments were within 0.06 kcal/mol of each other and were within 0.1% of each other in similarity score. These differences are well within what is expected from conformational potential differences.

Leu-enkephalin and a hybrid morphine. The second alignment problem presented in Jones et al. concerned the hybrid morphine molecule EH-NAL and Leu-enkephalin described by Kolb11 and shown in Figure 4. The bound conformations of these molecules are not known. The two structures are very different with the added problem of the flexibility of Leu-enkephalin. We applied our method with an iteration limit of 200 and calculated 189 distinct configurations. The five best scoring configurations exhibit features similar to those described by Kolb in that the rings marked (a) and (b) are brought together and the protonated nitrogens are superposed in all but one of the alignments. The five best scoring alignments differed by less than 0.6% in similarity score; however, the third best is, perhaps, the most interesting. It was the most strained of the configurations (~7 kcal/mol average) and were it not for this strain it would have scored the best, by far, in terms of similarity score alone.

Figure 4 shows this third best configuration with points of interest marked with the letters A through F:

  • (A) The alignment strongly suggests an aromatic ring feature with a phenol group (as do all of the top scoring alignments); moreover, aside from the immediate fused ring in EH-NAL the rough shape of the two molecules are in rough agreement.
  • (B) The alignment suggests an sp2 acceptor by overlapping a carbonyl oxygen and one of the =N-N= nitrogens. The rotation about this latter bond explains some of the strain exhibited by this alignment.
  • (C) An oxygen acceptor is suggested by the alignment which overlapped a carbonyl oxygen with the ring oxygen of EH-NAL.
  • (D) The alignment exhibits two aromatic rings in close proximity but not superimposed as with feature A suggesting either an aromatic or hydrophobic feature.
  • (E) The alignment superimposed the protonated nitrogens from both molecules indicating basic functionality.
  • (F) Although not superimposed, a hydrophobic side chain is suggested. The abundance of common features in this alignment help overcome the fact that the strain energy was the highest among the top five alignments. The ~7 kcal/mol average strain energy is possibly due to the lack of a solvation model or, perhaps, part of this strain may be an artifact of the forcefield or intra-molecular electrostatics.

Figure 4. Top: Hybrid morphine EH-NAL and Leu-enkephalin described by Kolb (1991). Rings marked (a) and (b) form the basis of a good alignment. Bottom: Superpositions of EH-NAL (yellow) and Leu-enkephalin (blue).

Four dopamine receptor binders. Hoberg and Norinder12 described four molecules shown in Figure 5: dopamine, the dopamine receptor agonist, and three anti-psychotic dopamine D2 receptor antagonists. Jones et al. report a pharmacophore of an aromatic ring, a protonated nitrogen and an oxygen donor. We applied our method to the four molecules using an iteration limit of 200. Chirality was preserved around carbon centers but not the nitrogen centers on the grounds that such centers are easily inverted due to proton exchange with the solvent. A total of 161 distinct alignments were produced.

Figure 5. Dopamine and three dopamine receptor antagonists.

The top scoring alignment is shown in Figure 6. Points of interest are marked with the letters A through D:

  • (A) Were it not for the protruding chlorine in raclopride the alignment suggests an aromatic feature; however, with the chlorine the suggestion is more like a hydrophobic/aromatic region.
  • (B) All molecules but amisulpride exhibit a polar feature (both donor and acceptor) but with the carbonyl oxygen of amisulpride the alignment suggests an acceptor.
  • (C) All four molecules exhibit a donor nitrogen; however, the direction of the hydrogen bond is not consistent. Inspection of the region reveals that it is sterically possible to unify the direction of the donor. The current method does not take direction into account and relies on local geometry of each feature for alignment. In the case of polar features which are both donors and acceptors (e.g., in a tautomer) there is no explicit hydrogen atom; perhaps the introduction of dummy atoms would help in such situations.
  • (D) A hydrophobic region is suggested and, although not fully aligned, it appears that a closer packing is energetically accessible.

Figure 6. Top scoring alignment of four dopamine receptor binders: haloperidol (red), amisulpride (yellow), raclopride (green) and dopamine (purple).

This best scoring configuration had 1.2 kcal/mol of average strain per molecule. Jones et al. reported that their best scoring alignment occurs when the protonated nitrogens overlap. We did not find a configuration in which all four protonated nitrogens overlapped; however, there were a few in which three of the four overlapped. Jones et al. reported difficulty in locating this alignment and perhaps our limit of 200 attempts was too strict.

Two FKB12 ligands. The Jones et al. paper describes a particularly interesting alignment example involving two very different structures with known experimental binding conformations. The two structures, shown in Figure 7 are FK506, a microbial product that blocks T-cell activation and proliferation, and "Compound 9" from a study by Holt et al.13

Figure 7. Structures of two binders with experimentally determined bound conformations to the human immunophilin FKBP12.

Both of the structures bind well to the human immunophilin FKBP12. The coordinates of the bound complex of FKBP12 and FK506 have been deposited in the Brookhaven Protein Databank14 (PDB) with the code 2FKE. The ligand of the complex was used to set chiral constraints on all carbon chiral centers. Our algorithm was applied to the structures with an iteration limit of 200 and 200 distinct alignments were produced as output. Inspection of the configurations revealed that conformations of the large ring in FK506 were being well sampled (one of the more interesting aspects of this example since torsional procedures require special logic to sample the conformational space of rings).

Figure 8. Best scoring alignment of FK506 (blue) and Compound 9 (yellow).

The top scoring alignment had both the least strain energy and the best alignment score and is shown in Figure 8 with points of interest marked with the letters A through F:

  • (A) The alignment suggests a hydrophobic group involving the carbon six ring of Compound 9.
  • (B) Another hydrophobic feature is suggested involving the second six ring of Compound 9 (with the nitrogen).
  • (C) and (D) Two carbonyl oxygens have been overlayed suggesting a hydrogen bond acceptor feature.
  • (E) An OH group and an sp2 oxygen have been overlayed suggesting an acceptor feature.
  • (F) A possible hydrophobic group is suggested, although the rings are not overlayed.

These results are in substantial agreement with the results of Jones et al. A comparison of the conformation of FK506 in the best scoring alignment and the conformation of the bound complex in the PDB is shown in Figure 9. Aside from the carbon six ring (not directly involved in binding) there is excellent agreement between the conformation calculated with our method and the experimentally determined coordinates. This result is particularly encouraging since it appears that the alignment of two binding ligands, even if very different, can be enough to direct a conformational search towards the bound conformation without explicit receptor information.

Figure 9. Comparison of the calculated bound conformation of FK506 (blue) with the experimentally determined bound complex coordinates (yellow) in the PDB (code 2FKE). The carbon six rings at position A are outside the binding pocket.


A method for the flexible alignment of molecules was presented along with the results of several experiments which suggest the method's efficacy. The method has several important applications:

  1. Pharmacophore Elucidation. Several active compounds can be aligned in order to visualize or determine a pharmacophore in the absence of a receptor model.

  2. Comparative Field Analysis. Several active compounds can be aligned and used as input to steric and electrostatic field analysis methods.

  3. Template Forcing. When the coordinates of a complex are available, or the conformation of an active compound is known, the bound ligand can be used as a template. In such a case, the coordinates of the bound ligand are held fixed and candidate ligands can be flexibly aligned to the template.

The method has several advantages: a) there are no limits on the number of ligands that can be aligned; b) both conformational and alignment searches are conducted simultaneously; c) the competing goals of feature superposition and potential energy strain are balanced in a natural way; and d) a collection of alignments is output for comparison.

The alignment methodology may be licensed from Chemical Computing Group Inc. for use in the Molecular Operating Environment software.


1 Schaefer,M., Karplus,M., A Comprehensive Analytical Treatment of Continuum Electrostatics, J. Phys. Chem., 100, p1578-1599 (1996).

2 Bush, B.L., Sheridan, R.P., PATTY: A Programmable Atom Typer and Language for Automatic Classification of Atoms in Molecular Databases, J. Chem. Info. Comp. Sci., 33, pp756-762 (1993).

3 Ferguson, D.M., Raber, D.J., A New Approach to Probing Conformational Space with Molecular Mechanics: Random Incremental Pulse Search, J. Am. Chem. Soc., 111, p4371-4378 (1989).

4 MOE software available from Chemical Computing Group Inc., Montreal, Canada. Consult for more information on MOE (1997, 1998).

5 Gill, P., Murray, W., Wright, M.H., Practical Optimization, Academic Press, New York, (1981).

6 Halgren, T.A., The Merck Molecular Force Field, J. Comp. Chem., Vol. 17, Nos. 5&6, p490 (1996).

7 Jones, G., Willet, P., Glen, R.C., A Genetic Algorithm for Flexible Molecular Overlay and Pharmacophore Elucidation, J. Computer Aided Molecular Design, 9, Vol 32 (1995).

8 Kearsley, S., An Alternative Method for the Alignment of Molecular Structures: Maximizing Electrostatic and Steric Overlap, Tetrahedron Computer Methodology, Vol. 3, No. 6C, pp615-633 (1990).

9 Bradbury, R.H., Allott, C.P., Dennis, M., Fisher, F., Major, J.S., Masek, B.B., Oldham, A.A., Pearce, R.J., Rankine, N., Revill, J.M., Roberse, D.A., Russell, S.T., J. Med. Chem., 35, p4227 (1992).

10 Codding, P.W., Muir, A.K.S., Mol. Pharmacol., 28, 178 (1985).

11 Kolb, V.M., Prog. Drug Res., 36, p49 (1991).

12 Hoberg, F., Norinder, U. in Krogsgaard-Larsen, P., Bundgaard, H. (Eds.), A Textbook of Drug Design and Development, Harwood Academic Publishers, Reading, p55-91 (1993).

13 Holt, D.A., Luengo, J.L., Yamashita, D.S., Oh, H., Konialian, A.L., Yen, H., Rozamus, L.W., Brandt, M., Bossard, M.J., Levy, M.A., Eggleston, D.S., Liang, J., Schultz, L.W., Stout, T.J. Clardy, J., J. Am. Chem. Soc., 115, p9925 (1993).

14 Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, F., Bryce, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., Tasumi, M., J. Mol. Biol., 112, p535 (1977).