Journal Articles

Automatic Assignment Of Bond Order



P. Labute
Chemical Computing Group Inc.
1010 Sherbrooke Street W, Suite 910; Montreal, Quebec; Canada H3A 2R7.


May 1, 1996


Abstract. A method is presented for the automatic assignment of higher order bonds to a molecular graph.

INTRODUCTION



The automatic assignment of bond orders to molecular graphs is of fundamental importance when "cleaning up" large databases of molecules. In this article, we present an algorithm for the assignment of bond orders that uses element, ionization and hybridization information as input. The rationale for this choice of input is that it can be inferred easily from many file formats even in the presence of input errors.

ASSIGNMENT OF KEKULE STRUCTURE



Let G=(V,E) be a graph. As a shorthand, we will use kj to denote the edge {k,j}. Also, if A and B are two sets, then we use A\B to denote the symmetric difference (A-B) + (A-B).

A matching is a subset, M, of edges in E such that if e, f are in M implies that e and f are disjoint (i.e., do not share a vertex). A node x that does not belong to an edge in M is exposed with respect to M. A matching M of largest cardinality is a maximum matching. A perfect matching is a matching in which every node of V is not exposed in M. (A graph has a valid Kekule structure if and only if it has a perfect matching.)

An augmenting path with respect to a matching M is a simple path abc...yz of nodes connected by edges in E such that nodes a and z are exposed, and edges bc, de, ... xy are in M and the other edges of the path are not in M.

The significance of alternating edges to maximum matching is due to the facts (C.H.Papadimitriou, K.Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, New Jersey, 1982):

  • Let P be the set of edges on an augmenting path p=[abc...yz] in a graph G with respect to the matching M. Then M\P is a matching of cardinality |M|+1.
  • A matching M in a graph G is maximum if and only if there is no augmenting path in G with respect to M.
  • Suppose that in a graph G there is no augmenting path starting from an exposed node u with respect to a matching M. Let P be an augmenting path with endpoints two other exposed nodes v and w. Then there is no augmenting path from u with respect to M\P either.

These facts lead to the following greedy algorithm for finding a maximum matching:

  1. Set M = 0.
  2. If there are less than 2 exposed nodes in G with respect to M then terminate the algorithm with M as the maximum matching.
  3. Let a be any exposed node. Let p=[abc...z] be an augmenting path with respect to M. If no such path can be found, remove from the graph all edges involving node a. Go to step 2.
  4. Otherwise, let P be the set of edges in p and set M = M\P. Go to step 2.

AUTOMATIC ASSIGNMENT OF BOND ORDER



Let G=(V,E) be a molecular graph. In G, the nodes represent atoms and the edges represent bonds; that is, if k and j are in V then atoms k and j are bonded iff kj is in E.

The objective is to assign, to each edge, a bond order in the set {1,2,3} under the assumption of full valence for each atom. Initially, each bond is given an order of 0 to indicate that no assignment has been made.

Step 1. This phase of the algorithm assigns bond orders to the obvious cases or the bond orders that should be assigned even for an illegal structure (to localize errors). This is the only phase that emphasizes chemical feasibility; that is, the remaining methods are mathematical and chemically unaware.

  1. Assign a bond order of 1 to all edges involving a Hydrogen, a metal, a halogen, a noble gas element, or an atom with a coordination number of 4 or more.
  2. Assign a bond order of 1 to all edges involving an atom in an sp3 or d-hybridized state.
  3. With the ionization states of each atom (or better still, the formal charges) replace each ionized element with its neutral first valence equivalent, that is, N+ becomes C, S- becomes N.
  4. Assign a bond order of 1 to all edges involving Na, Li, F or Ne.
  5. Assign a bond order of 1 to all sp2 Borons and change all sp Borons to sp2.
  6. Assign a bond order of 1 to all edges emanating from Oxygens with a heavy atom coordination number of 2 or more.
  7. Assign a bond order of 1 to all edges emanating from known conjugated atoms (in particular Nitrogens).
  8. Assign alternating bond orders of 3 and 1 to all sp chains terminating with an sp with heavy coordination number 1.
  9. For each remaining odd-length chain or ring of sp atoms, assign a bond order of 2 to all edges in the chain.

Assign a bond order of 1 to all the remaining unassigned bonds of sp2 atoms that have been assigned a bond order of 2 in this step. Assign a bond order of 2 to all the remaining unassigned bonds of sp atoms that have been assigned a bond order of 1 in this step.

Remove from the graph all edges that have been assigned bond orders in this step. What remains in the graph are:

  • even length chains of sp atoms with heavy coordination numbers of 2;
  • sp2 atoms with coordination 3 or less;
  • Oxygens with heavy atom coordination of 1

Step 2. At this point in the algorithm, what remains are atoms for which no immediate or forced bond assignments can be made. Since we have no further forced information, we use the maximum matching algorithm described above to determine a maximum matching M in G.

Step 3. Since there are no odd-length sp chains, assign any sp-sp matched bonds a bond order of 3 and other matched bonds a bond order of 2. All remaining bonds are assigned a bond order of 1.