QuaSAR: The MOE System for QSAR


A. Lin
Chemical Computing Group Inc.


Introduction

Structure-activity relationship (SAR) analysis and, more generally, structure-property relationship (SPR) analysis are integral to the rational drug design cycle. Quantitative (QSAR, QSPR) methods assume that biological activity is correlated with chemical structure or properties and that, as a consequence, activity can be modeled as a function of calculable physicochemical attributes. Such a model for activity prediction could then be used, for instance, to screen candidate lead compounds or to suggest directions for new lead molecules.

The QSAR/QSPR functions in MOE are provided in a package entitled QuaSAR. The QuaSAR package is designed to deliver:

  • A rich set of molecular descriptors.
  • A flexible and extensible environment.
  • A modular protocol for model fitting methodology.
  • Integrated access to MOE Molecular Database facilities.
  • Integration with MOE's base computational chemistry functionality.

The QuaSAR system is depicted in the flow diagram below.

The components of the QuaSAR package are a combination of SVL descriptor modules and SVL programs to operate the fundamental MOE molecular services:

  • SVL Descriptor Modules. The QuaSAR system uses SVL program modules as the source for all of its molecular descriptors. In this way, descriptors can be modified and new descriptors are easily added to the system.

  • Descriptor Configuration Files. These files record and specify collections of descriptors, their parameters and any output models determined by the model builders. The Configuration Files are self-contained and can be used to evaluate the models on new molecules (not in the fitted set).

  • Descriptor Manager. The Descriptor Manager scans the SVL run-time environment for all loaded QuaSAR descriptors. This dynamic search for descriptors is one of the pillars of the flexibility of QuaSAR.

  • Descriptor Calculator. The QuaSAR Descriptor Calculator coordinates the calculation of descriptors of collections of molecules specified with the QuaSAR Graphical Interface or a Configuration File.

  • Model Builders. These modules perform the calculations required to produce QSAR models; for example, the Partial Least Squares module is capable of producing the best linear model in a least squares sense as well as a statistical analysis and report.

The MOE Molecular Database and Graphical viewer components and the built-in SVL programming language have been introduced in previous JCCG articles. For more information on these components, please see those articles.

This article is primarily concerned with the calculation of molecular properties or "descriptors". A technical treatment of the model building services of QuaSAR will appear in a later issue of JCCG.

Calculation of Molecular Descriptors

Central to the QuaSAR system is the Descriptor Manager, with which the user communicates via the Descriptor Graphical User Interface (GUI). The Descriptor Manager searches the MOE system for loaded QuaSAR modules that follow a standard format, interrogates them for descriptor information, and instructs them to perform parameter prompting and descriptor calculations. The calculated values are accumulated in a MOE Molecular Database file, which can be viewed using the MOE Database Viewer.
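The discovery step described above can be sketched outside MOE. The following is a minimal Python illustration, not SVL and not the actual Descriptor Manager: it assumes that a module is recognized by a pair of functions named QuaSAR_list_<Name> and QuaSAR_calc_<Name> (the naming seen in the Zagreb example later in this article), and the function discover_descriptors and the dictionary loaded are hypothetical names introduced here.

```python
# Illustrative sketch only: this is Python, not MOE or SVL code.
# Assumption: a descriptor module is recognized by a pair of functions
# named QuaSAR_list_<Name> and QuaSAR_calc_<Name>.

def QuaSAR_list_Zagreb():
    # One row per descriptor: code, description, class, parameters.
    return [("zagreb", "Zagreb Index", "2D", [])]

def QuaSAR_calc_Zagreb(bonds_per_atom, codes, parm):
    # Sum of squared heavy-atom valencies.
    return sum(len(bonds) ** 2 for bonds in bonds_per_atom)

def discover_descriptors(namespace):
    """Return {code: (description, class, calculator)} for every module
    that supplies both a list function and a calculator function."""
    found = {}
    for name, func in namespace.items():
        if not name.startswith("QuaSAR_list_") or not callable(func):
            continue
        module = name[len("QuaSAR_list_"):]
        calc = namespace.get("QuaSAR_calc_" + module)
        if not callable(calc):
            continue  # incomplete module: skip it
        for code, description, cls, _params in func():
            found[code] = (description, cls, calc)
    return found

# Stand-in for "everything currently loaded" (hypothetical).
loaded = {
    "QuaSAR_list_Zagreb": QuaSAR_list_Zagreb,
    "QuaSAR_calc_Zagreb": QuaSAR_calc_Zagreb,
}
registry = discover_descriptors(loaded)
```

Because the registry is rebuilt from whatever is loaded at the time of the scan, a newly loaded module becomes available without changes to the coordinator, which is the flexibility the text attributes to the dynamic search.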

The MOE QuaSAR descriptor calculation service comprises the top elements of the flow chart shown above:

  1. a graphical user interface,
  2. a set of descriptor calculator functions, and
  3. a coordinator of these functions.

These components are written in the Scientific Vector Language (SVL). The source code for all built-in descriptor calculators is part of the MOE distribution, and can be used as a template for generating custom descriptors. A standard configuration file format is supported for writing and loading collections of descriptors. QuaSAR can also be run in batch mode.

The QuaSAR user interface provides graphical tools to inquire which descriptors are currently loaded in MOE, to select the descriptors of interest, and to edit their parameters.

The QuaSAR Calculate window (see figure below) can be launched from the MOE Database Viewer File menu. This window contains a descriptor list summarizing which molecular properties are to be computed. Each descriptor is identified by a unique descriptor code, and labeled according to its class. The descriptors in the list are computed for all molecules in the database.

Descriptors are categorized by class:

  • 2D descriptors. These are calculated from purely atomic and connectivity properties; for instance, molecular weight, molecular branching, and sum of atomic polarizabilities.

  • i3D descriptors. The "i" in i3D is for "internal"; i3D descriptors use relative atomic 3D coordinates. Some examples are potential energy, volume, and water-accessible surface area.

  • x3D descriptors. The "x" in x3D is for "external"; x3D descriptors use absolute atomic coordinates (i.e., aligned molecules are required); for example, the X component of the dipole moment, common overlap volume, and receptor interaction energy.
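The difference between the i3D and x3D classes can be illustrated with a small Python sketch (not SVL). It assumes an unweighted radius of gyration as an i3D-style quantity and uses the X coordinate of the centroid as a hypothetical stand-in for an x3D-style quantity; neither function is taken from MOE, and the coordinates are made up for illustration.

```python
import math

def centroid(coords):
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid
    (unweighted; an assumption made for brevity)."""
    cx, cy, cz = centroid(coords)
    sq = [(x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords]
    return math.sqrt(sum(sq) / len(sq))

def centroid_x(coords):
    """Depends on where the molecule sits and how it is oriented."""
    return centroid(coords)[0]

def rigid_move(coords, shift):
    """Rotate 90 degrees about the Z axis, then translate by `shift`."""
    return [(-y + shift[0], x + shift[1], z + shift[2]) for x, y, z in coords]

# Made-up coordinates for three atoms.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]
moved = rigid_move(atoms, (10.0, -4.0, 2.5))

same_rg = math.isclose(radius_of_gyration(atoms), radius_of_gyration(moved))
same_cx = math.isclose(centroid_x(atoms), centroid_x(moved))
```

Here same_rg is true and same_cx is false: a quantity built from relative coordinates survives a rigid motion, while one built from absolute coordinates does not, which is why x3D descriptors require aligned molecules.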

The descriptor list is grown using the Add QuaSAR Descriptors panel. This panel lists all the currently available descriptors in the system. "Currently available" refers to those descriptor calculator functions that are recognized by the QuaSAR Descriptor Manager.

If descriptors take parameters, their parameter values can be accessed from a Parameters panel launched from the main QuaSAR window.

QuaSAR descriptor output data are written to a molecular database file. The MOE Database Viewer can be used to monitor QuaSAR's progress while it is running: it provides a view onto the output disk file even as it is being written. The Database Viewer's plotting facility allows the user to examine the generated data and to select database entries whose values fall within a particular range of interest.
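Selecting entries whose values fall within a range amounts to a simple filter over the descriptor table. The Python sketch below is only an illustration of that idea, not the Database Viewer's implementation; the function select_in_range and the dictionary layout are hypothetical, and the Zagreb values are computed by hand from the definition given later in this article.

```python
def select_in_range(entries, field, low, high):
    """Return the entries whose `field` value lies in [low, high]."""
    return [e for e in entries if low <= e[field] <= high]

# Hand-computed Zagreb indices for three small alkanes.
entries = [
    {"name": "n-butane", "zagreb": 10},
    {"name": "isobutane", "zagreb": 12},
    {"name": "neopentane", "zagreb": 20},
]

selected = select_in_range(entries, "zagreb", 10, 12)
selected_names = [e["name"] for e in selected]
```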

The QuaSAR descriptor data are normally examined and filtered by the user before being directed to the Model Builder. The basic Model Fit module performs multiple linear regression in the partial least squares (PLS) sense on all the descriptors passed to it. Once the model has been generated, it is evaluated for its significance and validity. If a model evaluator module indicates that the model should be rejected, there are two possible avenues of recourse. The first is to select a new group of descriptors to pass to the model fitter. This may involve calculating new descriptors, or re-calculating descriptors with new parameters. The second is to construct a completely new (e.g., non-linear) model fitting module; model generation with this new fitter may also require further descriptor calculations. The cycle of descriptor calculation, model fitting, and model evaluation terminates when the user judges that the resulting model is satisfactory.
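The fit-and-evaluate cycle can be sketched in a few lines of Python. This is deliberately not PLS: it substitutes ordinary least squares on a single descriptor at a time so that the arithmetic stays visible, and it uses the coefficient of determination as a stand-in for the model evaluator. The function names, the acceptance threshold, and the data are all hypothetical and synthetic.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r2).
    Assumes x and y are non-constant and of equal length."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

def best_single_descriptor(table, activity, min_r2):
    """Fit each descriptor column in turn; return (code, a, b, r2) for the
    best fit, or None if no fit reaches min_r2 (the 'reject' branch)."""
    best = None
    for code, values in table.items():
        a, b, r2 = fit_line(values, activity)
        if best is None or r2 > best[3]:
            best = (code, a, b, r2)
    if best is None or best[3] < min_r2:
        return None
    return best

# Synthetic data: activity is exactly 1 + 2*d1 and unrelated to d2.
table = {"d1": [1.0, 2.0, 3.0, 4.0], "d2": [4.0, 1.0, 3.0, 2.0]}
activity = [3.0, 5.0, 7.0, 9.0]

accepted = best_single_descriptor(table, activity, min_r2=0.9)
rejected = best_single_descriptor({"d2": table["d2"]}, activity, min_r2=0.9)
```

A return value of None corresponds to the rejection case in the text, where the user goes back to choose or calculate other descriptors, or changes the fitting method.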

Adding New Descriptors

MOE QuaSAR currently offers a collection of built-in descriptors which includes molecular volume and attendant properties, polarizability, globularity, moments, potential energy-related properties, shape similarity, radius of gyration, accessible surfaces, and connectivity indices. This list is continually growing.

MOE users can easily introduce their own new descriptors into MOE by writing and loading an SVL program module in accordance with QuaSAR conventions. Each descriptor requires a minimum of two functions that interface with the QuaSAR Descriptor Manager:

  1. descriptor list: a function that returns a catalog of the descriptors that can be calculated with a given module. For example, the surface area descriptor module can calculate over twenty individual descriptors. For efficiency, these descriptors are collected in a single SVL module.

  2. descriptor calculator: a function that actually calculates the values of the descriptors. The QuaSAR Descriptor Manager will traverse a collection of molecules and invoke the calculator functions for all required descriptors.

Other module functions are optional. For example, if a descriptor is parameterized (e.g., accessible surface area probe radius) an additional function can be supplied to handle the graphical user interface for the descriptor.

Here we use the Zagreb index as an illustrative example. This index is a measure of branching, and is calculated as the sum of the squared valencies (number of bonds) of each atom in a hydrogen-suppressed molecule.

The first required function is the list function which returns a catalog of descriptors, unique codes, descriptions and classes of descriptors supplied by the module. In this case, there is only one descriptor and so the SVL function would resemble:

    function QuaSAR_list_Zagreb [] = tr [
      [ 'zagreb', 'Zagreb Index', '2D', [] ]
    ];

The SVL transpose operator, tr, is used to collect all codes into one list, all descriptions into another, and so forth. This is more intuitive for larger tables, but it is still needed in a table with only one descriptor entry, such as the one here.
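The effect of the transposition can be illustrated in Python (not SVL), where zip plays a comparable role for a row-oriented table. The second row below is a hypothetical entry added only so that the regrouping is visible; it does not correspond to a real QuaSAR descriptor.

```python
# Row-oriented table: one row per descriptor.
rows = [
    ("zagreb", "Zagreb Index", "2D", []),
    ("example_code", "Hypothetical second descriptor", "i3D", []),
]

# Transpose: one list per column (codes, descriptions, classes, parameters).
codes, descriptions, classes, params = (list(col) for col in zip(*rows))

# The same transposition applied to a single-row table still yields
# four columns, each of length one.
single = [list(col) for col in zip(*rows[:1])]
```

Writing the table by rows keeps each descriptor's code, description, and class together on one line, while the transposed form gives one list per attribute.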

The other requisite function is the calculator function that does the actual calculations. The function accepts a Molecular Database molecule, a set of codes to calculate (in the case of multiple descriptors per module) and parameters, if any, to the particular descriptors. In this case, there is only one descriptor to calculate and so the function would look like:

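    function QuaSAR_calc_Zagreb [db_mol, codes, parm]
      // keep only the heavy atoms (hydrogen-suppressed molecule)
      db_mol = db_MolHeavyAtoms db_mol;
      // per-atom lists of bonded neighbors
      local bonds = db_mol(4)(MOL_ATOM_BONDS);
      // valence = length of each neighbor list; square and sum
      return add sqr app length bonds;
    endfunction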

Heavy atoms are extracted from the molecular data by the function db_MolHeavyAtoms. The bonds of each atom are retrieved, and the valence of each atom is computed using the length function. app is used to apply the length calculation over the neighbor list of each atom. Squaring and summing are accomplished with sqr and add.

If a descriptor accepts parameters, one more function is needed to put up an edit parameters panel. An example of such a panel is the one used for entering parameters to an accessible surface area descriptor.
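As a check on the arithmetic, the same calculation can be written in Python (not SVL) on a plain neighbor-list representation. The function zagreb_index and the two hand-built graphs are illustrative assumptions; hydrogens are taken to be already removed.

```python
def zagreb_index(heavy_atom_bonds):
    """heavy_atom_bonds[i] lists the heavy-atom neighbors of atom i.
    Returns the sum of squared valencies."""
    return sum(len(neighbors) ** 2 for neighbors in heavy_atom_bonds)

# Hydrogen-suppressed carbon skeletons, atoms numbered from 0.
n_butane = [[1], [0, 2], [1, 3], [2]]       # C-C-C-C chain
isobutane = [[1], [0, 2, 3], [1], [1]]      # atom 1 is the branch point

z_chain = zagreb_index(n_butane)      # 1 + 4 + 4 + 1
z_branch = zagreb_index(isobutane)    # 1 + 9 + 1 + 1
```

The chain gives 10 and the branched isomer gives 12, consistent with the index being a measure of branching.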

This panel can be made to be as complex as needed, and the full capabilities of the SVL GUI utilities can be brought to bear on its custom design. Please refer to Graphical User Interface Programming in MOE for more information. Once a descriptor module is complete, it is entered into the system simply by loading it into MOE; e.g., at the command line

    svl> load 'my_new_descriptor.svl'

If the new descriptor follows the standard QuaSAR descriptor conventions, then when QuaSAR is run, the descriptor will appear in the Add Descriptors panel and will be treated in the same manner as any of the built-in descriptors.

Summary

MOE QuaSAR is a complete QSAR/QSPR analysis toolset. The first step in such analysis is to compute molecular properties. QuaSAR's graphical interface is a gateway into the QuaSAR system, and allows the user to select which descriptors to calculate. The interface scans MOE and automatically detects QuaSAR modules, including those introduced by the user. Users can quickly and easily build descriptors of their own custom design and add them to the system; the source code of the built-in descriptors is distributed with QuaSAR and can be used as a template.

After descriptors are calculated, they are sent to the suite of MOE QuaSAR model-building functions for model fitting and evaluation. In the model refinement stage, further descriptor calculations may be required. This cycle continues until a satisfactory model is found, or until the current model is rejected and the process begins anew.

Look for an article on QuaSAR's modeling services in an upcoming issue of JCCG.