QuaSAR: The MOE System for QSAR
Chemical Computing Group Inc.
Structure-activity relationship (SAR) analysis and, more generally, structure-property relationship (SPR) analysis are integral to the rational drug design cycle. Quantitative methods (QSAR, QSPR) assume that biological activity is correlated with chemical structure or properties and that, as a consequence, activity can be modeled as a function of calculable physicochemical attributes. Such a model for activity prediction could then be used, for instance, to screen candidate lead compounds or to suggest directions for new lead molecules.
The QSAR/QSPR functions in MOE are provided in a package entitled QuaSAR. The QuaSAR package is designed so that it delivers
The QuaSAR system is depicted in the flow diagram below.
The components of the QuaSAR package are a combination of SVL descriptor modules and SVL programs to operate the fundamental MOE molecular services:
The MOE Molecular Database and Graphical viewer components and the built-in SVL programming language have been introduced in previous JCCG articles. For more information on these components, please see
This article is primarily concerned with the calculation of molecular properties or "descriptors". A technical treatment of the model building services of QuaSAR will appear in a later issue of JCCG.
Calculation of Molecular Descriptors
Central to the QuaSAR system is the Descriptor Manager, with which the user communicates via the Descriptor Graphical User Interface (GUI). The Descriptor Manager searches the MOE system for loaded QuaSAR modules that follow a standard format, queries them for descriptor information, and instructs them to perform parameter prompting and descriptor calculations. The calculated values are accumulated in a MOE Molecular Database file, which can be viewed using the MOE Database Viewer.
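The discovery step can be illustrated with a small Python analogue. This is only a sketch of the general registry pattern, not MOE's implementation and not SVL: the class names, the method names list_descriptors and calculate, and the n_atoms descriptor are all hypothetical, and a molecule is represented simply as a list of element symbols.

```python
class DescriptorManager:
    """Toy registry: finds loaded modules that follow a two-function convention."""

    def __init__(self, loaded_modules):
        self.catalog = {}  # descriptor code -> (module, description, class)
        for module in loaded_modules:
            # Only modules exposing both functions count as descriptor modules.
            has_list = callable(getattr(module, "list_descriptors", None))
            has_calc = callable(getattr(module, "calculate", None))
            if not (has_list and has_calc):
                continue
            for code, description, cls in module.list_descriptors():
                self.catalog[code] = (module, description, cls)

    def calculate(self, molecules, codes):
        """Return one {code: value} row per molecule, like a database table."""
        rows = []
        for mol in molecules:
            row = {}
            for code in codes:
                module = self.catalog[code][0]
                row[code] = module.calculate(mol, [code])[0]
            rows.append(row)
        return rows


class AtomCountModule:
    """Hypothetical descriptor module: counts the atoms in a molecule."""

    def list_descriptors(self):
        return [("n_atoms", "Number of atoms", "example")]

    def calculate(self, molecule, codes):
        return [len(molecule) for _ in codes]


class UnrelatedModule:
    """Loaded code that does not follow the convention; the manager skips it."""

    def helper(self):
        return None


manager = DescriptorManager([AtomCountModule(), UnrelatedModule()])
rows = manager.calculate([["C", "C", "O"], ["N"]], ["n_atoms"])
print(sorted(manager.catalog), rows)
```

The point of the pattern is that the manager never needs to know in advance which descriptors exist: anything that follows the convention is picked up when it is loaded.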
The MOE QuaSAR descriptor calculation service comprises the top elements of the flow chart shown above:
These components are written in the Scientific Vector Language (SVL). The source code for all built-in descriptor calculators is part of the MOE distribution, and can be used as a template for generating custom descriptors. A standard configuration file format is supported for writing and loading collections of descriptors. QuaSAR can also be run in batch mode.
The QuaSAR user interface provides graphical tools to inquire which descriptors are currently loaded in MOE, and to choose and to edit the parameters of the ones of interest.
The QuaSAR Calculate window (see figure below) can be launched from the MOE Database Viewer File menu. This window contains a descriptor list summarizing which molecular properties are to be computed. Each descriptor is identified by a unique descriptor code, and labeled according to its class. The descriptors in the list are computed for all molecules in the database.
Descriptors are categorized by class:
The descriptor list is grown using the Add QuaSAR Descriptors panel. This panel lists all the currently available descriptors in the system. "Currently available" refers to those descriptor calculator functions that are recognized by the QuaSAR Descriptor Manager.
If descriptors take parameters, their parameter values can be accessed from a Parameters panel launched from the main QuaSAR window.
QuaSAR descriptor output data are written to a molecular database file. The MOE Database Viewer can be used to monitor QuaSAR's progress while it is running: it provides a view onto the output disk file even as it is being written. The Database Viewer's plotting facility allows the user to examine the generated data and to select database entries whose values fall within a particular range of interest.
The QuaSAR descriptor data are normally examined and filtered by the user before being directed to the Model Builder. The basic Model Fit module performs multiple linear regression in the partial least squares (PLS) sense on all the descriptors passed to it. Once the model has been generated, it is evaluated for its significance and validity. If a model evaluator module indicates that the model should be rejected, there are two possible avenues of recourse. The first is to select a new group of descriptors to pass to the model fitter. This may involve calculating new descriptors, or re-calculating descriptors with new parameters. The second is to construct a completely new (e.g., non-linear) model fitting module; model generation with this new fitter may also require further descriptor calculations. The cycle of descriptor calculation, model fitting, and model evaluation terminates when the user judges that the resulting model is satisfactory.
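The fit-and-evaluate step of this cycle can be sketched in a few lines of Python. Note the assumptions: the sketch uses ordinary least squares solved through the normal equations rather than the PLS procedure of the Model Fit module, it uses R squared as a stand-in for model evaluation, and the descriptor values and activities are synthetic numbers chosen so that the exact answer is known.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(descriptors, activity):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    x = [[1.0] + list(row) for row in descriptors]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, activity)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(coeffs, descriptors, activity):
    """Fraction of the variance in activity explained by the fitted model."""
    predicted = [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
                 for row in descriptors]
    mean = sum(activity) / len(activity)
    ss_res = sum((y - p) ** 2 for y, p in zip(activity, predicted))
    ss_tot = sum((y - mean) ** 2 for y in activity)
    return 1.0 - ss_res / ss_tot


# Synthetic data: activity is exactly 1 + 2*d1 - 0.5*d2.
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
activity = [1 + 2 * d1 - 0.5 * d2 for d1, d2 in descriptors]

coeffs = fit_linear(descriptors, activity)
score = r_squared(coeffs, descriptors, activity)
print(coeffs, score)
```

In a real study the score would be judged against a threshold or a cross-validation result; a poor score would send the user back to choose or recalculate descriptors, as described above.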
Adding New Descriptors
MOE QuaSAR currently offers a collection of built-in descriptors which includes molecular volume and attendant properties, polarizability, globularity, moments, potential energy-related properties, shape similarity, radius of gyration, accessible surfaces, and connectivity indices. This list is continually growing.
MOE users can easily introduce their own new descriptors into MOE by writing and loading an SVL program module in accordance with QuaSAR conventions. Each descriptor requires a minimum of two functions that interface with the QuaSAR Descriptor Manager:
Other module functions are optional. For example, if a descriptor is parameterized (e.g., accessible surface area probe radius) an additional function can be supplied to handle the graphical user interface for the descriptor.
Here we use the Zagreb index as an illustrative example. This index is a measure of branching, and is calculated as the sum of the squared valencies (number of bonds) of each atom in a hydrogen-suppressed molecule.
The first required function is the list function which returns a catalog of descriptors, unique codes, descriptions and classes of descriptors supplied by the module. In this case, there is only one descriptor and so the SVL function would resemble:
The SVL transpose operator, tr, is used to collect all codes into one list, all descriptions into another, and so forth. This is more intuitive for larger tables, but it is still needed in a table with only one descriptor entry, such as the one here.
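For readers unfamiliar with this idiom, the effect of the transpose can be shown with a Python analogue. This is a sketch, not SVL; the table entries, including the class labels, are hypothetical placeholders.

```python
def transpose(rows):
    """Turn a list of (code, description, class) rows into per-field lists."""
    return [list(column) for column in zip(*rows)]


# One row per descriptor, written the way a person would read the table.
single = [
    ("zagreb", "Zagreb index", "example-class"),
]
larger = [
    ("zagreb", "Zagreb index", "example-class"),
    ("n_atoms", "Number of atoms", "example-class"),
]

codes, descriptions, classes = transpose(single)
all_codes, all_descriptions, all_classes = transpose(larger)
print(codes, all_codes)
```

Even with a single row, the transpose is what turns one (code, description, class) record into three one-element lists, so the caller sees the same shape regardless of how many descriptors a module supplies.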
The other requisite function is the calculator function that does the actual calculations. The function accepts a Molecular Database molecule, a set of codes to calculate (in the case of multiple descriptors per module) and parameters, if any, to the particular descriptors. In this case, there is only one descriptor to calculate and so the function would look like:
local bonds = db_mol(4)(MOL_ATOM_BONDS);
return add sqr app length bonds;
Heavy atoms are extracted from the molecular data by the function db_MolHeavyAtoms. The bonds of each atom are retrieved, and the valence of each atom is calculated using the length function; app applies the length calculation over the neighbor list of each atom. Squaring and summing are accomplished with sqr and add.
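The same sequence of steps can be checked by hand with a short Python analogue. This is a sketch under stated assumptions: the dictionary-based molecule format is hypothetical and is not MOE's molecular data layout, and hydrogen suppression is done explicitly by filtering out atoms whose element is "H".

```python
def zagreb_index(molecule):
    """Sum of squared heavy-atom degrees in the hydrogen-suppressed graph."""
    heavy = {name for name, (element, _) in molecule.items() if element != "H"}
    # Per-atom bond lists, keeping only bonds to other heavy atoms.
    bonds = [[n for n in neighbors if n in heavy]
             for name, (_, neighbors) in molecule.items() if name in heavy]
    # Mirrors "add sqr app length bonds": length per atom, square, then sum.
    return sum(len(b) ** 2 for b in bonds)


# Isobutane skeleton, with one explicit hydrogen to exercise the filtering.
isobutane = {
    "C1": ("C", ["C2"]),
    "C2": ("C", ["C1", "C3", "C4", "H1"]),
    "C3": ("C", ["C2"]),
    "C4": ("C", ["C2"]),
    "H1": ("H", ["C2"]),
}
# n-Butane skeleton, hydrogens omitted.
n_butane = {
    "C1": ("C", ["C2"]),
    "C2": ("C", ["C1", "C3"]),
    "C3": ("C", ["C2", "C4"]),
    "C4": ("C", ["C3"]),
}

print(zagreb_index(isobutane))  # 1 + 9 + 1 + 1 = 12
print(zagreb_index(n_butane))   # 1 + 4 + 4 + 1 = 10
```

The branched isomer scores higher than the straight chain, which is the sense in which the index measures branching.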
If a descriptor accepts parameters, one more function is needed to put up an edit-parameters panel. An example of such a panel is that used for entering parameters to an accessible surface area descriptor:
This panel can be made to be as complex as needed, and the full capabilities of the SVL GUI utilities can be brought to bear on its custom design. Please refer to Graphical User Interface Programming in MOE for more information. Once a descriptor module is complete, it is entered into the system simply by loading it into MOE; e.g., at the command line
If the new descriptor follows the standard QuaSAR descriptor conventions, then when QuaSAR is run, the descriptor will appear in the Add Descriptors panel, and will be treated in the same manner as any of the built-in descriptors.
MOE QuaSAR is a complete QSAR/QSPR analysis toolset. The first step in such an analysis is to compute molecular properties. QuaSAR's graphical interface is a gateway into the QuaSAR system, and allows the user to select which descriptors to calculate. The interface scans MOE and automatically detects QuaSAR modules, including those introduced by the user. Users can quickly and easily build descriptors of their own custom design and add them into the system; the source code of the built-in descriptors is distributed with QuaSAR and can be used as a template.
After descriptors are calculated, they are sent to the suite of MOE QuaSAR model-building functions for model fitting and evaluation. In the model refinement stage, further descriptor calculations may be required. This cycle continues until a satisfactory model is found, or until the current model is rejected and the process begun anew.
Look for an article on QuaSAR's modeling services in an upcoming issue of JCCG.