MOE Deployment Strategies


P. Labute
Chemical Computing Group Inc.


Introduction

In this article, we describe deployment strategies for MOE, the Molecular Operating Environment, a software package for computational chemistry, cheminformatics and bioinformatics. The phrase "Operating Environment" (an amalgam of "Operating System" and "Application Environment") was chosen deliberately to reinforce the fundamental design objective:

MOE shall be a computerized environment dedicated to computational chemistry and bioinformatics in which scientific methodology can be rapidly prototyped, experimented with, verified, deployed and effectively used.

This sentence warrants some elaboration as it touches upon a number of issues relevant to the practice of computational chemistry in both industrial and academic settings.

Firstly, the phrase "a computerized environment dedicated to computational chemistry and bioinformatics" is used so as not to restrict MOE to any particular facet of computational chemistry, cheminformatics, or bioinformatics. In many cases, computer programs for chemistry are specialized and integrate poorly with other programs. MOE is intended to be an environment in which scientists can exploit the many techniques and methodologies they require in order to be effective. For example, Molecular Mechanics programs have traditionally been separate from Chemical Information systems. The artificial line between them has led to the unfortunate identities:

Molecular Modeling = Dynamics Simulations
Chemical Information = Database Management

Both activities are more properly contained in the general notion of Chemical Computing and there is no real reason to make such an artificial delineation. MOE is intended to be a general platform for all activities pertaining to Chemical Computing.

Secondly, the phrase "in which scientific methodology can be rapidly prototyped, experimented with ..." means an environment in which computer programs can be written and tested quickly. Here, two notions are important:

  • That MOE be programmable. We formally recognize that "as-is" computer programs are not sufficiently flexible to meet the needs of practicing computational chemists: it is of fundamental importance to have the ability to incorporate the latest research ideas. This necessarily means that MOE must have some sort of programming capability.

  • That MOE programs can be written rapidly. Anyone who has written a C or Fortran program knows that significant time and resources must be committed to completing the task. Often, the prospect of spending a lot of time writing a C or Fortran program to test out an idea has a deterrent effect and the testing of the idea is not attempted. MOE was designed so that programming time is kept to a minimum.

Thirdly, the closing phrase "... deployed and effectively used" means that MOE must be user-friendly, be available on many computer platforms, and run efficiently. It was our intention that the vast majority of programs written with MOE be efficient enough to be deployed as end-applications. To this end, the environment itself had to contain the necessary components (e.g., graphics, molecule file I/O, force fields, window interface, etc.) and applications (e.g., Homology Modeling, Molecular Similarity, etc.) to be used as a complete computational chemistry applications package.

The Molecular Operating Environment (MOE) is a next-generation molecular computing software tool. MOE is not a software package in the usual sense, but an integrated Applications and Methodology Development Platform; that is, a tool for chemical computing software development and deployment. MOE integrates visualization, simulation and application development in one package. Custom methodology modules can be developed with the built-in high-performance data-parallel programming language SVL, the Scientific Vector Language.

At the time of this writing, MOE contains ready-to-use and self-contained applications for:

  • Bioinformatics and Protein Modeling. Sequence Searching -- Threading and Inverse Folding -- Secondary Structure Prediction -- Multiple Sequence and Structural Alignment -- Protein Stabilizing Contact Analysis -- Homology Consensus Model Generation -- 3D Structure Homology Modeling -- Stereochemical Quality Evaluation -- Energy Minimization -- Molecular Dynamics.

  • Lead Generation and Optimization. Combinatorial Compound Generation -- Large-Scale Diverse Subset Selection -- 3D Structure Similarity Searching -- Quantitative Structure Activity Relationships -- Conformational Search -- Energy Minimization -- Molecular Dynamics -- Molecular Alignment and Superposition -- Large Scale Molecular Database and Graphical Viewer -- Hybrid Monte Carlo and Molecular Dynamics Trajectory Generator.

  • Polymer and Material Science. Polymer Property Predictions -- Quantitative Structure Property Relationships -- Large-Scale Molecular Spreadsheet -- Energy Minimization -- Molecular Dynamics of Periodic Systems.

Methodology development in MOE is based upon SVL, the Scientific Vector Language. SVL is a new high-performance data-parallel programming language built into MOE. SVL is an embedded language; that is, its compiler and run-time environment are an integral part of MOE. SVL serves as the command, macro, scripting, and high-performance computing language of MOE.
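For readers unfamiliar with the term "data-parallel", the following minimal sketch may help. It is written in plain Python, not SVL, and it makes no claim about SVL syntax or about how MOE is implemented. The `Vec` class and both function names are hypothetical and exist only for this illustration. The first function processes one element at a time with an explicit loop; the second states the same calculation as a single expression over whole vectors.

```python
# Illustration only: plain Python, not SVL. All names are hypothetical.
from math import sqrt


def distances_loop(xs, ys, zs):
    """Element-at-a-time style: an explicit index loop over the coordinates."""
    out = []
    for i in range(len(xs)):
        out.append(sqrt(xs[i] ** 2 + ys[i] ** 2 + zs[i] ** 2))
    return out


class Vec:
    """Tiny vector type whose operators act on every element at once."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Vec(a + b for a, b in zip(self.values, other.values))

    def __mul__(self, other):
        return Vec(a * b for a, b in zip(self.values, other.values))

    def sqrt(self):
        return Vec(sqrt(a) for a in self.values)


def distances_vector(x, y, z):
    """Data-parallel style: one expression over whole vectors, no loop."""
    return (x * x + y * y + z * z).sqrt().values
```

Both functions return the same distances from the origin; the vector form is simply shorter because the looping is pushed into the operators.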

MOE is not a C or Fortran subroutine library but a complete executable application. The run-time environment contains the basic core graphical user interface, molecular data structure services and the SVL development and run-time environment. SVL programs are run from within the MOE application.

Most methodology is developed in SVL; however, in-house codes can be linked into MOE with the SVL Application Program Interface (API). For example, toolkits such as those provided by Daylight can be linked into MOE through API extensions of the SVL language.

In addition to the internal molecular data structures, force fields and I/O facilities, MOE contains the following major modules for methodology development:

  • SVL™ - The Scientific Vector Language.
  • SVL-WT™ - Graphical User Interface toolkit.
  • SVL-API™ - C and Fortran extension toolkit.
  • MOE-ViewDB™ - Large-scale molecular spreadsheet.
  • MOE-Match™ - Molecular pattern matching library.
  • MOE-OptLib™ - Nonlinear optimization library.

A very important benefit results from the architecture of MOE. Any program written in SVL will run as-is on any platform that runs MOE; that is, every SVL program can leverage MOE's portability. Currently, MOE comes in a variety of guises:

  • MOE™ - Full graphical environment.
  • MOE/batch™ - Batch/Server version.
  • MOE/XL™ - Microsoft Excel add-in version.

and runs on a variety of computer platforms:

  • SGI - Silicon Graphics Workstations running IRIX.
  • Sun - Sun Microsystems Workstations running Solaris.
  • Windows/NT - Intel-based PCs.
  • Windows95 - Intel-based PCs.

Chemical Computing Group's goal is to port the MOE environment to as many computer platforms as possible, from desktops through supercomputers. In this way, every SVL program will be able to run, unchanged, on the full spectrum of computing equipment.

Because of the foregoing factors, MOE is an extremely flexible chemical computing software system that can be used effectively in a number of ways:

  • As an "out-of-the-box" application environment.
  • As a platform for customized methodology.
  • As a platform for new methodology development.
  • As an embedded network and database technology.

In the remainder of this article, we will elaborate on a number of possible ways in which MOE and its applications can be deployed.

Deployment Strategies

As was mentioned in the preceding section, MOE runs in a number of forms:

  • MOE - the full Graphical Environment running on Unix workstations or Windows/NT and Windows95 platforms.

  • MOE/batch - the batch/server version of MOE; that is, MOE with all of the graphics removed.

  • MOE/XL - the add-in module to Microsoft Excel running on the Windows/NT and Windows95 platforms.

These versions of MOE will run the same SVL applications (with the exception that the batch version will not run graphical applications). These variants are instrumental to the scalability of MOE: by making the appropriate selection among these deployment vehicles, MOE can serve both small and large organizations.

While it is impossible to enumerate all of the configurations that can be put together, we will examine the building blocks and some example configurations. We start with two basic building blocks:

This symbol represents the workstation (either Unix or PC) running the full MOE graphical environment used by an application scientist. Here, the scientist will rely upon Chemical Computing Group, a methodology group, or a third party for application development. This configuration is appropriate for a lone computational chemist or academic setting.

This symbol represents the workstation (either Unix or PC) running the full MOE graphical environment used by a methodology developer. Here, the scientist will supplement Chemical Computing Group applications with methodology developed in-house. This configuration is appropriate for a "one-man-shop", academic, or consultant.

Naturally, either of these two building blocks can be replicated, giving a pure applications group or a pure methodology group. By combining these building blocks we obtain

which consists of a group in which some scientists are primarily applications scientists and some are (possibly occasional) methodology developers who support the group. In larger organizations, the two groups may be separated and specialized. In either case, the fact that the same software is used by both kinds of scientist streamlines the development and deployment process.

We are now in a position to introduce the batch/server building blocks. In general, MOE/batch will be used on compute servers while other applications are used as clients. These applications may be other instances of MOE, Network Browsers or Office applications.

This symbol represents a compute server, or departmental computer, running MOE/batch. Typically, compute-bound simulations and searches of large databases are performed in the background. Alternatively, this configuration can provide compute services for network browser clients.

This symbol represents a community of Network Browsers that will use a compute server running MOE/batch to effect calculations. This configuration is appropriate for a wider audience of occasional users of pre-packaged MOE methodology.

This symbol represents a group of users using the MOE/XL add-in to Microsoft Excel. Such users use pre-packaged MOE methodology in the Excel environment possibly in conjunction with other Excel applications for chemistry.

Putting all of the components together we see that the MOE family of products and applications can serve large organizations consisting of diverse and separate groups of users.

In this kind of configuration a large organization can exploit the full benefits of MOE. There are multiple delivery paths for scientific applications, and different classes of users can choose their level of complexity. Suppose that a new methodology is developed and written in SVL. Such an application may have been developed by the Methodology Development Group or perhaps by Chemical Computing Group. This application can now be deployed in a number of ways; for example,

  • The application is immediately ready for use by the Applications Group running the full MOE graphical environment. Indeed, any user running MOE, whether it be on Unix workstations or PCs can exploit the new methodology.

  • The application can be placed on the Compute Server for use in background by the Applications Group (or any other users requiring background compute services).

  • The Methodology Development Group or Web Services Group can deploy the application on the Web Browser Server, write an appropriate script with a corresponding Web Page and make a pre-packaged version of the application available to the IntraNet audience.

  • The application can be deployed on the desktop either through customized versions of the MOE graphical environment or through the MOE/XL add-in to Microsoft Excel. The application can then be used in conjunction with other Office chemistry products (e.g., MDL's ISIS for Excel).

  • The application can be used as a data filter in relational database systems. By using MOE/batch as a data massaging application, the application (and indeed MOE itself) can be used as an embedded database technology.
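The data-filter deployment in the last item can be pictured as a program that reads records, computes a value for each, and passes on only the records that qualify. The sketch below shows that pattern in plain Python; it does not use MOE/batch or SVL, and it is not a description of any MOE interface. The function names, the `id` and `structure` column names, and the placeholder property (the length of the structure string, which has no chemical meaning) are all assumptions made for illustration.

```python
# Illustration only: plain Python standing in for a batch-mode data filter.
# Nothing here is an MOE or SVL interface; all names are hypothetical.
import csv
import io


def filter_records(src, dst, compute, keep):
    """Read CSV rows from src, add a computed 'value' column,
    and write to dst only the rows for which keep(value) is true."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames + ["value"], lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        row["value"] = compute(row["structure"])
        if keep(row["value"]):
            writer.writerow(row)
            kept += 1
    return kept


def placeholder_property(structure):
    """Stand-in for a real calculation: the length of the structure string."""
    return len(structure)


def run_example():
    """Filter three in-memory records, keeping those with value >= 3."""
    src = io.StringIO("id,structure\n1,CCO\n2,c1ccccc1\n3,C\n")
    dst = io.StringIO()
    kept = filter_records(src, dst, placeholder_property, lambda v: v >= 3)
    return kept, dst.getvalue()
```

In a database setting, `src` and `dst` would be replaced by the export and import streams of the database system, and `placeholder_property` by the deployed application's calculation.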

Summary

MOE is not just another software package. Firstly, MOE was conceived as a general purpose Chemical Computing Environment; that is, a single tool that facilitates all forms of computational chemistry. Chemical Computing Group sees MOE as the vehicle for the integration of Quantum Chemistry, Molecular Modeling and Chemical Information. Secondly, MOE is a Methodology Development Platform, a package in which chemists can build and assemble their own tools quickly with SVL, the built-in high-performance programming language.

Although one is not compelled to develop new methodology with MOE, the concepts of an integrated Applications and Methodology Development Platform are the keys to MOE's flexibility. Once the decision is taken to use the methodology development capabilities of MOE, the full benefits of MOE can be reaped:

  • Time Savings. SVL programs are concise and easy to write. Reductions in code size of 10 to 1 are routinely realized. Because SVL programs are small and portable (running on any platform that MOE runs on), maintenance costs are minimized as well.

  • Creative Throughput. New methodology can be prototyped, experimented with, modified and verified quickly. In almost all cases, no recoding to C or Fortran is necessary since SVL is a high-performance programming language.

  • Research Competitiveness. With C or Fortran, the incorporation of the latest theoretical ideas requires significant resources. Often, the costs are prohibitive and scientists are forced to wait until commercial versions become available. With MOE, the latest published methodology can be implemented and validated with minimal expenditure.

  • Cost Savings. A significant amount of commercial computational chemistry software can be implemented with MOE with extremely few lines of code. The MOE system provides the critical mass of tools needed to reduce the dependence on software that integrates poorly with in-house methodology.

The high-level nature of SVL programming makes methodology development a viable alternative for those who do not want to allocate resources to C and Fortran development. Even if the decision is taken to refrain from custom development, the fact that Chemical Computing Group uses this model to produce its own applications allows us to react rapidly to the needs of our customers, who can then further benefit from MOE's deployment flexibility.