9:55 PM - DS01.15.07
Web Platform of a Molecular Generative Model for Experimental Chemists
Seiji Takeda¹, Toshiyuki Hama¹, Hsianghan Hsu¹, Akihiro Kishimoto¹, Lisa Hamada¹, Daiju Nakano¹
¹IBM Research Tokyo
1. Introduction
Meticulous molecular design is key to materials development. However, molecular design has so far been driven by human experts' trial-and-error cycles, and as a result the typical lead time to design a new material exceeds 10 years. Over the past half decade, data-driven approaches leveraging artificial intelligence (AI) have been introduced to this field, especially in the context of generative models. The primary requirement for molecular generative models is industrial practicality, including high-speed design with large structural variety. Another important requirement is accessibility: a model should be usable by researchers and engineers in experimental chemistry, who are not necessarily familiar with software-related tasks such as writing Python scripts or using a command-line interface. It is therefore important to make state-of-the-art molecular generative models available to a wide variety of potential users.
In this paper, we present our web application (webapp)-based molecular design platform, which has practical advantages in terms of the speed and diversity of molecule generation, model interpretability, and no requirement for pre-training. The user interface (UI) is designed to make the tool accessible to experimental chemists.
2. Method
The main workflow consists of four steps. (1) Training a model: the system encodes the input molecular structures into feature vectors using graph kernel approaches, and these feature vectors are used to build a regression model that predicts the target properties. (2) Setting target properties: the user sets target property values that new molecules should satisfy. (3) Structure generation: molecular graphs are built by repeatedly connecting graph resources, which consist of atoms, rings, and user-defined substructures, while avoiding isomorphic duplicates. (4) Online evaluation: while the graph generation algorithm runs, each generated structure is encoded into a feature vector and evaluated by the regression model. If the predicted property satisfies the user-set target value, the structure is accepted and stored in the candidate list. By repeating steps (3) and (4), the system generates molecular structures that satisfy the target properties.
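The generate-and-evaluate loop of steps (3) and (4) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not MolGX's implementation: molecules are modeled as linear chains of graph resources (so two chains are isomorphic exactly when one is the reverse of the other), label and bonded-pair counts serve as a hypothetical stand-in for graph-kernel features, and the trained regression model is any callable. The names `canonical`, `features`, `linear_model`, and `generate` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Mapping, Tuple

Chain = Tuple[str, ...]  # hypothetical: a molecule as a linear chain of resources
Predictor = Callable[[Counter], float]


def canonical(chain: Chain) -> Chain:
    # A linear chain read in either direction is the same graph, so the
    # lexicographically smaller orientation serves as a canonical form.
    return min(chain, chain[::-1])


def features(chain: Chain) -> Counter:
    # Hypothetical stand-in for a graph-kernel feature vector: counts of
    # resource labels plus counts of bonded (adjacent) label pairs.
    feats = Counter(chain)
    for a, b in zip(chain, chain[1:]):
        feats["-".join(sorted((a, b)))] += 1
    return feats


def linear_model(weights: Mapping[str, float]) -> Predictor:
    # Hypothetical stand-in for the regression model trained in step (1).
    return lambda f: sum(weights.get(k, 0.0) * v for k, v in f.items())


def generate(resources: List[str], max_len: int, predict: Predictor,
             target: Tuple[float, float]) -> List[Chain]:
    lo, hi = target
    seen = set()
    accepted: List[Chain] = []
    frontier: List[Chain] = [(r,) for r in resources]
    while frontier:
        next_frontier: List[Chain] = []
        for chain in frontier:
            key = canonical(chain)
            if key in seen:  # skip isomorphic duplicates
                continue
            seen.add(key)
            # Step (4), online evaluation: encode, predict, keep if in range.
            if lo <= predict(features(key)) <= hi:
                accepted.append(key)
            # Step (3), structure generation: connect one more resource
            # at either end of the canonical chain.
            if len(key) < max_len:
                for r in resources:
                    next_frontier.append(key + (r,))
                    next_frontier.append((r,) + key)
        frontier = next_frontier
    return accepted
```

For example, with `resources=["C", "O"]`, `max_len=3`, and `linear_model({"O": 1.0})` (a model that simply counts oxygen resources), a target range of `(1.0, 1.0)` yields the four distinct chains containing exactly one O: `("O",)`, `("C", "O")`, `("C", "C", "O")`, and `("C", "O", "C")`. Extending the canonical chain at both ends keeps the enumeration complete even though duplicate orientations are discarded as soon as they appear.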
3. Results
First, we evaluated performance on a small subset (300 samples) of the QM9 dataset. We trained a property prediction model for the energy of the highest occupied molecular orbital (HOMO), E_homo. After comparing several regression models and sweeping their hyperparameters, we found that kernel ridge regression gave reasonable accuracy, and we used this model for the subsequent molecular generation. For structure generation, we used three target values: E_homo ~ -0.28 Ha, -0.25 Ha, and -0.20 Ha. Depending on the target E_homo value, the generation speed (i.e., the number of molecules generated per second) ranged from 29 to 77 molecules per second. Other state-of-the-art molecular generative models typically generate 1 to 10 molecules per second on QM9, confirming a significant acceleration by our method.
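The model selection described above (kernel ridge regression with a hyperparameter sweep) might look roughly like the following minimal sketch. It assumes a Gaussian (RBF) kernel over precomputed feature vectors, a single train/validation split, and mean absolute error as the selection criterion; the abstract states none of these, and the names `rbf`, `solve`, `fit_krr`, and `sweep` are hypothetical.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, float]


def rbf(x: Vector, z: Vector, gamma: float) -> float:
    # Gaussian (RBF) kernel: an assumption, the abstract does not name the kernel.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a dense n x n system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_krr(xs: Sequence[Vector], ys: Sequence[float],
            alpha: float, gamma: float) -> Callable[[Vector], float]:
    # Solve (K + alpha*I) c = y, then predict f(x) = sum_i c_i k(x_i, x).
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    coef = solve(k, list(ys))
    return lambda x: sum(c * rbf(xi, x, gamma) for c, xi in zip(coef, xs))


def sweep(train: Sequence[Sample], valid: Sequence[Sample],
          alphas: Sequence[float], gammas: Sequence[float]):
    # Grid search: refit for each (alpha, gamma) pair and keep the pair with
    # the lowest mean absolute error on the held-out validation samples.
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    best = None
    for alpha, gamma in product(alphas, gammas):
        model = fit_krr(xs, ys, alpha, gamma)
        mae = sum(abs(model(x) - y) for x, y in valid) / len(valid)
        if best is None or mae < best[0]:
            best = (mae, alpha, gamma, model)
    return best
```

Kernel ridge regression has a closed-form fit, so each grid point costs one linear solve; for larger datasets a library implementation such as scikit-learn's `KernelRidge` would be the practical choice.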
As a more practical application, we designed new photoacid generators (PAGs). We trained our model on ~1,300 PAG structures extracted from U.S. patents, together with corresponding property data that we calculated by DFT simulation. We targeted five properties, LogW, LogP, T_bio, LD50, and lambda_max, each of which had to fall within a specific range set by a PAG chemist. After running MolGX for 6 hours, we obtained more than 2,000 molecular structures, which corresponds to a more than 100-fold acceleration in design speed. The designed PAG structures were experimentally synthesized.
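With several targets, a natural reading is that a candidate is accepted only when every predicted property lies inside its range. A minimal sketch of such a check follows, assuming one regression model per property and inclusive bounds; the abstract does not say how MolGX combines the five criteria, and `evaluate`, `models`, and `ranges` are hypothetical names.

```python
from typing import Callable, Dict, Mapping, Tuple, TypeVar

F = TypeVar("F")             # whatever feature representation the models accept
Range = Tuple[float, float]  # inclusive (low, high) bounds


def evaluate(features: F,
             models: Mapping[str, Callable[[F], float]],
             ranges: Mapping[str, Range]) -> Tuple[bool, Dict[str, float]]:
    # Predict each targeted property with its own model and accept the
    # candidate only if all predictions fall inside their ranges.
    preds = {name: models[name](features) for name in ranges}
    ok = all(lo <= preds[name] <= hi for name, (lo, hi) in ranges.items())
    return ok, preds
```

In the PAG case, `ranges` would hold five entries keyed by LogW, LogP, T_bio, LD50, and lambda_max, with bounds supplied by the chemist; returning the individual predictions alongside the verdict makes it easy to show which property caused a rejection.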
4. Web Application
We implemented the algorithms as a GUI-based web application for a wide variety of users, especially experimental chemists. We designed the workflow from the viewpoint of user experience (UX), developed a set of APIs, and deployed the system in a Kubernetes-based cloud environment. The system is currently in service on IBM Cloud.