December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
MT04.09.12

Investigation of Machine Learning Force Fields for Biomolecular Systems Using Fragment Molecular Orbital Method Data

When and Where

Dec 4, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A

Co-Author(s)

Hiromu Matsumoto (1), Ryosuke Kita (1), Chiduru Watanabe (2), Masateru Ohta (2), Naoki Tanimura (3), Koji Okuwaki (4), Yu-Shi Tian (5), Daisuke Takaya (5), Mitsunori Ikeguchi (2, 6), Kaori Fukuzawa (5), Teruki Honma (2), Tsuyohiko Fujigaya (1), Koichiro Kato (1)

(1) Kyushu University; (2) RIKEN; (3) Mizuho Research & Technologies; (4) JSOL Corporation; (5) Osaka University; (6) Yokohama City University

Abstract

Introduction
In molecular simulations for drug discovery, achieving both high accuracy and low computational cost is crucial. Traditional molecular force fields are inexpensive but limited in accuracy, whereas quantum mechanical (QM) calculations are accurate but costly; machine learning force fields (MLFFs) are expected to meet both demands. Previous approaches to developing MLFFs have relied on QM datasets generated with conventional density functional theory (DFT) or ab initio molecular orbital methods. However, the high computational cost of these methods, particularly for large systems such as biomolecules, has considerably restricted the scope of MLFF research in drug discovery. We hypothesized that the Fragment Molecular Orbital (FMO) method (Ref. 1), which enables efficient QM calculations on entire biomolecular systems, could address this issue. In this study, we investigate whether FMO data can be used effectively to construct MLFFs. Furthermore, we explore whether the FMO Database (FMODB) can improve MLFF accuracy through transfer learning.

Methods
To evaluate the utility of FMO data for constructing MLFFs, TrpCage, a small protein of 20 residues, was selected as the model system. Additionally, the effectiveness of transfer learning for improving MLFF accuracy was investigated using FMODB, a comprehensive public database of FMO calculation results.
For the MLFF training dataset, diverse conformations of TrpCage were sampled with molecular dynamics (MD) simulations, and the potential energy of each structure and the forces acting on each of its atoms were computed with the FMO method. MD simulations of the TrpCage NMR structure (PDB ID: 1L2Y) in water were performed with GROMACS using the Amber ff14SB force field and the TIP3P water model. A total of 5,000 structures were obtained by saving one snapshot every 1 ns from 50 runs of 100 ns each (50 runs × 100 snapshots).
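The dataset bookkeeping in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it only reproduces the stated arithmetic (50 runs × 100 ns, one snapshot per ns, 5,000 structures), the 8:1:1 train/validation/test split, and a Pearson correlation coefficient as one plausible reading of the R values reported in the Results. The function names, the random seed, the exclusion of the t = 0 frame, and the frame-level shuffle are all hypothetical.

```python
import random

# Sampling schedule stated in the abstract: 50 MD runs of 100 ns each,
# one snapshot saved every 1 ns.
N_RUNS = 50
RUN_LENGTH_NS = 100
STRIDE_NS = 1


def snapshot_ids(n_runs=N_RUNS, run_length_ns=RUN_LENGTH_NS, stride_ns=STRIDE_NS):
    """Return (run, time_ns) pairs for every saved snapshot.

    Assumption: frames are taken at stride_ns, 2*stride_ns, ..., run_length_ns,
    i.e. the t = 0 starting structure is not counted.
    """
    return [
        (run, t)
        for run in range(n_runs)
        for t in range(stride_ns, run_length_ns + 1, stride_ns)
    ]


def split_811(items, seed=0):
    """Shuffle items (hypothetical seed) and split them 8:1:1 into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def pearson_r(xs, ys):
    """Pearson correlation coefficient R between reference and predicted values."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


frames = snapshot_ids()
train, val, test = split_811(frames)
print(len(frames), len(train), len(val), len(test))  # 5000 4000 500 500
```

Because the abstract does not say whether structures were split per frame or per trajectory, the sketch shuffles individual frames; splitting whole runs instead would keep correlated neighboring snapshots out of both the training and test sets at once. Pooling all Cartesian force components before computing R is likewise an assumption, not something stated in the source.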
FMO calculations (FMO2-MP2/6-31G* with energy gradients) were then performed on each structure to obtain its energy and atomic forces, using the ABINIT-MP program on the supercomputer Fugaku (hp230131). The dataset was divided into training, validation, and test sets in an 8:1:1 ratio. The MLFF was constructed within the High-Dimensional Neural Network Potential (HDNNP) framework proposed by Behler and Parrinello (Ref. 2).
Additionally, a pre-trained model for transfer learning was developed using 15,454 energy records from FMODB covering the elements C, H, N, O, S, F, and Cl.

Results
Without Transfer Learning
Without transfer learning, the correlation coefficients (R) between predicted and FMO reference values for TrpCage were 0.58 for energy and 0.70 for forces. These results indicate that the MLFF learned the relationship between structure and energy/forces from FMO data, but the prediction accuracy remained moderate.
With Transfer Learning
With transfer learning from the FMODB pre-trained model, the R values for TrpCage energy and force predictions increased to 0.61 and 0.73, respectively. These improvements suggest that large-scale pre-training data can improve the accuracy of MLFFs.

Acknowledgments
This research was conducted as part of the Life Intelligence Consortium (LINC) and the FMO Drug Design Consortium (FMODD). The work was supported in part by the Japan Agency for Medical Research and Development (AMED) under the Drug Discovery and Life Science Research Support Platform Project (BINDS) (Grant No. JP23ama121030).

References
1) Kitaura, K. et al., Chem. Phys. Lett. 313, 701–706 (1999).
2) Behler, J. & Parrinello, M., Phys. Rev. Lett. 98, 146401 (2007).
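The core HDNNP idea named in the Methods (a total energy written as a sum of atomic energies, each predicted by an element-specific neural network from a descriptor of the atom's local environment, with forces as negative energy gradients) can be illustrated with the dependency-free sketch below. Everything beyond that idea is assumed for illustration: only radial G2 symmetry functions (the original Behler and Parrinello formulation also uses angular ones), hypothetical (eta, R_s) parameters, a 6.0 Å cosine cutoff, one tanh hidden layer with random untrained weights, and finite-difference forces instead of analytic gradients. It is not the model used in this study.

```python
import math
import random

# Minimal Behler-Parrinello-style HDNNP sketch (illustrative only).
# Assumptions, not from the abstract: radial G2 symmetry functions only,
# cosine cutoff R_C = 6.0 Angstrom, one tanh hidden layer, random untrained
# weights, and central finite differences for forces.

R_C = 6.0
G2_PARAMS = [(0.5, 0.0), (1.0, 1.5), (2.0, 3.0)]  # hypothetical (eta, R_s) pairs


def cutoff(r):
    """Cosine cutoff f_c(r) = 0.5 * (cos(pi r / R_C) + 1) for r < R_C, else 0."""
    return 0.5 * (math.cos(math.pi * r / R_C) + 1.0) if r < R_C else 0.0


def descriptors(positions, i):
    """Radial G2 for atom i: sum over j != i of exp(-eta (r_ij - R_s)^2) f_c(r_ij)."""
    g = []
    for eta, r_s in G2_PARAMS:
        total = 0.0
        for j, pj in enumerate(positions):
            if j == i:
                continue
            r = math.dist(positions[i], pj)
            total += math.exp(-eta * (r - r_s) ** 2) * cutoff(r)
        g.append(total)
    return g


def make_element_net(n_in, n_hidden, seed):
    """Random weights for one element-specific atomic network (untrained placeholder)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0, 0.1) for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    return w1, b1, w2


def atomic_energy(net, g):
    """Feed-forward pass: E_i = w2 . tanh(W1 g + b1)."""
    w1, b1, w2 = net
    hidden = [math.tanh(sum(w * x for w, x in zip(row, g)) + b) for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))


def total_energy(elements, positions, nets):
    """HDNNP total energy: sum of atomic energies from element-specific networks."""
    return sum(
        atomic_energy(nets[el], descriptors(positions, i))
        for i, el in enumerate(elements)
    )


def forces(elements, positions, nets, h=1e-5):
    """F = -dE/dR, approximated by central finite differences."""
    f = []
    for i in range(len(positions)):
        fi = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            de = total_energy(elements, plus, nets) - total_energy(elements, minus, nets)
            fi.append(-de / (2 * h))
        f.append(fi)
    return f


# Usage: a hypothetical water-like fragment with one network per element.
nets = {el: make_element_net(len(G2_PARAMS), 4, seed) for seed, el in enumerate("HO")}
elements = ["O", "H", "H"]
positions = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
print(total_energy(elements, positions, nets))
print(forces(elements, positions, nets))
```

Because every atom is routed through the network for its element, a pre-trained model can only be transferred to systems whose elements it covers. In one common transfer-learning setup, weights pre-trained on a large set such as the FMODB energy records would initialize these element networks before fine-tuning on the smaller TrpCage FMO dataset; the abstract does not specify which layers were frozen or retrained.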

Symposium Organizers

Kjell Jorner, ETH Zurich
Jian Lin, University of Missouri-Columbia
Daniel Tabor, Texas A&M University
Dmitry Zubarev, IBM

Session Chairs

Kjell Jorner
Jian Lin
Dmitry Zubarev