December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
BI01.08.02

PALIRS—Python-Based Active Learning for InfraRed Spectroscopy

When and Where

Dec 4, 2024
8:45am - 9:00am
Sheraton, Second Floor, Constitution B

Presenter(s)

Co-Author(s)

Nitik Bhatia1,Ondrej Krejci1,Patrick Rinke1,2

Aalto University1,Technische Universität München2

Abstract

Vibrational spectroscopy, a pivotal analytical tool, provides real-time molecular insight into catalytic processes, enhancing our understanding of reaction mechanisms and catalyst performance. However, interpreting experimental spectra is challenging. Simulated spectra can aid the interpretation, but accurate computational methods, in particular quantum mechanical ones such as density-functional theory (DFT), are computationally expensive. Machine-learning interatomic potentials (MLIPs) [1,2] can be trained on DFT data and have shown promising infrared (IR) spectra predictions, but at the cost of large training-data requirements.

Our work addresses these data requirements by developing PALIRS, a Python-based Active Learning method for InfraRed Spectroscopy. We use it to build a comprehensive database for accurate IR prediction of more than 20 small organic molecules (containing H, C, N and O, with at most two carbon atoms) that are pivotal in catalysis. Our strategy for accurately predicting IR spectra involves four key steps. In step 1, we employ the FHI-aims calculator of the Atomic Simulation Environment (ASE) [3] for initial data sampling, computing the normal modes of our target systems [4]. In step 2, the same ASE FHI-aims calculator is used to perform DFT calculations on the structures from step 1. In step 3, our active learning strategy, PALIRS, efficiently expands the initial dataset by generating high-quality training data for precise IR spectra prediction. Each iteration trains an ensemble of MACE [2] message-passing neural network models, runs molecular dynamics simulations with the trained ML models across the organic molecules, and performs single-point DFT calculations for geometries whose ML model predictions carry a high uncertainty estimate. Step 3 is iterated until the estimated error of the predicted energies and forces falls below a given threshold or a maximum of 40 iterations is reached.
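The iteration described in step 3 can be sketched roughly as follows. This is a minimal illustration, not the PALIRS implementation: the function names (`force_uncertainty`, `active_learning`) and the callables `train`, `sample` and `label` are hypothetical placeholders for ensemble training, MD sampling and single-point DFT. Using the ensemble standard deviation of force components as the uncertainty estimate, and stopping once no sampled geometry exceeds the threshold, are assumptions; the abstract only states that an uncertainty estimate and an error threshold are used.

```python
from statistics import pstdev


def force_uncertainty(ensemble_forces):
    """Largest standard deviation of any force component across the ensemble.

    ensemble_forces[m][a][c] is component c of the force on atom a as
    predicted by ensemble member m for one geometry (assumed layout).
    """
    n_atoms = len(ensemble_forces[0])
    return max(
        pstdev([member[a][c] for member in ensemble_forces])
        for a in range(n_atoms)
        for c in range(3)
    )


def active_learning(train, sample, label, dataset, threshold, max_iterations=40):
    """Grow `dataset` until no sampled geometry is uncertain (assumed criterion).

    train(dataset)  -> ensemble of models             (hypothetical callable)
    sample(models)  -> [(geometry, ensemble_forces)]  (hypothetical, e.g. MD)
    label(geometry) -> reference data point           (hypothetical, e.g. DFT)
    """
    for iteration in range(1, max_iterations + 1):
        models = train(dataset)
        # Keep only geometries on which the ensemble members disagree.
        uncertain = [
            geometry
            for geometry, forces in sample(models)
            if force_uncertainty(forces) > threshold
        ]
        if not uncertain:
            return dataset, iteration
        # Label the uncertain geometries and add them to the training set.
        dataset = dataset + [label(geometry) for geometry in uncertain]
    return dataset, max_iterations
```

Passing the expensive steps in as callables keeps the loop logic separate from the training, MD and DFT back ends, so the selection rule can be tried on toy data before any real calculation is run.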
Finally, in step 4, the data generated by PALIRS is used to train a MACE model for predicting dipole moments, which in turn enables accurate prediction of IR spectra. All parts of PALIRS are parallelized, for effective use of computational resources and faster creation of larger datasets.

The final MACE model, developed through our active learning loop, is highly accurate for our dataset of organic molecules, achieving mean absolute errors below 3 meV for energies, 4 meV/Å for forces, and 4 mDebye for dipole moments. For IR spectra prediction, our model achieves a peak position accuracy of better than 20 cm⁻¹ with just 700-800 training data points for a single molecule, compared to the 10,000 required previously [1]. This not only accelerates IR spectra prediction but also significantly improves its precision. In conclusion, our approach yields highly accurate MLIPs for simple organic molecules and an equally precise dipole moment model, underscoring the robustness and efficiency of our methodology.

[1] Michael Gastegger, Kristof T. Schütt, and Klaus-Robert Müller. Machine learning of solvent effects on molecular spectra and reactions. Chemical Science, 12(34):11473–11483, September 2021.
[2] Ilyes Batatia, David Peter Kovacs, Gregor N. C. Simm, Christoph Ortner, and Gabor Csanyi. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. Advances in Neural Information Processing Systems, 2022.
[3] Ask Hjorth Larsen et al. The atomic simulation environment—a python library for working with atoms. Journal of Physics: Condensed Matter, 29(27):273002, 2017.
[4] Zeyuan Tang, Stefan T. Bromley, and Bjørk Hammer. A machine learning potential for simulating infrared spectra of nanosilicate clusters. The Journal of Chemical Physics, 158(22):224108, June 2023.
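To illustrate how predicted dipole moments can lead to an IR spectrum, the sketch below uses a common route: the cosine transform of the dipole autocorrelation function along an MD trajectory. The abstract does not state that PALIRS computes spectra exactly this way, so the method itself is an assumption, as are the names `dipole_autocorrelation` and `ir_spectrum`, the Hann-type window, and the omission of frequency-dependent prefactors and quantum corrections (intensities are therefore in arbitrary units).

```python
import math

SPEED_OF_LIGHT_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def dipole_autocorrelation(dipoles):
    """Autocorrelation of a dipole time series given as (x, y, z) tuples."""
    n = len(dipoles)
    mean = [sum(d[c] for d in dipoles) / n for c in range(3)]
    centered = [[d[c] - mean[c] for c in range(3)] for d in dipoles]
    acf = []
    for lag in range(n // 2):  # use only lags with enough samples
        total = sum(
            centered[t][c] * centered[t + lag][c]
            for t in range(n - lag)
            for c in range(3)
        )
        acf.append(total / (n - lag))
    return acf


def ir_spectrum(dipoles, timestep_fs):
    """Return (wavenumber in cm^-1, intensity in arbitrary units) pairs."""
    acf = dipole_autocorrelation(dipoles)
    n = len(acf)
    # Hann-type window that decays to zero at the largest lag.
    windowed = [acf[k] * 0.5 * (1.0 + math.cos(math.pi * k / n)) for k in range(n)]
    spectrum = []
    for j in range(n):
        freq_per_fs = j / (2.0 * n * timestep_fs)  # stays below Nyquist
        intensity = sum(
            windowed[k] * math.cos(2.0 * math.pi * freq_per_fs * k * timestep_fs)
            for k in range(n)
        )
        spectrum.append((freq_per_fs / SPEED_OF_LIGHT_CM_PER_FS, intensity))
    return spectrum


# Synthetic check: a dipole oscillating at 1000 cm^-1, sampled every 1 fs.
freq = 1000.0 * SPEED_OF_LIGHT_CM_PER_FS
trajectory = [(math.cos(2.0 * math.pi * freq * t), 0.0, 0.0) for t in range(600)]
peak_wavenumber = max(ir_spectrum(trajectory, timestep_fs=1.0), key=lambda p: p[1])[0]
print(f"peak near {peak_wavenumber:.0f} cm^-1")
```

The wavenumber grid spacing is set by the trajectory length, so resolving peak positions to within about 20 cm⁻¹ needs a trajectory considerably longer than the 600 steps used in this toy check.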

Keywords

infrared (IR) spectroscopy

Symposium Organizers

Deepak Kamal, Solvay Inc
Christopher Kuenneth, University of Bayreuth
Antonia Statt, University of Illinois
Milica Todorović, University of Turku

Session Chairs

Lihua Chen
Christopher Kuenneth
