Dec 4, 2024
11:15am - 11:30am
Sheraton, Second Floor, Constitution B
Navin Rajapriya<sup>1</sup>, Kotaro Kawajiri<sup>1</sup>
<sup>1</sup>AIZOTH America, Inc.
This abstract introduces Multi-Sigma, a proprietary no-code AI analysis tool designed for multi-objective prediction and optimization. As part of our efforts to make AI accessible to a broader research community, we have developed a free web application, built on Multi-Sigma, for screening molecules by their global warming potential (GWP), an essential parameter in the development of environmentally friendly refrigerants.<br/><br/>The development of AI in science and engineering has progressed rapidly, but its increasing complexity often hinders its adoption in research and development (R&D). To bridge the gap between AI specialists and non-experts, we developed Multi-Sigma: a cloud-based, user-friendly software tool with a full graphical user interface (GUI), designed to democratize the use of machine learning for R&D.<br/><br/>Multi-Sigma features three core modules: Bayesian analysis, neural network analysis, and chain analysis. Researchers can train AI models with up to 200 explanatory variables and 100 target variables. Multi-Sigma’s patented auto-tuning feature performs hands-free hyperparameter optimization. For experiments or processes with multiple stages, the chain analysis module allows users to link multiple AI models so that the output of one model serves as the input to the next, facilitating complex multi-stage predictions and optimizations. We leveraged Multi-Sigma’s capabilities to develop a model that predicts the 100-year GWP values of greenhouse gases (GHGs) and refrigerants from molecular descriptors.<br/><br/>The primary challenge in predicting GWP values lies in the limited availability of experimental data and the continuously evolving nature of GWP values due to varying atmospheric conditions and GHG lifetimes. 
The Sixth Assessment Report (AR6) of the United Nations' Intergovernmental Panel on Climate Change (IPCC) reports GWP values ranging from zero to 25,200 over a 100-year period. This wide range and heavily skewed distribution complicate the development of accurate models. Additionally, the small dataset of 207 samples introduces a significant risk of overfitting during hyperparameter optimization.<br/><br/>To address these challenges, we sought to answer several key questions essential for developing a GWP100 prediction model based on molecular structure:<br/>○ Given the heavy skewness of the data, is log transformation appropriate, or are alternative transformations such as Box-Cox, Yeo-Johnson, or quantile transformations more suitable?<br/>○ Would up-sampling the data help mitigate overfitting given the limited dataset?<br/>○ With multiple molecular descriptor packages available, which numerical representations (e.g., RDKit, Mordred, Alvadesc) are most appropriate for modeling GWP100?<br/><br/>We used the statistical transformations available in Multi-Sigma’s preprocessing module to evaluate and identify the most suitable methods for improving model performance. Multi-Sigma also includes an imbalanced data adjustment function, which automatically up-samples minority classes, and a balanced validation extraction function, which ensures equal representation during model validation. We compared the performance of AI models using molecular descriptors from RDKit, Mordred, and Alvadesc.<br/><br/>The most accurate model resulted from a combination of Mordred molecular descriptors, quantile transformation, and Multi-Sigma’s balanced validation and imbalanced adjustment functions. 
The resulting model achieved an R<sup>2</sup> score of 0.913 on the original scale, outperforming results in previous scientific reports on GWP prediction.<br/><br/>The model is now available through a free web application that accepts individual molecules or lists of molecules in SMILES format and predicts their GWP100 values. This tool can facilitate the identification and screening of low-GWP refrigerant candidates, contributing to the development of sustainable, ozone-friendly refrigerants.
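
The target-transformation question raised in the abstract can be illustrated with a minimal sketch using only the Python standard library. It compares the skewness of a heavily right-skewed target before and after a log transform and a rank-based quantile transform to normal scores. This is not Multi-Sigma's preprocessing module; the function names (`skewness`, `log_transform`, `quantile_transform`) are hypothetical, and the target values are synthetic placeholders spanning the zero to 25,200 range, not data from AR6.

```python
import math
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness of a sequence of numbers."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def log_transform(values):
    """Apply log(1 + x) so that a GWP of zero stays finite."""
    return [math.log1p(v) for v in values]


def quantile_transform(values):
    """Map each value to a standard-normal score based on its rank.

    Tied values receive the average rank of their group.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Synthetic placeholder targets (assumption), NOT values taken from AR6.
gwp = [0.0, 1.0, 4.0, 12.0, 150.0, 700.0, 1500.0, 4000.0, 9000.0, 25200.0]

raw_skew = skewness(gwp)
log_skew = skewness(log_transform(gwp))
quantile_skew = skewness(quantile_transform(gwp))

print(f"raw: {raw_skew:.3f}  log1p: {log_skew:.3f}  quantile: {quantile_skew:.3f}")
```

When there are no ties, the rank-based scores are symmetric around zero by construction, so their skewness is essentially zero regardless of how extreme the largest raw value is. A model trained on transformed targets still needs its predictions mapped back before computing a score on the original scale, which is the scale on which the abstract reports its R<sup>2</sup>.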