Dec 6, 2024
9:00am - 9:15am
Hynes, Level 2, Room 210
Joonhyuk Choi1, Youngchun Kwon1
Samsung Advanced Institute of Technology1
It has been shown that artificial intelligence (AI) models can predict chemical properties of molecules quite successfully, and choosing a proper molecular representation, such as a SMILES string or a graph, is crucial to improving their performance. Recently, graph neural networks (GNNs) have demonstrated superior performance in predicting chemical properties of given molecules, and graphs have become the prevailing molecular representation. Because graph representations are high-dimensional, however, the computational cost grows rapidly with the number of data points, and managing the graph representations during training becomes difficult. Furthermore, the scarcity of datasets for properties such as retention time makes it challenging to develop accurate chemical property prediction models.

To address the scalability issue of graph-based molecular representations, we propose a sparsified graph representation that regards only the heavy atoms in a molecule as nodes and the chemical bonds between them as edges. We show that this representation, combined with improved message passing and readout functions in a GNN, scales better to large molecules and yields higher prediction accuracy for NMR chemical shifts than commonly used graph-based methods.

To overcome the scarcity of training data for predicting retention times of small molecules, we present an improved transfer learning method that learns from a small training set with a pre-trained GNN. The GNN is pre-trained on the METLIN-SMRT data set and then fine-tuned on the target training set for a fixed number of iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer. We demonstrate that this method provides better prediction accuracy across numerous chromatographic systems than other existing transfer learning methods.
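As a rough illustration of the sparsified representation (not the authors' implementation), a heavy-atom-only graph can be built from a SMILES string with RDKit, which omits implicit hydrogens by default; the function name and the atomic-number node features below are illustrative assumptions.

```python
# Minimal sketch: heavy atoms as nodes, chemical bonds as edges.
from rdkit import Chem

def heavy_atom_graph(smiles: str):
    mol = Chem.MolFromSmiles(smiles)  # RDKit keeps hydrogens implicit by default
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # Node features: one entry per heavy atom (atomic number as a stand-in feature)
    nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    # Edges: one (i, j) index pair per chemical bond between heavy atoms
    edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    return nodes, edges

# Example: ethanol ("CCO") gives 3 heavy-atom nodes and 2 bond edges
print(heavy_atom_graph("CCO"))
```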
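The fine-tuning step could look roughly like the sketch below, assuming a PyTorch GNN already pre-trained on METLIN-SMRT; `model`, `x`, `y`, and `num_iters` are placeholder names for the pre-trained network, the target-set inputs and retention times, and the fixed iteration budget.

```python
# Minimal sketch: fine-tune a pre-trained GNN on a small target data set with L-BFGS.
import torch

def finetune_lbfgs(model, x, y, num_iters=50):
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.LBFGS(model.parameters(), max_iter=num_iters)

    def closure():
        # L-BFGS re-evaluates the loss several times per step, so it needs a closure
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        return loss

    optimizer.step(closure)  # runs up to num_iters L-BFGS iterations
    return model
```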