Dec 4, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A
Audrey Huang¹˒², Karin Hasegawa², Yuefan Deng²
¹Woodbridge High School; ²Stony Brook University, The State University of New York
Eukaryotic translation initiation factor 4E (eIF4E) plays a critical role in regulating protein synthesis, and its overexpression is particularly implicated in drug-resistant estrogen receptor-positive (ER+) breast cancer, a form of cancer known to develop resistance through many different aberrations. Rather than focusing solely on identifying therapeutic agents to which the cancer has not yet developed resistance, a more viable strategy may be to target fundamental processes that support tumor survival and progression, which makes eIF4E a promising target for drug development [1]. However, despite significant advances in the understanding of eIF4E’s structure and function, developing effective inhibitors remains challenging. To address this issue, we aim to identify novel inhibitors through AI-driven de novo drug discovery, building on previous frameworks [2].<br/><br/>Two comprehensive datasets from the ChEMBL database were prepared and preprocessed: a general dataset of 2,409,270 molecules and a targeted dataset of 288 reported eIF4E inhibitors. Data augmentation was performed to increase the size of the targeted dataset. An autoencoder was then used to convert the discrete SELFIES representations of molecules into latent vectors, creating the continuous representations required by the generative AI algorithms. Using the latent vectors, Neural Networks and Random Forests were trained as QSAR models to predict each compound's pIC50, molecular weight (MW), LogP, synthetic accessibility score (SAS), and binding affinities for eIF4E1, eIF4E2, and eIF4E3. The binding-affinity training data were generated by docking known eIF4E inhibitors with AutoDock Vina against the structures of eIF4E1, eIF4E2, and eIF4E3 retrieved from the Protein Data Bank.
The Neural Networks achieved a lower RMSE than the Random Forests on every predicted property and were used in the rest of the AI framework.<br/><br/>Transfer learning was performed to train three generative models: a Wasserstein GAN with Gradient Penalty (WGAN-GP), a WGAN-GP paired with a Genetic Algorithm (GA), and a WGAN-GP paired with a Pareto-based Genetic Algorithm (PGA). The Genetic Algorithms, which performed selection operations to enhance the fitness of the training data, improved the properties of the generated molecules. In addition, for the PGA, dominance-resistant solutions were periodically removed so that solutions far from the Pareto front yet hard to dominate did not remain in the training set. After the WGAN-GP models were trained, they were used to generate compounds, which were evaluated for validity, uniqueness, novelty, diversity, and molecular properties. Finally, the generated compounds underwent molecular docking simulations to assess their binding stability and interactions with eIF4E.<br/><br/>The findings of this study not only advance the understanding of eIF4E's role in cancer but also pave the way for the development of targeted therapies for drug-resistant ER+ breast cancer. Future studies should focus on experimental validation of the generated compounds and explore their therapeutic efficacy in later steps of the drug discovery pipeline.<br/><br/>This project is supported by the Louis Morin Charitable Trust. Furthermore, the authors would like to thank Stony Brook Research Computing and Cyberinfrastructure and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system.<br/><br/>[1] Jia, Yan et al. “Cap-dependent translation initiation factor eIF4E: an emerging anticancer drug target.” Medicinal Research Reviews vol. 32,4 (2012): 786-814. doi:10.1002/med.21260<br/>[2] Xie, Evan et al. “An AI-Driven Framework for Discovery of BACE1 Inhibitors for Alzheimer’s Disease.” bioRxiv (2024). https://doi.org/10.1101/2024.05.15.594361
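
As an illustration of the Pareto-based selection mentioned in the abstract, the following is a minimal sketch of non-dominated filtering over multi-objective scores. It is not the authors' implementation: the function names, the convention that every objective is maximized, and the example score values are all assumptions made for illustration, and the periodic removal of dominance-resistant solutions is not covered.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` must be at least as good as `b` on every objective and strictly
    better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical candidates scored as (predicted pIC50, negated SAS), so that
# higher is better on both axes. The numbers are invented for illustration.
scores = [
    (7.2, -3.1),
    (6.5, -2.0),
    (6.0, -3.5),  # worse than the first candidate on both objectives
    (8.0, -4.2),
]
front = pareto_front(scores)
```

Here `front` holds the indices of the candidates a selection step of this kind would keep; objectives that should be minimized (such as SAS) are negated beforehand so a single "higher is better" comparison applies.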