Dec 4, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A
Ashley Jisue Hong<sup>1,2</sup>, Ziyuan Niu<sup>2</sup>, Georgios Kementzidis<sup>2</sup>, Yuefan Deng<sup>2</sup>
<sup>1</sup>Punahou School, <sup>2</sup>Stony Brook University, The State University of New York
As COVID-19 still has no known definitive treatment, studies of SARS-CoV-2 remain important. A component of SARS-CoV-2 that plays a pivotal role in viral infection is the Spike Glycoprotein (S-protein), which facilitates membrane fusion and the injection of viral RNA that lets the virus replicate using host cell machinery<sup><b>1</b></sup>. Understanding the S-protein’s structural mechanisms under diverse environmental conditions is critical for inhibiting infection and developing treatments<sup><b>2</b></sup>. Therefore, this study uses Molecular Dynamics (MD) and Machine Learning (ML) to better predict the S-protein’s properties at varying pH and temperature.<br/>MD models the movements of molecules and atoms through computer simulation. Although MD was used to analyze the S-protein, it required significant time and processing power. To increase modeling efficiency, we trained ML models on MD simulation data to predict the S-protein’s stability under varying pH and temperature using three measurements: root-mean-square deviation (RMSD), solvent-accessible surface area (SASA), and protein-water hydrogen bonds (HBPW). RMSD assesses stability through the deviation of the protein backbone from a reference structure. SASA assesses stability through folding dynamics, as reflected in the protein surface area accessible to the solvent. HBPW assesses stability through the number of hydrogen bonds between the protein and water molecules.<br/>MD simulations were conducted using the open-source software GROMACS with the CHARMM36 force field. The initial structure of the S-protein was obtained from the Protein Data Bank (6VXX.pdb), with any missing loops in the 6VXX structure completed using Robetta. The S-protein, comprising 1273 residues per chain, was placed in explicit solvent. The simulation box, filled with SPC/E water, measured 21×21×21 nm<sup>3</sup>, with periodic boundary conditions applied in all three Cartesian dimensions. 
Data were recorded for 30 environmental combinations of 5 temperature and 6 pH values.<br/>A Conditional Generative Adversarial Network (CGAN) was developed for each of RMSD, SASA, and HBPW as this study’s generative model. The models were programmed in Python and trained on 200 ns of MD simulation data. Model accuracy was summarized through Kernel Density Estimation (KDE) distribution-comparison histograms of actual and generated data, generator and discriminator loss as a function of epoch, Mean Squared Error (MSE) loss over epochs, and Jensen–Shannon (JS) divergence between actual and generated data over epochs. The JS divergence between two distributions, calculated with base-2 logarithms, is bounded above by 1. Model training involved several trials in which extreme environmental conditions were eliminated and parameters were varied.<br/>The generated data distributions closely matched the actual data distributions. The model was also able to generate data closely resembling the actual data for given conditions that were not part of its training. Thus, our results can be applied to further SARS-CoV-2 studies, especially COVID-19 treatment and antiviral drug discovery aimed at disrupting the S-protein’s structural integrity. Our model showed lower JS divergence and faster learning than previous Conditional Variational Autoencoder (CVAE) models. By combining CGAN with MD simulations, this improved efficiency can reduce the need for extensive MD simulations in studies of other viral proteins, and broader applications can aid drug discovery beyond SARS-CoV-2.<br/><b><sup>1</sup></b>Huang, Y., Yang, C., Xu, X., Xu, W., & Liu, S. (2020). Structural and functional properties of SARS-CoV-2 spike protein: Potential antivirus drug development for COVID-19. Acta Pharmacologica Sinica, 41(9), 1141–1149. https://doi.org/10.1038/s41401-020-0485-4<br/><b><sup>2</sup></b>Xie, Y., Guo, W., Lopez-Hernadez, A., Teng, S., & Li, L. (2022). 
The pH Effects on SARS-CoV and SARS-CoV-2 Spike Proteins in the Process of Binding to hACE2. Pathogens, 11(2), 238. https://doi.org/10.3390/pathogens11020238
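As an illustration of the JS divergence metric described in the abstract, the following is a minimal sketch of how a base-2 Jensen–Shannon divergence between actual and generated samples could be computed from shared-bin histograms. It is not the authors' code: the function names `js_divergence` and `js_from_samples`, the bin count, and the synthetic sample values are all assumptions made for illustration.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two discrete distributions.

    With base-2 logarithms the result lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with no overlapping support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_from_samples(actual, generated, bins=50):
    """Histogram two samples on a shared set of bins, then compare them.

    The bin count is an illustrative choice, not a value from the study.
    """
    actual = np.asarray(actual, dtype=float)
    generated = np.asarray(generated, dtype=float)
    lo = min(actual.min(), generated.min())
    hi = max(actual.max(), generated.max())
    p, _ = np.histogram(actual, bins=bins, range=(lo, hi))
    q, _ = np.histogram(generated, bins=bins, range=(lo, hi))
    return js_divergence(p, q)


if __name__ == "__main__":
    # Synthetic placeholder values, NOT data from the study.
    rng = np.random.default_rng(0)
    actual = rng.normal(loc=0.30, scale=0.05, size=5000)
    generated = rng.normal(loc=0.31, scale=0.05, size=5000)
    print(js_from_samples(actual, generated))
```

Tracking this value once per epoch, with the generated sample drawn from the current generator, would give a curve of the kind the abstract reports; values near 0 indicate that the generated distribution closely matches the actual one.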