December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
MT04.09.23

Machine Learning-Driven Prediction of SARS-CoV-2 Spike Protein Properties Under Varying Temperature and pH Conditions

When and Where

Dec 4, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A

Co-Author(s)

Marissa Huang¹, Ziyuan Niu², Georgios Kementzidis², Yuefan Deng²

Woodbridge High School¹; Stony Brook University, The State University of New York²

Abstract

The spike glycoprotein (S-protein) of SARS-CoV-2 plays a critical role in viral infection by facilitating the virus’s entry into host cells. Understanding the structural stability and denaturation mechanisms of the S-protein under different environmental conditions is therefore crucial for developing strategies to inhibit viral infectivity. Molecular dynamics (MD) simulations are frequently used to model nanoscale interactions, such as those of the S-protein [1]. However, these simulations are computationally intensive and require significant time and processing power, underscoring the need for more efficient methods [2].

To address this, we aim to use machine learning (ML) models trained on MD simulation data to predict properties of the S-protein, such as stability, under varying temperatures and pH levels. We examine three measurements: the backbone root-mean-square deviation (RMSD), the solvent-accessible surface area (SASA), and the number of protein-water hydrogen bonds (HBPW). RMSD measures the deviation of the protein’s backbone atoms from the initial structure. SASA measures the surface area of the protein that is accessible to the solvent. HBPW counts the hydrogen bonds between the protein and water molecules.

MD simulations were performed using GROMACS with the CHARMM36 force field. The initial S-protein structure (PDB: 6VXX) was retrieved from the Protein Data Bank, and the missing loops in its structure were reconstructed using Robetta. The S-protein, which consisted of 1273 residues per chain, was placed in explicit solvent using the SPC/E water model. The cubic simulation box had dimensions of 21 × 21 × 21 nm³, and periodic boundary conditions were applied in all three Cartesian dimensions.

A Conditional Variational Autoencoder combined with a Wasserstein Generative Adversarial Network with Gradient Penalty was developed for each of the three properties and trained on 200 ns of MD simulation data.
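As an illustration of the backbone RMSD defined above, the following is a minimal NumPy sketch and not the authors' analysis pipeline. It assumes the frame has already been superimposed on the reference structure (trajectory tools such as GROMACS `gmx rms` typically perform a least-squares fit first); the function name `backbone_rmsd` and the toy coordinates are hypothetical.

```python
import numpy as np


def backbone_rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) coordinate arrays.

    Assumption: both arrays list the same backbone atoms in the same
    order and are already aligned; no superposition is done here.
    """
    coords = np.asarray(coords, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if coords.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared displacement of each atom, averaged over atoms, then rooted.
    squared_dist = np.sum((coords - reference) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))


# Toy data (not from the simulations): every atom shifted by (0.3, 0.0, 0.4) nm.
reference = np.zeros((4, 3))
frame = reference + np.array([0.3, 0.0, 0.4])
rmsd_value = backbone_rmsd(frame, reference)
```

Evaluating this function on every frame of a trajectory gives an RMSD time series of the kind that could serve as training data for a generative model.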
The model’s performance was evaluated using probability density functions of the predicted and actual distributions, Jensen-Shannon (JS) divergence values (bounded between 0 and 1), and plots of the loss values over the training epochs.

The model was able to generate predictions for data it had not previously seen, achieving JS divergence values of less than 0.02 for the SASA data and less than 0.1 for the HBPW and RMSD data. When plotted as probability density functions, these predictions resembled the distributions of the actual data. These results are a promising start, which we plan to refine through further strategies such as revising our architecture.

Our results can enhance studies of viral protein dynamics by reducing the need for extensive MD simulations, potentially speeding up simulation efforts many times over and minimizing the computational power needed for such large-scale, long-term modeling. This can aid the development of antiviral drugs targeting the S-protein and its stability, leading to more effective COVID-19 treatments. Our approach can also be adapted to studies of other viral proteins to enhance drug discovery processes.

This project is supported by the Louis Morin Charitable Trust. The authors would also like to thank Stony Brook Research Computing and Cyberinfrastructure and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system.

[1] Hollingsworth, S. A., & Dror, R. O. (2018). Molecular Dynamics Simulation for All. Neuron, 99(6), 1129–1143. https://doi.org/10.1016/j.neuron.2018.08.011
[2] Vlachakis, D., Bencurova, E., Papangelopoulos, N., & Kossida, S. (2014). Current State-of-the-Art Molecular Dynamics Methods and Applications. Advances in Protein Chemistry and Structural Biology, 94, 269–313. https://doi.org/10.1016/b978-0-12-800168-4.00007-x
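The JS divergence used in the evaluation above can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it uses base-2 logarithms, which is what bounds the value between 0 and 1, and it estimates the two distributions with histograms on shared bin edges. The function names and the bin count are assumptions.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def js_from_samples(actual, predicted, bins=50):
    """Histogram both sample sets on shared edges, then compare them.

    Assumption: `bins=50` is an arbitrary illustrative choice.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(actual, bins=edges)
    q, _ = np.histogram(predicted, bins=edges)
    return js_divergence(p, q)
```

Identical distributions give 0 and distributions with no overlap give 1, so values such as those reported for SASA, HBPW, and RMSD indicate predicted distributions close to the actual ones. The estimate depends on the binning, so comparisons are only meaningful at a fixed bin count.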

Keywords

COVID-19

Symposium Organizers

Kjell Jorner, ETH Zurich
Jian Lin, University of Missouri-Columbia
Daniel Tabor, Texas A&M University
Dmitry Zubarev, IBM

Session Chairs

Kjell Jorner
Jian Lin
Dmitry Zubarev
