MRS Meetings and Events

 

DS06.06.06 2023 MRS Fall Meeting

Machine Learning Approach to Time-Series Analysis of SARS-CoV-2 Spike Glycoproteins under Varying pH and Temperature Conditions

When and Where

Nov 28, 2023
8:00pm - 10:00pm

Hynes, Level 1, Hall A

Co-Author(s)

Parth Jain (1, 2), Melvin Thu (3, 2), Ziyuan Niu (2), Miriam Rafailovich (2), Yuefan Deng (2)

(1) Bergen County Academies; (2) Stony Brook University, The State University of New York; (3) Great Neck North High School

Abstract

As the SARS-CoV-2 pandemic persists, it is vital to evaluate how environmental factors such as temperature and pH affect the virus's molecular structure. Molecular dynamics (MD) simulations are commonly used to study nanoscale interactions, but their computational cost limits their suitability for larger-scale, longer-term modeling. Our research develops a machine learning (ML) approach that uses supervised training on MD-simulated solvent accessible surface area (SASA) data to predict properties of SARS-CoV-2 spike glycoproteins (S-proteins). SASA quantifies the surface area of a protein that is accessible to the surrounding solvent, and changes in SASA over time provide insights into protein stability, folding behavior, and the virus's pathogenicity under specific conditions.[1]

We performed MD simulations of the S-protein using GROMACS. The protein structure (PDB: 6VXX) was modeled with the CHARMM27 force field. The explicit solvent was represented by a 20 × 20 × 20 nm³ box of SPC/E water at a density of 1.02 g/cm³. The simulation was conducted at 37°C, with energy minimization by steepest descent. We used the canonical (NVT) and isothermal-isobaric (NPT) ensembles, the latter with Parrinello-Rahman pressure coupling, and a 2 fs time step.

We denoised and processed the data using Fast Fourier Transforms (FFT), converting the time-series SASA data into the frequency domain to identify cyclic patterns. ML models, including k-Nearest Neighbors (k-NN), Long Short-Term Memory neural networks (LSTMs), and Convolutional Neural Networks (CNNs), were then trained on 1,500 ns of MD SASA data to predict SASA changes as a function of time. This large dataset allowed us to thoroughly test each model's performance across varying sample sizes.
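The FFT preprocessing and k-NN forecasting described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the FFT cutoff, how the k-NN inputs are built, the value of k, or the error metric, so the lagged-window features, the low-pass cutoff `keep`, `k = 5`, the chronological split, and the RMSE score below are all hypothetical choices. The input is a synthetic sine-plus-noise signal standing in for a real SASA trace, and every function name is illustrative.

```python
import numpy as np


def dominant_period(series, dt):
    """Period (in units of dt) of the strongest non-constant FFT component."""
    x = np.asarray(series, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = int(np.argmax(power[1:])) + 1  # skip the zero-frequency (mean) bin
    return 1.0 / freqs[k]


def fft_lowpass(series, keep):
    """Denoise by zeroing every rfft bin at index >= keep, then inverting."""
    x = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[keep:] = 0.0  # discard high-frequency components
    return np.fft.irfft(spectrum, n=len(x))


def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    x = np.asarray(series, dtype=float)
    X = np.array([x[i:i + window] for i in range(len(x) - window)])
    y = x[window:]
    return X, y


def knn_predict(X_train, y_train, query, k):
    """Mean target of the k training windows nearest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


def knn_forecast(series, window=10, k=5, keep=20, train_frac=0.8):
    """One-step-ahead k-NN forecast with a chronological train/test split.

    The whole series is filtered before splitting, so test-period values leak
    slightly into the smoothed training data; filter each part separately for
    a stricter evaluation.
    """
    X, y = make_windows(fft_lowpass(series, keep), window)
    split = int(train_frac * len(X))  # first 80% of windows train, last 20% test
    preds = np.array([knn_predict(X[:split], y[:split], q, k) for q in X[split:]])
    rmse = float(np.sqrt(np.mean((preds - y[split:]) ** 2)))
    return preds, y[split:], rmse


if __name__ == "__main__":
    # Synthetic stand-in for a SASA trace (arbitrary units): 1,000 samples at a
    # hypothetical 1 ns spacing, a 100 ns oscillation, and Gaussian noise.
    rng = np.random.default_rng(0)
    t = np.arange(1000) * 1.0
    sasa = 100.0 + 2.0 * np.sin(2 * np.pi * t / 100.0) + rng.normal(0, 0.5, t.size)
    print("dominant period (ns):", dominant_period(sasa, dt=1.0))
    _, _, rmse = knn_forecast(sasa)
    print("test RMSE:", rmse)
```

Splitting chronologically rather than at random means each test window can only draw neighbors from earlier windows, which mirrors forecasting. The sketch predicts one step ahead from observed (smoothed) inputs; forecasting further out would feed each prediction back in as input.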
Prior research has indicated that larger datasets improve the accuracy of the k-NN model.[2] LSTMs were chosen to provide a contrasting approach to the k-NN and CNN models, which are known for their higher sensitivity in extracting categorical features and capturing patterns in data.[3] We ran the ML models on SASA data of the S-protein at temperatures of 3°C, 20°C, and 37°C and pH values of 1, 2, 3, 4, and 5, using an 80-20 train-test split. Of the three models, k-NN forecast the simulation data most accurately, whereas the LSTM model performed poorly, likely because it requires more training data to make accurate predictions. In the future, in vitro experimental results will help validate the accuracy of the ML models for long-term predictions and their efficacy as a replacement for computationally expensive MD simulations.

[1] Ali, S., et al. "A review of methods available to estimate solvent-accessible surface areas of soluble proteins in the folded and unfolded states." Current Protein & Peptide Science 15(5), 456–476 (2014). https://doi.org/10.2174/1389203715666140327114232
[2] Liang, D., Song, M., Niu, Z., et al. "Supervised machine learning approach to molecular dynamics forecast of SARS-CoV-2 spike glycoproteins at varying temperatures." MRS Advances 6, 362–367 (2021). https://doi.org/10.1557/s43580-021-00021-4
[3] Liew, S., et al. "Gender classification: a convolutional neural network approach." Turkish Journal of Electrical Engineering and Computer Sciences 24, 1248–1264 (2016). https://doi.org/10.3906/elk-1311-58

Keywords

COVID-19

Symposium Organizers

Mathieu Bauchy, University of California, Los Angeles
Ekin Dogus Cubuk, Google
Grace Gu, University of California, Berkeley
N M Anoop Krishnan, Indian Institute of Technology Delhi

Symposium Support

Bronze
Patterns and Matter | Cell Press

Publishing Alliance

MRS publishes with Springer Nature