December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
BI01.07.17

ALCHIMIA—Advanced Learning for Chemistry Interpretation and Integrated Molecule Analysis

When and Where

Dec 3, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A

Presenter(s)

Co-Author(s)

Emilio Vital Brazil¹, Eduardo Almeida Soares¹, Breno Carvalho¹, Victor Shirasuna¹, Renato Cerqueira¹

¹IBM Research

Abstract

The application of foundation models (FMs) to industrial chemical problems, such as the generation of small molecules and the prediction of their properties, has shown promising results [1]. A key advantage of FM technology is the ability to create a single model from a large amount of pre-training data and then adapt it to various downstream tasks using smaller datasets [2]. However, working with FM technology requires specialized knowledge in AI and expensive hardware, which makes it difficult for experts in the chemical domain to access and use these models [3]. Moreover, the lack of uncertainty characterization in most models limits their practical use [4].

To address these challenges, we propose a comprehensive pipeline that enables material discovery experts to create machine-learning models based on advanced FM technology. Our pipeline and software stack, built in Python, encapsulate FM technology and allow experts to fine-tune models using state-of-the-art techniques such as adapters [5] and mixture of experts (MoE) [6]. For example, the pipeline allows experts to choose among four different SMILES-based models and fine-tune them using low-rank adaptation techniques. The entire process is recorded, and uncertainty characterization is computed for the fine-tuned models.

Our proposed pipeline and software stack aim to make FM technology more accessible to experts in the chemical domain, enabling them to leverage these models for material discovery and other applications. By providing a user-friendly interface and advanced fine-tuning techniques, we hope to democratize the use of FM technology and drive innovation in the field of chemistry.

[1] White, A. D. (2023). The future of chemistry is language. Nature Reviews Chemistry, 7(7), 457-458.
[2] Bommasani, R., et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
[3] Pan, J. (2023). Large language model for molecular chemistry. Nature Computational Science, 3(1), 5.
[4] Felicioni, N., et al. (2024). On the importance of uncertainty in decision-making with large language models. arXiv preprint arXiv:2404.02649.
[5] Hu, E. J., et al. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
[6] Shazeer, N., et al. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538.
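
To make the adapter-based fine-tuning step concrete, the sketch below shows how one stage of such a pipeline could look in Python, assuming a Hugging Face-style SMILES foundation model together with the peft library for LoRA adapters; the checkpoint name, dataset file, module names, and hyperparameters are illustrative placeholders, not the authors' actual implementation.

# Minimal, illustrative sketch of one pipeline stage: LoRA fine-tuning of a
# SMILES-based foundation model for a single property-prediction (regression) task.
# The checkpoint name, CSV file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "org/smiles-foundation-model"  # hypothetical pre-trained SMILES FM

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=1)  # num_labels=1 -> single regression head

# LoRA: train small low-rank adapter matrices while the backbone stays frozen.
# target_modules depends on the backbone's attention-projection layer names.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                         target_modules=["query", "value"])
model = get_peft_model(model, lora_config)

# Small labelled dataset: columns "smiles" (string) and "labels" (float property).
data = load_dataset("csv", data_files="property_data.csv")

def tokenize(batch):
    return tokenizer(batch["smiles"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora_finetuned_fm",
                           num_train_epochs=5,
                           per_device_train_batch_size=32,
                           learning_rate=3e-4),
    train_dataset=data["train"],
)
trainer.train()

Uncertainty characterization for the fine-tuned model could then be obtained, for example, by training several such adapters with different random seeds and reporting the spread of their predictions (a deep-ensemble-style estimate); the abstract does not specify which method the pipeline uses.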

Symposium Organizers

Deepak Kamal, Syensqo
Christopher Kuenneth, University of Bayreuth
Antonia Statt, University of Illinois
Milica Todorović, University of Turku

Symposium Support

Bronze
Matter

Session Chairs

Deepak Kamal
Christopher Kuenneth
Milica Todorović
