Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions

When and Where

Dec 3, 2024
8:00pm - 10:00pm

Hynes, Level 1, Hall A

Presenter(s)

Hongchen Wang

Kangming Li

Scott Ramsay

Yao Fehlis

Edward Kim

Jason Hattrick-Simpers

Co-Author(s)

Hongchen Wang¹, Kangming Li¹, Scott Ramsay¹, Yao Fehlis¹, Edward Kim¹, Jason Hattrick-Simpers¹

University of Toronto¹

Abstract

Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. This study conducts a comprehensive evaluation and robustness analysis of LLMs within the field of materials science, focusing on domain-specific question answering and materials property prediction. Three distinct datasets are used in this study: 1) a set of multiple-choice questions from undergraduate-level materials science courses, 2) a dataset including various steel compositions and yield strengths, and 3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The robustness of these models is tested against various forms of ‘noise’, ranging from realistic disturbances to intentionally adversarial manipulations, to evaluate their resilience and reliability under real-world conditions. Additionally, the study uncovers unique phenomena of LLMs during predictive tasks, such as mode collapse behavior when the proximity of prompt examples is altered and performance enhancement from train/test mismatch. The findings aim to provide informed skepticism for the broad use of LLMs in materials science and to inspire advancements that enhance their robustness and reliability for practical applications.
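
As a rough illustration of the few-shot in-context setup and the noise-based robustness tests mentioned in the abstract, the sketch below builds a prompt from example pairs and applies a simple typo-like perturbation. It is a minimal sketch under stated assumptions, not the authors' method: the function names `build_few_shot_prompt` and `swap_adjacent_chars`, the prompt wording, and the example values are all hypothetical placeholders and are not taken from the study's datasets.

```python
import random


def build_few_shot_prompt(examples, query):
    """Format (description, value) pairs as in-context examples, then append the query.

    The instruction wording is an assumption made for illustration only.
    """
    parts = ["Predict the yield strength (MPa) of the steel from its composition."]
    for description, value in examples:
        parts.append(f"Composition: {description}\nYield strength: {value}")
    # The final block leaves the answer blank for the model to complete.
    parts.append(f"Composition: {query}\nYield strength:")
    return "\n\n".join(parts)


def swap_adjacent_chars(text, n_swaps, seed=0):
    """Inject typo-like noise by swapping n_swaps pairs of adjacent characters."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


# Placeholder examples: invented labels and numbers, NOT data from the study.
examples = [
    ("placeholder composition A", 100),
    ("placeholder composition B", 200),
]
clean_prompt = build_few_shot_prompt(examples, "placeholder composition C")
noisy_prompt = swap_adjacent_chars(clean_prompt, n_swaps=5)
```

Sending `clean_prompt` and `noisy_prompt` to the same model and comparing the two predictions would give one simple, per-example view of robustness; the perturbation types actually used in the study may differ.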

Keywords

chemical composition

Symposium Organizers

Deepak Kamal, Syensqo

Christopher Kuenneth, University of Bayreuth

Antonia Statt, University of Illinois

Milica Todorović, University of Turku

Symposium Support

Bronze
Matter

Session Chairs

Deepak Kamal

Christopher Kuenneth

Milica Todorović

2024 MRS Fall Meeting & Exhibit