December 1 - 6, 2024
Boston, Massachusetts

2024 MRS Fall Meeting & Exhibit
BI01.01.03

Leveraging Large Language Models for Automated Materials Database Curation

When and Where

Dec 2, 2024
11:15am - 11:30am
Sheraton, Second Floor, Constitution B

Co-Author(s)

Tyler Sours¹, Maciej Polak¹,², Omar Allam¹, Shivang Agarwal¹, Steffen Ridderbusch¹, Dane Morgan², Ang Xiao¹

¹SandboxAQ, ²University of Wisconsin–Madison

Abstract

The advancement of machine learning (ML) models in materials science heavily relies on the availability of large, high-quality datasets. While open-access datasets exist, they often suffer from limitations such as incomplete data, lack of standardization, and insufficient coverage of diverse material properties. Therefore, harnessing the comprehensive and detailed information available in scientific literature becomes highly appealing. By leveraging Large Language Models (LLMs) for data extraction and programmatically querying extensive databases of scientific literature, we can create robust, standardized datasets that address these limitations. This automated process significantly reduces the time and effort required for data collection, allowing computational researchers to focus on data analysis and model development using real data that is representative of the scientific community at large. While these methodologies are domain agnostic, we demonstrate their application to several areas of interest in materials science, with a focus on alloy mechanical properties and battery stability data. We illustrate the utility of these comprehensive datasets by training ML models to perform downstream predictive tasks and guide material design, thereby accelerating discovery and innovation. By integrating diverse data sources, our approach ensures a rich and holistic representation of the current state of knowledge, enhancing the predictive capabilities of ML models and leading to faster development of better materials.

Symposium Organizers

Deepak Kamal, Solvay Inc
Christopher Kuenneth, University of Bayreuth
Antonia Statt, University of Illinois
Milica Todorović, University of Turku

Session Chairs

Christopher Kuenneth
Milica Todorović
