Apr 11, 2025
10:45am - 11:00am
Summit, Level 4, Room 422
Catherine Brinson1, Defne Circi1, Bhuwan Dhingra1
Duke University1
Advances in materials science require leveraging past findings and data from the vast published literature. Large language models (LLMs) and vision-language models (VLMs) offer transformative potential to systematically convert unstructured textual, tabular, and graphical information embedded within articles into structured, analyzable formats. Despite their promise, the ability of these models to extract information from hybrid materials science articles, which often interleave tables with text, remains underexplored. Furthermore, the scarcity of annotated datasets, particularly for charts, which often carry the densest information, poses a significant barrier to progress in this domain. To address this gap, we introduce an automated framework that evaluates the quality of information extraction from hybrid articles and charts. In addition, we propose benchmark datasets to support and standardize future research. To overcome the challenge of limited training data, we also develop a method for synthetically generating chart datasets. We aim to fine-tune a pretrained image-to-text model on materials science figures with complete and consistent annotations to demonstrate the effectiveness of our synthetic data generation.
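As a rough illustration of what such synthetic chart generation could look like, the sketch below renders simple matplotlib plots and saves a paired ground-truth record for each one; the plot type, axis labels, value ranges, and JSON schema are illustrative assumptions, not the authors' actual pipeline.

```python
# Illustrative sketch: generate synthetic charts with paired ground-truth
# annotations, suitable as supervision for an image-to-text extraction model.
# Plot style, quantities, and the annotation schema are assumptions.
import json
import random
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt


def make_synthetic_chart(idx: int, out_dir: Path) -> dict:
    """Render one synthetic line chart and return its ground-truth record."""
    # Hypothetical materials-style quantities for the axes.
    n_points = random.randint(5, 15)
    x = sorted(random.uniform(0.0, 10.0) for _ in range(n_points))
    y = [round(2.0 * xi + random.gauss(0.0, 1.0), 3) for xi in x]

    fig, ax = plt.subplots(figsize=(4, 3), dpi=150)
    ax.plot(x, y, marker="o", linestyle="-")
    ax.set_xlabel("Filler loading (wt%)")
    ax.set_ylabel("Tensile modulus (GPa)")
    ax.set_title(f"Sample {idx}")
    fig.tight_layout()

    image_path = out_dir / f"chart_{idx:05d}.png"
    fig.savefig(image_path)
    plt.close(fig)

    # Ground truth stored alongside the image; a model can be fine-tuned to
    # reproduce this structured record from the rendered pixels.
    return {
        "image": image_path.name,
        "x_label": "Filler loading (wt%)",
        "y_label": "Tensile modulus (GPa)",
        "series": [{"x": [round(v, 3) for v in x], "y": y}],
    }


if __name__ == "__main__":
    out_dir = Path("synthetic_charts")
    out_dir.mkdir(exist_ok=True)
    records = [make_synthetic_chart(i, out_dir) for i in range(100)]
    with open(out_dir / "annotations.json", "w") as f:
        json.dump(records, f, indent=2)
```

Image-annotation pairs produced this way could then serve as fully labeled training examples for fine-tuning a pretrained image-to-text model on chart-to-data extraction.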
Our results emphasize the importance of multimodal datasets and benchmarks in advancing the application of LLMs and VLMs for scientific research. By bridging gaps in data accessibility and enabling robust evaluations, this work contributes to the acceleration of materials discovery and highlights the broader potential of LLM-driven knowledge extraction in scientific fields.