April 7 - 11, 2025
Seattle, Washington
2025 MRS Spring Meeting & Exhibit
MT03.08.03

Automated Construction of Phase Change Material Databases Using Novel Graph Data Recognition Techniques

When and Where

Apr 10, 2025
5:00pm - 7:00pm
Summit, Level 2, Flex Hall C

Presenter(s)

Co-Author(s)

Young Ok Cha¹, Yang Hao¹

¹Queen Mary University of London

Abstract

In materials science, comprehensive databases are essential for understanding the properties of existing materials and for driving the discovery of new ones. However, the rapid expansion of the scientific literature makes the manual data collection needed to build such databases increasingly difficult. To tackle this, researchers are increasingly adopting advanced Natural Language Processing (NLP) techniques, including Large Language Models (LLMs), to automate data extraction from the literature. Despite these advances, many existing methods focus on textual and tabular data and often overlook the information contained in graphs. Extraction from text and tables is further hindered by diverse writing styles and a lack of standardisation across sources, which leads to inconsistencies and reduced accuracy.
Publishers typically present graphs in image formats that are not directly interpretable by machines, limiting the extraction of information from these critical visual elements. To address this challenge, we propose a novel pipeline that automates the digitisation of figures, allowing the extraction of key information from both graphical and textual data. This approach combines advanced image processing techniques with the capabilities of LLMs, enabling comprehensive and accurate data extraction. Our research focuses on Metal-Insulator Transition (MIT) Phase Change Materials (PCMs), a class of materials known for their tunability and broad potential applications, such as in data storage and energy efficiency technologies.
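One step that any figure digitisation workflow of this kind needs is mapping pixel positions on a plot image back to data values. The abstract does not describe how the authors do this, so the following is only a minimal sketch under stated assumptions: two calibration ticks per axis are already known (for example from tick-label recognition), and the axes are either linear or base-10 logarithmic. The function names and the calibration numbers are hypothetical.

```python
import math


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Return a function that maps a pixel coordinate to a data value.

    Assumes two calibration ticks (pixel position, axis value) are known
    for the axis. This is an illustrative helper, not the authors' method.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    if log_scale:
        if value_a <= 0 or value_b <= 0:
            raise ValueError("log axes need positive calibration values")
        # Interpolate linearly in log10 space for logarithmic axes.
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    slope = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        value = value_a + slope * (pixel - pixel_a)
        return 10 ** value if log_scale else value

    return to_value


def digitise_points(pixel_points, x_mapper, y_mapper):
    """Convert (column, row) pixel pairs into (x, y) data pairs."""
    return [(x_mapper(px), y_mapper(py)) for px, py in pixel_points]


# Hypothetical calibration: temperature axis, 300 K at column 100, 400 K at column 500.
x_map = make_axis_mapper(100, 300.0, 500, 400.0)
# Hypothetical log resistivity axis; image rows grow downward, so row 400 is the low end.
y_map = make_axis_mapper(400, 1e-4, 50, 1e1, log_scale=True)

points = digitise_points([(100, 400), (300, 225), (500, 50)], x_map, y_map)
```

Handling the downward-growing row index through the calibration pairs, rather than with a separate flip step, keeps the mapper identical for both axes.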
The primary goal of this study is to develop a comprehensive database that captures crucial information from peer-reviewed literature, facilitating the identification of optimal parameter combinations for PCMs. This database will accelerate the discovery and optimisation of materials with desirable properties. Our pipeline automatically identifies relationships between host materials and dopants from textual data, enriching the database with contextual information that goes beyond numerical data alone. Furthermore, we successfully extracted key material properties, such as transition temperature, reflectivity, and bandgap, from graphical data in the literature. This information, carefully standardised from both textual and graphical sources, is integrated into a structured and reliable database.
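To illustrate how a property such as transition temperature might be derived once a curve has been digitised, the sketch below takes resistance-temperature points and returns the temperature at which log10(resistance) changes most steeply. This is one convention used for metal-insulator transitions; the abstract does not state which definition the authors apply, so treat the function, its name, and the sample numbers as assumptions made purely for illustration.

```python
import math


def estimate_transition_temperature(temperatures, resistances):
    """Estimate a transition temperature from digitised R(T) points.

    Returns the midpoint of the interval where |d log10(R) / dT| is
    largest. Hypothetical helper; resistances must be positive.
    """
    pairs = sorted(zip(temperatures, resistances))
    if len(pairs) < 2:
        raise ValueError("need at least two points")

    best_slope, best_temperature = 0.0, None
    for (t0, r0), (t1, r1) in zip(pairs, pairs[1:]):
        if t1 == t0:
            continue  # skip duplicate temperatures to avoid division by zero
        slope = abs((math.log10(r1) - math.log10(r0)) / (t1 - t0))
        if best_temperature is None or slope > best_slope:
            best_slope, best_temperature = slope, 0.5 * (t0 + t1)

    if best_temperature is None:
        raise ValueError("all temperatures are identical")
    return best_temperature


# Made-up sample curve (kelvin, ohms), not data from the study.
sample_t = [320, 330, 335, 340, 345, 350, 360]
sample_r = [1e5, 8e4, 5e4, 1e3, 50, 30, 25]
t_transition = estimate_transition_temperature(sample_t, sample_r)
```

A finite-difference estimate like this is sensitive to digitisation noise, so a real pipeline would likely smooth the curve or fit a model before differentiating.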
The database construction process consolidates all extracted data into a cohesive and accessible format, ensuring consistency, accuracy, and relevance. This database not only supports further research and analysis, but also serves as a valuable input for machine learning models aimed at predicting novel characteristics and discovering new PCMs. By automating the extraction of both textual and graphical data, our research significantly enhances the efficiency of data collection and contributes to the broader goal of accelerating material discovery and innovation in the field of PCMs.
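The consolidation step can be pictured as normalising units and storing every extracted value in one table alongside its host material, dopant, and source type. The sketch below uses Python's built-in `sqlite3` module. The table name, column names, and record values are hypothetical placeholders chosen for illustration; they are not the authors' schema or results.

```python
import sqlite3


def to_kelvin(value, unit):
    """Convert a temperature to kelvin; only K and degrees Celsius are handled."""
    unit = unit.strip().lower()
    if unit in ("k", "kelvin"):
        return float(value)
    if unit in ("c", "°c", "celsius"):
        return float(value) + 273.15
    raise ValueError(f"unsupported temperature unit: {unit!r}")


def build_database(records):
    """Store extracted records in an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pcm_property ("
        " host TEXT NOT NULL, dopant TEXT, property TEXT NOT NULL,"
        " value REAL NOT NULL, unit TEXT NOT NULL, source_type TEXT NOT NULL)"
    )
    for rec in records:
        value, unit = rec["value"], rec["unit"]
        # Standardise temperatures so text and figure values are comparable.
        if rec["property"] == "transition_temperature":
            value, unit = to_kelvin(value, unit), "K"
        conn.execute(
            "INSERT INTO pcm_property VALUES (?, ?, ?, ?, ?, ?)",
            (rec["host"], rec.get("dopant"), rec["property"],
             value, unit, rec["source_type"]),
        )
    conn.commit()
    return conn


# Placeholder records for illustration only, not extracted data.
records = [
    {"host": "VO2", "dopant": "W", "property": "transition_temperature",
     "value": 45.0, "unit": "°C", "source_type": "figure"},
    {"host": "VO2", "dopant": None, "property": "transition_temperature",
     "value": 341.0, "unit": "K", "source_type": "text"},
]
conn = build_database(records)
rows = conn.execute(
    "SELECT host, dopant, value, unit FROM pcm_property ORDER BY value"
).fetchall()
```

Keeping a `source_type` column makes it possible to trace each value back to either text or figure extraction when checking consistency between the two.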

Symposium Organizers

Qian Yang, University of Connecticut
Tuan Anh Pham, Lawrence Livermore National Laboratory
Victor Fung, Georgia Institute of Technology
James Chapman, Boston University

Session Chairs

James Chapman
Victor Fung
Tuan Anh Pham
Qian Yang
