December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
BI01.01.04

Question Answering Models for Information Extraction from Perovskite Materials Science Literature

When and Where

Dec 2, 2024
11:30am - 11:45am
Sheraton, Second Floor, Constitution B

Co-Author(s)

Matilda Sipilä¹, Farrokh Mehryary¹, Sampo Pyysalo¹, Filip Ginter¹, Milica Todorović¹

¹University of Turku

Abstract

Scientific text is a promising source of data in materials science, and there is ongoing research on how to use textual data in materials discovery. The recent success of transformer-based language models has led to new machine learning tools for information extraction (IE) from scientific literature, among them question answering (QA). QA models are large language models (here, BERT models) tuned for an IE task that is carried out by asking a question in natural language. The potential of the QA method lies in its versatility, accessibility and scalability. Natural-language queries make it easy to use even for researchers with no previous knowledge of language technology. In addition, the QA model does not need to be re-trained to extract information about different materials and properties.

We explored the IE performance of the QA method on the task of extracting bandgap values of halide perovskite materials from scientific literature. We tested five different BERT models and found that the MatBERT model produced the best results. Compared with the more established IE tool ChemDataExtractor2, the QA method performed well, and we were able to collect correct bandgap values from text. The extracted information will next be used to map the space of materials properties and find promising new materials solutions. We implemented the method in a web application to make the QA tool more widely available. Through this work, we seek to lower the barriers for non-experts to use large language models for IE and to help democratize the use of language technology in materials research.
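
The question-driven extraction described in the abstract might look roughly like the Python sketch below. It is illustrative only and is not the authors' implementation: the question template, the function names, and the example sentence are all assumptions, and `toy_qa_model` is a regex stand-in for a fine-tuned BERT QA model so that the sketch runs without downloading anything. In practice, `qa_model` would wrap an extractive QA model that returns an answer span from the context.

```python
import re
from typing import Callable, Optional

# Hypothetical question template; the abstract does not give the exact wording.
QUESTION_TEMPLATE = "What is the bandgap of {material}?"

# Matches a number followed by the unit eV, e.g. "1.55 eV" or "2.3eV".
_VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")


def build_question(material: str, template: str = QUESTION_TEMPLATE) -> str:
    """Fill the material name into the natural-language question."""
    return template.format(material=material)


def parse_bandgap_ev(answer: str) -> Optional[float]:
    """Turn an answer span such as '1.55 eV' into a float, or None if absent."""
    match = _VALUE_RE.search(answer)
    return float(match.group(1)) if match else None


def extract_bandgap(
    material: str,
    context: str,
    qa_model: Callable[[str, str], str],
) -> Optional[float]:
    """Ask the QA model about one material and parse the returned span."""
    answer = qa_model(build_question(material), context)
    return parse_bandgap_ev(answer)


def toy_qa_model(question: str, context: str) -> str:
    """Stand-in for a real QA model (assumption): ignores the question and
    returns the first '<number> eV' span found in the context."""
    match = _VALUE_RE.search(context)
    return match.group(0) if match else ""


# Invented example sentence, used only to exercise the sketch.
text = "The MAPbI3 film shows a bandgap of 1.55 eV at room temperature."
value = extract_bandgap("MAPbI3", text, toy_qa_model)
```

Because the material name only enters through the question string, the same function can be reused for other materials or, with a different template, other properties, which reflects the point that no re-training is needed.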

Keywords

perovskites

Symposium Organizers

Deepak Kamal, Solvay Inc
Christopher Kuenneth, University of Bayreuth
Antonia Statt, University of Illinois
Milica Todorović, University of Turku

Session Chairs

Christopher Kuenneth
Milica Todorović
