Question Answering Models for Information Extraction from Perovskite Materials Science Literature

When and Where

Dec 2, 2024
11:30am - 11:45am

Sheraton, Second Floor, Constitution B

Presenter(s)

Matilda Sipilä

Farrokh Mehryary

Sampo Pyysalo

Filip Ginter

Milica Todorović

Co-Author(s)

Matilda Sipilä¹, Farrokh Mehryary¹, Sampo Pyysalo¹, Filip Ginter¹, Milica Todorović¹

University of Turku¹

Abstract

Scientific text is a promising source of data in materials science, and there is ongoing research on how to use textual data in materials discovery. The recent success of transformer-based language models has led to new machine learning tools, such as question answering (QA), that can be applied to information extraction (IE) from scientific literature. A QA model is a BERT-type language model tuned for an IE task, which is carried out by asking a question in natural language. The potential of the QA method lies in its versatility, accessibility and scalability. Natural-language queries make it easy to use, even for researchers with no previous knowledge of language technology. In addition, the QA model does not need to be re-trained to extract information about different materials and properties.

We explored the IE performance of the QA method on the task of extracting bandgap values of halide perovskite materials from scientific literature. We tested five different BERT models and found that the MatBERT model produced the best results. Compared to the more established IE tool ChemDataExtractor2, the QA method performed well, and we were able to collect correct bandgap values from text. The extracted information will next be used to map the space of materials properties and to find promising new materials solutions. We implemented the method in a web application to make the QA tool more widely available. Through this work, we seek to lower the barriers for non-experts to use large language models for IE and to help democratize the use of language technology in materials research.
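
The QA-based extraction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the question template, the confidence threshold `min_score`, the regular expression for values in eV, and the stand-in `toy_qa_model` are all assumptions made for illustration. In practice, `qa_model` would wrap a BERT-type extractive QA model that returns an answer span and a confidence score.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical question template; the abstract does not give the exact wording.
QUESTION_TEMPLATE = "What is the bandgap of {material}?"

# Matches a number followed by the unit eV, e.g. "1.55 eV" or "2.3eV".
BANDGAP_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")

# A QA model is treated as any callable: (question, context) -> (answer, score).
QAModel = Callable[[str, str], Tuple[str, float]]


def parse_bandgap_ev(answer: str) -> Optional[float]:
    """Return the first value given in eV in an answer span, or None."""
    match = BANDGAP_PATTERN.search(answer)
    return float(match.group(1)) if match else None


def extract_bandgap(
    qa_model: QAModel, material: str, context: str, min_score: float = 0.5
) -> Optional[float]:
    """Ask the QA model about one material and parse a numeric bandgap.

    Answers below the (assumed) confidence threshold are discarded.
    Only the material name in the question changes between queries;
    the model itself is not re-trained.
    """
    question = QUESTION_TEMPLATE.format(material=material)
    answer, score = qa_model(question, context)
    if score < min_score:
        return None
    return parse_bandgap_ev(answer)


def toy_qa_model(question: str, context: str) -> Tuple[str, float]:
    """Stand-in for a real QA model, for demonstration only.

    Returns the first sentence that mentions "bandgap" with a fixed score.
    """
    for sentence in context.split(". "):
        if "bandgap" in sentence.lower():
            return sentence, 0.9
    return "", 0.0


if __name__ == "__main__":
    # Illustrative sentence written for this example, not taken from the study.
    text = (
        "Films were spin-coated on glass. "
        "The bandgap of MAPbI3 was measured to be 1.55 eV."
    )
    print(extract_bandgap(toy_qa_model, "MAPbI3", text))
```

Treating the model as a plain callable keeps the question construction and the numeric post-processing independent of any particular BERT variant, so different models can be swapped in and compared on the same text.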

Keywords

perovskites

Symposium Organizers

Deepak Kamal, Syensqo

Christopher Kuenneth, University of Bayreuth

Antonia Statt, University of Illinois

Milica Todorović, University of Turku

Symposium Support

Bronze
Matter

Session Chairs

Christopher Kuenneth