MRS Meetings and Events

 

DS05.04.05 2023 MRS Fall Meeting

Large Language Model-Based Pipeline for Extraction of Polymer Property Records

When and Where

Nov 28, 2023
11:15am - 11:30am

Sheraton, Third Floor, Gardner

Presenter

Co-Author(s)

Sonakshi Gupta¹, Pranav Shetty¹, Aishat Adeboye¹, Rampi Ramprasad¹

¹Georgia Institute of Technology

Abstract

Polymer informatics has made great strides in recent years in predicting polymer properties and designing new materials. These data-driven models are powered by curated data, which often requires painstaking manual curation from the rapidly growing corpus of journal articles. Data curators and materials scientists who search this growing body of literature for material property information face an uphill task.

In this work, we present a pipeline that leverages large language models to extract material property information from the text of journal articles. We frame the problem as a text-completion task: the text containing material property data, together with a prompt carrying the relevant instructions, is input to the GPT-3.5 model accessed through the OpenAI API. An example prompt looks like ‘Extract all bandgap values from the following text in json format: ...’. For such a prompt, the language model outputs the (material, property value) tuples as a dictionary. We use few-shot prompting, in which a few representative examples are selected and provided to the model in the prompt as input-output pairs. This specifies the format of the data to be extracted and improves extraction performance. We benchmarked our method on two datasets of abstracts, one containing polymer glass transition temperatures and the other containing bandgaps, and show that it outperforms fully supervised information extraction based on named entity recognition with heuristic rules for relation extraction. We then applied the method to a corpus of 2.6 million materials science articles to extract all polymer glass transition temperature and bandgap values recorded therein.
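The abstract does not publish its prompt template or code. As a rough illustration of the few-shot set-up it describes, the Python sketch below assembles an instruction, a couple of worked input-output pairs and the query text into one prompt, parses a JSON-dictionary completion into (material, value) tuples, and scores extracted tuples against hand-labelled ones with precision, recall and F1. The prompt layout (`Text:` / `Output:`), the placeholder polymers "P1" to "P4" and their values, and the helper names (`build_few_shot_prompt`, `parse_completion`, `precision_recall_f1`) are hypothetical, and exact-match tuple scoring is an assumption rather than the authors' benchmarking protocol. The GPT-3.5 call through the OpenAI API is deliberately left out so the sketch runs offline; the completion string is a stand-in, not real model output.

```python
import json
from typing import Dict, List, Sequence, Tuple

# Hypothetical worked examples (input text, expected JSON dictionary).
# "P1", "P2", "P3" and their values are placeholders, not data from the original work.
BANDGAP_EXAMPLES: List[Tuple[str, Dict[str, str]]] = [
    ("Films of polymer P1 exhibited an optical bandgap of 2.1 eV.", {"P1": "2.1 eV"}),
    ("Polymers P2 and P3 showed bandgaps of 1.7 eV and 1.9 eV, respectively.",
     {"P2": "1.7 eV", "P3": "1.9 eV"}),
]


def build_few_shot_prompt(
    text: str,
    property_name: str = "bandgap",
    examples: Sequence[Tuple[str, Dict[str, str]]] = BANDGAP_EXAMPLES,
) -> str:
    """Assemble instruction, worked input-output pairs and the query into one prompt."""
    instruction = f"Extract all {property_name} values from the following text in json format:"
    blocks = [
        f"{instruction}\nText: {example_text}\nOutput: {json.dumps(example_output)}"
        for example_text, example_output in examples
    ]
    # The query block ends at "Output:" so the model's completion is the answer.
    blocks.append(f"{instruction}\nText: {text}\nOutput:")
    return "\n\n".join(blocks)


def parse_completion(completion: str) -> List[Tuple[str, str]]:
    """Turn a JSON dictionary completion into (material, property value) tuples."""
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return []  # malformed output is dropped rather than guessed at
    if not isinstance(data, dict):
        return []
    return [(str(material), str(value)) for material, value in data.items()]


def precision_recall_f1(
    predicted: Sequence[Tuple[str, str]], gold: Sequence[Tuple[str, str]]
) -> Tuple[float, float, float]:
    """Score extracted tuples against hand-labelled ones by exact match."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    query = "The copolymer P4 has a bandgap of 1.8 eV."
    print(build_few_shot_prompt(query))
    # Stand-in for the text a model might return; not a real API response.
    completion = '{"P4": "1.8 eV"}'
    records = parse_completion(completion)
    print(records)  # [('P4', '1.8 eV')]
    print(precision_recall_f1(records, [("P4", "1.8 eV")]))  # (1.0, 1.0, 1.0)
```

For the glass transition temperature dataset, one would pass `property_name="glass transition temperature"` together with a matching set of worked examples. Dropping malformed completions instead of repairing them keeps the sketch simple; a fuller pipeline would likely also normalize units and material names before comparing tuples.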

Symposium Organizers

Debra Audus, National Institute of Standards and Technology
Deepak Kamal, Solvay Inc
Christopher Kuenneth, University of Bayreuth
Lihua Chen, Schrödinger, Inc.

Symposium Support

Gold
Solvay

Publishing Alliance

MRS publishes with Springer Nature