MRS Meetings and Events

 

MD01.08.07 2023 MRS Spring Meeting

Creating Knowledge Maps from Literature to Accelerate Catalytic CO2 Conversion Development

When and Where

Apr 13, 2023
9:30am - 9:45am

Marriott Marquis, Second Level, Foothill C

Presenter

Co-Author(s)

Anna Hiszpanski1, Juanita Ordonez1, Aditya Prajapati1, Huiyun Jeong1, David Buttler1

Lawrence Livermore National Laboratory1

Abstract

Natural language processing (NLP) is a useful tool for extracting and organizing information from scientific papers, thereby expediting the creation of materials science databases for downstream machine learning. Less well explored in the materials science domain, however, is the use of NLP to create knowledge maps rather than databases, which requires extracting overarching concepts and relationships from documents rather than specific detailed information. Such "knowledge maps" enable better organization, searching, and querying of materials science document sets of interest and can serve as a pedagogical tool, offering a holistic view of the field and surfacing non-intuitive connections.

We have developed NLP tools to create such a conceptual map for the field of catalytically driven CO₂ conversion. We focus on this field given its global importance and the high volume and diversity of its recent publications, which make it challenging for subject matter experts and newcomers to the field alike to keep up with the literature. Our corpus consists of ~37.7k abstracts and ~9.6k full-text articles broadly pertaining to CO₂ conversion.

The key to creating such conceptual mappings is a document encoder that accurately captures and represents the concepts contained in papers. For our document encoder, we evaluated several transformer models, including MatBERT, MatSciBERT, and RoBERTa. For each, we analyzed the effect of encoding whole versus partial documents and of the order in which text is encoded (randomly or in the order it is written). Our findings show that the MatBERT transformer, with more text sampled in random order, tends to best encode documents and capture the concepts they contain.
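The random-order sampling strategy described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: `toy_encode` is a numpy stand-in for a real transformer encoder such as MatBERT, and the chunking, shuffling, and mean-pooling choices are assumptions for demonstration.

```python
import random
import numpy as np

def toy_encode(text: str, dim: int = 8) -> np.ndarray:
    # Stand-in for a transformer sentence/chunk encoder (e.g., MatBERT):
    # a deterministic pseudo-embedding derived from the text itself.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def encode_document(sentences, max_chunks=4, shuffle=True, seed=0):
    """Mean-pool chunk embeddings into one document vector.

    `shuffle=True` mimics sampling text in random order rather than
    in the order it is written; `max_chunks` controls how much of the
    document is encoded (partial vs. whole document).
    """
    chunks = list(sentences)
    if shuffle:
        random.Random(seed).shuffle(chunks)
    chunks = chunks[:max_chunks]
    vecs = np.stack([toy_encode(c) for c in chunks])
    return vecs.mean(axis=0)

doc = [
    "CO2 reduction on Cu catalysts.",
    "Faradaic efficiency improves with pulsed bias.",
    "Photocatalytic pathways differ from electrochemical ones.",
]
vec = encode_document(doc)
print(vec.shape)  # (8,)
```

With a real encoder, each chunk would pass through the transformer and the pooled vectors would populate the knowledge map's embedding space.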
As an example, this model can classify the CO₂ catalysis literature by type of catalytic reaction, whether photocatalytic, electrochemical, or thermohydrogenation, with up to 83% accuracy.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
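Classification over document embeddings, as in the three-way reaction-type task above, can be illustrated with a simple nearest-centroid sketch. Everything here is synthetic and hypothetical: the embeddings, the cluster centers, and the classifier are stand-ins, not the authors' model or data.

```python
import numpy as np

# Synthetic 8-d "document vectors" for three assumed reaction-type labels.
rng = np.random.default_rng(42)
centers = {
    "photocatalytic": rng.normal(0, 1, 8),
    "electrochemical": rng.normal(3, 1, 8),
    "thermohydrogenation": rng.normal(-3, 1, 8),
}

def make_docs(center, n=20):
    # Documents of one class scatter around that class's center.
    return center + rng.normal(0, 0.5, (n, 8))

X = np.vstack([make_docs(c) for c in centers.values()])
y = np.repeat(list(centers), 20)

# Nearest-centroid classifier: label each vector by the closest class mean.
centroids = {lab: X[y == lab].mean(axis=0) for lab in centers}

def predict(v):
    return min(centroids, key=lambda lab: np.linalg.norm(v - centroids[lab]))

preds = np.array([predict(v) for v in X])
acc = (preds == y).mean()
print(f"accuracy: {acc:.2f}")
```

In practice the inputs would be transformer document embeddings and a supervised classifier would be trained on labeled abstracts; the geometry of the idea is the same.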

Symposium Organizers

Mathieu Bauchy, University of California, Los Angeles
Ekin Dogus Cubuk, Google
Grace Gu, University of California, Berkeley
N M Anoop Krishnan, Indian Institute of Technology Delhi

Symposium Support

Bronze
Patterns and Matter, Cell Press

Publishing Alliance

MRS publishes with Springer Nature