MRS Meetings and Events

 

DS03.07.05 2022 MRS Fall Meeting

Trend Analysis and Insight Extractions Using Named Entity Recognition of CO2RR Literature

When and Where

Nov 29, 2022
8:00pm - 10:00pm

Hynes, Level 1, Hall A

Co-Author(s)

Jiwoo Choi<sup>1,2</sup>, Kihoon Bang<sup>1</sup>, Suji Jang<sup>1</sup>, Kwang-Ryeol Lee<sup>1</sup>, Sang Soo Han<sup>1</sup>, Donghun Kim<sup>1</sup>

<sup>1</sup>Korea Institute of Science and Technology, <sup>2</sup>Korea University

Abstract

In recent years, big data and artificial intelligence have penetrated materials science research. Currently, most openly available materials databases contain results derived from computer simulations rather than from experiments. Examples include the Materials Project, Novel Materials Discovery (NOMAD), and the Open Quantum Materials Database (OQMD). Building a large-scale experimental materials database, by contrast, remains difficult. In this context, the scientific literature is an underutilized potential data source because it contains well-organized, easily accessible experimental data. Exploiting it requires intensive study of natural language processing (NLP) applied to the large volume of materials science literature, since NLP allows data to be extracted from papers automatically.

Among the many research topics in materials science, CO<sub>2</sub> reduction reaction (CO<sub>2</sub>RR) catalysis is an interesting target for NLP. CO<sub>2</sub>RR catalysis, the conversion of carbon dioxide into valuable compounds, could help alleviate today’s energy crises and environmental problems. Although a large number of CO<sub>2</sub>RR studies have been performed, experimental databases have not yet been built. We aim to build a large-scale experimental database using a variety of NLP techniques and to use it to extract research trends and insights that would benefit the relevant research community.

In this work, we collected papers related to CO<sub>2</sub>RR and extracted key entities from them using named entity recognition (NER). We provide a universal method to crawl and screen papers of a user’s interest (in this example, CO<sub>2</sub> electrochemical reduction research) and to exclude noise papers using a combination of Doc2Vec and the Latent Dirichlet Allocation (LDA) model. As a result, we collected approximately 4,800 papers.
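
The screening idea, keeping papers that resemble known on-topic examples and dropping noise papers, can be illustrated with a minimal sketch. It is not the authors' pipeline: a plain word-count vector with cosine similarity stands in here for the Doc2Vec embedding, the LDA step is omitted, and the function names, the seed abstract, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy document vector: lowercase word counts.

    Assumption: this is a stand-in for a learned Doc2Vec embedding.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def screen(candidates, seeds, threshold=0.3):
    """Keep candidates whose best similarity to any seed reaches the threshold.

    The threshold value is hypothetical and would need tuning on real data.
    """
    seed_vecs = [embed(s) for s in seeds]
    kept = []
    for text in candidates:
        vec = embed(text)
        if max(cosine(vec, s) for s in seed_vecs) >= threshold:
            kept.append(text)
    return kept


# Hypothetical seed and candidate abstracts, for illustration only.
seeds = [
    "electrochemical reduction of co2 on copper catalyst with high faradaic efficiency"
]
candidates = [
    "copper catalyst for electrochemical co2 reduction shows high faradaic efficiency",
    "deep learning for image classification of street scenes",
]
kept = screen(candidates, seeds)
```

In this toy run only the first candidate survives the screen. In a real setting the count vectors would be replaced by trained document embeddings, and a topic model could serve as a second filter.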
We then developed NER models based on long short-term memory (LSTM) or bidirectional encoder representations from transformers (BERT). These models were applied to the abstracts of the collected papers to extract ten key entities covering material names (catalyst, electrolyte, etc.) and catalytic performance (Faradaic efficiency, current density, etc.). The average F1-score of the MatBERT-based approach is over 85%, greatly exceeding that of the LSTM-based approach, which indicates that a context-inclusive approach is necessary. We also compared several BERT models (BERT_base, SciBERT, MatSciBERT, and MatBERT); the comparison shows that the more domain knowledge is reflected in the BERT model, the better the performance. Lastly, the trends and knowledge extracted from the NER studies in the CO<sub>2</sub>RR research field will be discussed.
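
As a rough illustration of how an F1-score for NER can be computed, the sketch below scores predicted entity spans against gold spans from BIO-tagged sequences. The abstract does not say whether the reported score is entity-level or token-level, or how it is averaged over the ten entity types, so exact-match entity-level scoring pooled over all types is an assumption here. The tag names `CATALYST` and `FE` are hypothetical labels.

```python
def extract_entities(tags):
    """Return a set of (label, start, end) spans from a BIO tag sequence.

    `end` is exclusive. A stray I- tag that does not continue the current
    span is ignored (a simplifying assumption of this sketch).
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1 between gold and predicted BIO tags."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token example: the model finds the catalyst span
# but misses the Faradaic-efficiency span.
gold = ["B-CATALYST", "I-CATALYST", "O", "B-FE", "O"]
pred = ["B-CATALYST", "I-CATALYST", "O", "O", "O"]
score = entity_f1(gold, pred)
```

Here precision is 1.0 and recall is 0.5, so the F1-score is 2/3. An average over entity types, as reported in the abstract, would instead compute this score separately per label and take the mean.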

Symposium Organizers

Arun Kumar Mannodi Kanakkithodi, Purdue University
Sijia Dong, Northeastern University
Noah Paulson, Argonne National Laboratory
Logan Ward, University of Chicago

Symposium Support

Silver
Energy Material Advances, a Science Partner Journal

Bronze
Chemical Science | Royal Society of Chemistry
Patterns, Cell Press

Session Chairs

Arun Kumar Mannodi Kanakkithodi
Noah Paulson

In this Session

DS03.07.01
DCGANs-Based SOFC Synthetic Image Generation Method

DS03.07.02
Inverse Design of BaTiO3's Synthetic Condition via Machine Learning

DS03.07.03
Development of an Open-Source Adsorption Model for Direct Air Capture

DS03.07.04
High-Throughput Discovery of High-Entropy Alloys Nanocatalysts via Active Learning Approach

DS03.07.05
Trend Analysis and Insight Extractions Using Named Entity Recognition of CO2RR Literature

DS03.07.06
DenseSSD—A Computer Vision Model for Vial-Positioning Detection to Improve Safety in Autonomous Laboratory

DS03.07.07
Autonomous Laboratory for Bespoke Synthesis of Nanoparticles Using Parallelized Bayesian Optimization

DS03.07.08
Machine Learning Based Investigation of Optimal Synthesis Parameters for Epitaxially Grown III–Nitride Semiconductors

DS03.07.09
Towards an Autonomous Combinatorial Co-Sputtering Reactor

DS03.07.10
A Robust Neural Network for Extracting Dynamics from Time-Resolved Electrostatic Force Microscopy Data
