Natural Language Processing for Data Extraction and Synthesizability Prediction from the Energy Materials Literature

When and Where

Nov 28, 2022
4:30pm - 5:00pm

Hynes, Level 2, Room 206

Presenter

Anubhav Jain

Co-Author(s)

Anubhav Jain¹

Lawrence Berkeley National Laboratory¹

Abstract

Historically, both data and knowledge (connections and conclusions based on data) in the materials domain have been recorded mainly as text, figures, or tables in journal articles. Such data is critical to both conventional and machine learning-driven materials discovery. In this talk, I will describe some of our efforts to extract information from the research literature automatically using natural language processing techniques. For example, data on the dopability of materials is difficult to simulate but is present, either implicitly or explicitly, in many research studies. Similarly, data on materials synthesis can be difficult or impossible to simulate but can be extracted from the historical research literature. The talk will summarize our most recent progress towards extracting both individual data items and "knowledge" (e.g., proposed applications of a chemical composition) in various areas, including materials property data and data pertaining to materials synthesis. Overall, such work may ultimately accelerate energy materials design by providing access to previously hidden data sets.