Anubhav Jain¹
¹Lawrence Berkeley National Laboratory
Historically, both data and knowledge (connections and conclusions based on data) in the materials domain have been recorded mainly as text, figures, or tables in journal articles. Such data is critical to both conventional and machine learning-driven materials discovery. In this talk, I will describe some of our efforts to extract information from the research literature automatically using natural language processing techniques. For example, data on the dopability of materials is difficult to simulate but is present, either implicitly or explicitly, in many research studies. Similarly, data on materials synthesis can be difficult or impossible to simulate but can be extracted from the historical research literature. The talk will summarize our most recent progress towards extracting both individual data items and "knowledge" (e.g., proposed applications of a chemical composition) in various areas, including materials property data and data pertaining to materials synthesis. Overall, such work may ultimately lead to accelerated energy materials design through access to previously hidden data sets.