Dec 4, 2024
8:15am - 8:45am
Sheraton, Second Floor, Constitution B
Maria Chan
Argonne National Laboratory
The explosive growth of AI/ML in materials science has largely been fueled by computational data, which are abundant, diverse, and consistent. In contrast, training AI models on experimental data has been extremely challenging because of fundamental difficulties in obtaining, preparing, and sharing AI-ready data. In this talk, we will discuss how these difficulties may be resolved. Strategies include creating experimentally realistic computational data; extracting labeled microscopy [1] and digitized spectroscopy [2] data from the scientific literature (now with LLMs!); and establishing metadata standards, along with the corresponding data infrastructure, for experimental microscopy and spectroscopy data. We will also discuss the intricacies of linking computational and experimental data, as well as the importance of both types of data in AI/ML workflows [3].

[1] E. Schwenker, W. Jiang, T. Spreadbury, N. Ferrier, O. Cossairt, M. K. Y. Chan, “EXSCLAIM! -- Harnessing materials science literature for labeled microscopy datasets,” Patterns 4, 100843 (2023). DOI: 10.1016/j.patter.2023.100843.
[2] W. Jiang, K. Li, T. Spreadbury, E. Schwenker, O. Cossairt, M. K. Y. Chan, “Plot2Spectra: an Automatic Spectra Extraction Tool,” Digital Discovery 1, 719–731 (2022). DOI: 10.1039/D1DD00036E.
[3] Y. Chen, C. Chen, I. Hwang, M. J. Davis, W. Yang, C. J. Sun, G. Lee, D. McReynolds, D. Allan, J. M. Arias, S. P. Ong, M. K. Y. Chan, “Robust Machine Learning Inference from X-ray Absorption Near Edge Spectra through Featurization,” Chemistry of Materials 36(5), 2304–2313 (2024). DOI: 10.1021/acs.chemmater.3c02584.