Apr 9, 2025
4:15pm - 4:30pm
Summit, Level 4, Room 424
Jie Chen¹, Timothy Long², Michael Wall², Todd Hufnagel², Wei Chen³
Virginia Tech¹, Johns Hopkins University², Northwestern University³
Machine learning (ML) techniques can accelerate the throughput of materials characterization, for example by automating phase identification from x-ray diffraction patterns. An effective ML model needs to learn from a significant amount of labeled data, but obtaining and labeling a sufficient quantity of experimental data can be time-consuming and expensive. Alternatively, an ML model can be trained on simulated data, but its subsequent performance in analyzing real (experimental) data may be limited because simulations cannot capture the full complexity of real experiments.
This work focuses on adapting ML models trained on simulated x-ray diffraction data to the analysis of real data, including uncertainty quantification, and on using the uncertainty estimate to identify “interesting” and potentially novel structures. First, we developed a convolutional neural network (CNN) to build a predictive model. By integrating the CNN with a spectral-normalized neural Gaussian process (SNGP), we obtain a model that is aware of its prediction uncertainties, a feature that is typically lacking in conventional neural networks. We show that the model can, given a set of diffraction patterns, identify unusual (and possibly novel) structures through inspection of the prediction uncertainties. This is useful in the context of high-throughput materials characterization because it allows a researcher to focus detailed investigation on specimens (or regions of specimens) that are of greatest interest.
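As a rough illustration of how prediction uncertainties can be used to pick out unusual patterns, the sketch below ranks diffraction patterns by the entropy of their predicted phase probabilities. The abstract does not state which uncertainty score is used, so predictive entropy is an assumption here, and the function names and the probability values are hypothetical; in practice the probabilities would come from an uncertainty-aware model such as the CNN with SNGP described above.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)

def flag_uncertain(probs, top_k=1):
    """Return indices of the top_k most uncertain patterns and all scores."""
    scores = predictive_entropy(probs)
    order = np.argsort(scores)[::-1]  # most uncertain first
    return order[:top_k], scores

# Hypothetical phase probabilities: four diffraction patterns, three phases.
probs = np.array([
    [0.98, 0.01, 0.01],
    [0.05, 0.90, 0.05],
    [0.34, 0.33, 0.33],  # nearly uniform: the model cannot decide
    [0.70, 0.20, 0.10],
])

flagged, scores = flag_uncertain(probs, top_k=1)
```

Here the nearly uniform third row receives the highest score, so it would be the first candidate for detailed follow-up. Entropy from a plain softmax is known to be poorly calibrated away from the training data, which is part of the motivation for a distance-aware approach like SNGP.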
A second aspect of our work addresses the poor performance that results when an ML model trained on simulated data is applied to real experimental data. We developed a simulation-to-experiment domain adaptation method to mitigate the effect of domain shift between simulated and experimental data in the latent space. The adapted model can be further enhanced by fine-tuning with a small amount of labeled experimental data. The ability to train a model primarily on simulated data, with only limited experimental data, is critical for materials exploration campaigns where the full range of structures to be encountered is not known ahead of time.
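To make the idea of reducing domain shift in the latent space concrete, the sketch below uses per-feature moment matching: experimental latent features are shifted and rescaled so that their mean and standard deviation match those of the simulated features. This is a simple stand-in technique chosen for illustration, not the adaptation method of this work, which the abstract does not specify. The function name `align_moments` and the synthetic latent features are hypothetical.

```python
import numpy as np

def align_moments(z_from, z_to, eps=1e-8):
    """Rescale z_from so each feature matches the mean and std of z_to."""
    mu_f, sd_f = z_from.mean(axis=0), z_from.std(axis=0)
    mu_t, sd_t = z_to.mean(axis=0), z_to.std(axis=0)
    return (z_from - mu_f) / (sd_f + eps) * sd_t + mu_t

# Synthetic latent features standing in for encoder outputs (assumption).
rng = np.random.default_rng(0)
z_sim = rng.normal(0.0, 1.0, size=(500, 8))  # simulated domain
z_exp = rng.normal(0.5, 1.5, size=(200, 8))  # shifted, broadened domain

z_exp_aligned = align_moments(z_exp, z_sim)

# Largest per-feature gap between domain means, before and after alignment.
gap_before = np.abs(z_exp.mean(axis=0) - z_sim.mean(axis=0)).max()
gap_after = np.abs(z_exp_aligned.mean(axis=0) - z_sim.mean(axis=0)).max()
```

Matching only the first two moments per feature ignores correlations between features and any class-dependent shift, so it should be read as a baseline. Learned approaches typically add an alignment term to the training loss instead of applying a fixed transform after training.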