December 1 - 6, 2024
Boston, Massachusetts
2024 MRS Fall Meeting & Exhibit
MT02.09.05

Rethinking Machine Learning for Small Data to Enable Autonomous Experiments

When and Where

Dec 4, 2024
9:15am - 9:30am
Hynes, Level 2, Room 209

Presenter(s)

Co-Author(s)

Qian Yang¹, Nila Mandal¹, Yushuo Niu¹, Maryam Pardakhti¹, Graham Roberts¹, Ethan Chadwick¹, Anson Ma¹, Mu-Ping Nieh¹

¹University of Connecticut

Abstract

While many powerful tools from machine learning (ML) have found their way to impactful applications in materials science, there remain barriers to applying data-driven methods broadly in experiments. One of these key barriers is posed by small datasets: while powerful modern ML methods rely on massive datasets and compute, the vast majority of experimental and computational datasets are comparatively small. In this talk, we will address the small data challenge by rethinking commonly held paradigms for how machine learning algorithms are applied. Three examples will be discussed: (1) model selection for active machine learning, (2) data-efficient computer vision for in situ defect detection, and (3) model selection for multi-class classification as demonstrated on inverse analysis for small-angle scattering data.

In the first example, we will discuss the problem of model selection for active learning. Active learning (AL) is of great interest for automated scientific labs, where there is a strong need to minimize the number of costly experiments necessary to train predictive models. However, many AL methods assume fixed model hyperparameters that are chosen a priori. In practice, good hyperparameters are rarely known in advance. To resolve this, we have developed a simple and fast method for practical active learning with model selection, based on weighted leave-one-out cross validation (LOOCV) on the biased, actively sampled training dataset. We show empirically that our method can find hyperparameters that lead to better performance, and we use it for process optimization in 3D printing, among other applications.

In the second example, we will discuss data-efficient methods for deep learning-based defect detection in materials. One of the persistent challenges for a classification problem such as defect detection is the lack of sufficient labeled samples of the many different types of defects possible. We tackle this challenge by reformulating the problem as a change detection problem, leveraging ideas from one-shot and few-shot learning to significantly decrease the training dataset size required. We show that our lightweight model, which is designed to be easily amenable to transfer learning, achieves better performance than state-of-the-art models based on generative adversarial networks and transformer architectures in the absence of massive datasets.

Finally, in the third example, we tackle the common challenge of inverse analysis of experimental data, such as small-angle scattering data, where the goal is to identify material structure from experimental scattering curves. For small-angle scattering, identifying morphology from scattering curves is a multi-class classification problem, which can be time-consuming for human experts to solve even with the assistance of fitting software. Accordingly, ML-based approaches have been proposed. We show that rethinking model selection for multi-class classification can achieve highly accurate models using classical methods such as support vector classification (SVC), which are comparatively easy to train and require much smaller datasets than deep learning approaches.
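
As an illustration of the weighted LOOCV idea from the first example, the following minimal Python sketch selects a hyperparameter by minimizing a weighted leave-one-out error. It is not the authors' implementation: the abstract does not specify the model or how the weights are computed, so the k-nearest-neighbor regressor, the caller-supplied `weights`, and the function names `knn_predict`, `weighted_loocv_error`, and `select_k` are all assumptions made for illustration.

```python
from typing import Sequence


def knn_predict(x_train: Sequence[float], y_train: Sequence[float],
                x_query: float, k: int) -> float:
    """Average the targets of the k training points nearest to x_query (1-D inputs)."""
    # sorted() is stable, so distance ties are broken by original order.
    order = sorted(range(len(x_train)), key=lambda j: abs(x_train[j] - x_query))
    nearest = order[:k]
    return sum(y_train[j] for j in nearest) / len(nearest)


def weighted_loocv_error(x: Sequence[float], y: Sequence[float],
                         weights: Sequence[float], k: int) -> float:
    """Weighted mean of squared leave-one-out errors for a k-NN regressor.

    `weights` is an assumption of this sketch: one non-negative weight per
    sample, intended to counteract the sampling bias of active learning.
    """
    x, y = list(x), list(y)
    total = 0.0
    norm = 0.0
    for i in range(len(x)):
        # Hold out sample i and predict it from the remaining samples.
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        pred = knn_predict(x_rest, y_rest, x[i], k)
        total += weights[i] * (y[i] - pred) ** 2
        norm += weights[i]
    return total / norm


def select_k(x: Sequence[float], y: Sequence[float],
             weights: Sequence[float], candidates: Sequence[int]) -> int:
    """Return the candidate k with the lowest weighted LOOCV error."""
    return min(candidates, key=lambda k: weighted_loocv_error(x, y, weights, k))


if __name__ == "__main__":
    # Toy data, made up purely for demonstration.
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 1.0, 2.0, 3.0]
    print(select_k(x, y, [1, 1, 1, 1], [1, 2]))  # uniform weights
    print(select_k(x, y, [0, 1, 1, 0], [1, 2]))  # only interior points count
```

With uniform weights this reduces to ordinary LOOCV. Changing the weights changes which samples dominate the validation error, and therefore which hyperparameter is selected. How to choose weights for an actively sampled dataset is the substance of the method described in the talk and is not reproduced here.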

Keywords

in situ

Symposium Organizers

Andi Barbour, Brookhaven National Laboratory
Lewys Jones, Trinity College Dublin
Yongtao Liu, Oak Ridge National Laboratory
Helge Stein, Karlsruhe Institute of Technology

Session Chairs

Yongtao Liu
Zijie Wu
