Dec 5, 2024
4:00pm - 4:15pm
Hynes, Level 2, Room 209
Julia Hsu1, Imon Mia1, Armi Tiihonen2, Mark Lee1, Roman Garnett3, Tonio Buonassisi4, William Vandenberghe1
The University of Texas at Dallas1, Aalto University2, Washington University in St. Louis3, Massachusetts Institute of Technology4
Optimization is a common task in materials science. Bayesian optimization (BO) is increasingly used in experimental work involving varying levels of automation. Before implementing BO in an experimental campaign, many researchers prefer to implement BO in a simulation environment using synthetic data, which provides pedagogical and troubleshooting value. Two major differences between experimental and simulation work are that (1) experiments are often performed in batches, i.e., processing multiple samples at once, to save materials cost or time, and (2) experimental data contain aleatoric uncertainties that manifest as noise.

In this work, we develop a framework to visualize BO step by step, first as an evaluation tool for simulation environments and later, possibly, as a debugging tool for experiments. We showcase an example of simulated data with increasing noise, evaluating optimization strategies as a function of noise magnitude. In our demonstration, we implement batch BO using the Emukit package to find the optimum of 6-dimensional Ackley and Hartmann functions. Six dimensions in the predictor inputs (X) are chosen to mimic the number of input variables commonly used in experimental work. The Ackley function represents a needle-in-a-haystack experimental manifold, i.e., a hard-to-find global maximum in the objective (y), while the Hartmann function represents a more gradual landscape that contains a second local maximum similar in objective value to the global maximum but at a significantly different X. Using synthetic data without noise, we first study how the optimization (learning) progress is affected by the choice of acquisition function (expected improvement vs. upper confidence bound), hyperparameters, and batch-picking method. Latin hypercube sampling (LHS) is used to pick the initial X values for collecting data, followed by 50 learning cycles with a batch size of 4 in each round. Ninety-nine independent LHS initializations are run to quantify statistical variation. The optimization results are evaluated based on instant regret in X, defined as the Euclidean distance between the final optimal X_opt from the model and the X_max at which the ground-truth y is maximized, averaged over the 99 LHS initializations. While most papers in the literature track the difference between the model and ground-truth y values, we argue that X is more important to experimenters because the inputs are what can be controlled and varied, and the model's y values deviate from the ground truth because of the details of the Gaussian process regression. The effects of noise on the optimization are evaluated for normally distributed noise levels ranging from 1% to 20%. We show that adding noise as a percentage of the ground-truth y maximum, as is commonly done in the literature, overestimates the noise compared with the signal-to-noise ratio of experiments. We also develop several visualization methods to show the optimization progression and outcomes, since visualization is important for high-dimensional problems: it is difficult for humans to comprehend results in more than three dimensions.

This work is supported by NSF CMMI-2109554. JWPH and TB acknowledge the support of the Simons Foundation Pivot Fellowship.
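As a rough illustration of the batch-BO setup described above, the sketch below assembles a 6-dimensional loop with Emukit: a Latin hypercube initial design, a GPy Gaussian-process surrogate, an expected-improvement acquisition, and 50 cycles that each collect a batch of 4 points. The search bounds, kernel choice, and initial design size are illustrative assumptions rather than the settings used in this work, and Emukit minimizes by default, so the sketch targets the Ackley minimum rather than a maximum.

```python
import numpy as np
import GPy
from emukit.core import ContinuousParameter, ParameterSpace
from emukit.core.initial_designs import LatinDesign
from emukit.core.loop import FixedIterationsStoppingCondition
from emukit.model_wrappers import GPyModelWrapper
from emukit.bayesian_optimization.acquisitions import ExpectedImprovement
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop

DIM = 6         # input dimensions, as in the abstract
BATCH_SIZE = 4  # points collected per learning cycle
N_CYCLES = 50   # learning cycles
N_INIT = 10     # assumed size of the initial LHS design (not stated in the abstract)

def ackley(x: np.ndarray) -> np.ndarray:
    """Standard Ackley function; Emukit minimizes, so the target is its global minimum at the origin."""
    a, b, c = 20.0, 0.2, 2.0 * np.pi
    term1 = -a * np.exp(-b * np.sqrt(np.mean(x ** 2, axis=1)))
    term2 = -np.exp(np.mean(np.cos(c * x), axis=1))
    return (term1 + term2 + a + np.e).reshape(-1, 1)

# Assumed search bounds; the bounds used in the study may differ.
space = ParameterSpace([ContinuousParameter(f"x{i}", -2.0, 2.0) for i in range(DIM)])

# Latin hypercube sampling for the initial design.
X_init = LatinDesign(space).get_samples(N_INIT)
Y_init = ackley(X_init)

# GPy Gaussian-process surrogate wrapped for Emukit.
gp = GPy.models.GPRegression(X_init, Y_init, GPy.kern.Matern52(DIM, ARD=True))
model = GPyModelWrapper(gp)

# Batch BO loop: expected improvement acquisition, 4 points per cycle.
loop = BayesianOptimizationLoop(
    space=space,
    model=model,
    acquisition=ExpectedImprovement(model),
    batch_size=BATCH_SIZE,
)
loop.run_loop(ackley, FixedIterationsStoppingCondition(N_CYCLES))

# Best input found over the initial design plus all collected batches.
X_all, Y_all = loop.loop_state.X, loop.loop_state.Y
x_best = X_all[np.argmin(Y_all)]
```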
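The two quantities emphasized above, instant regret measured in X and noise scaled as a percentage of the ground-truth y maximum, can be written compactly. The helper names below are hypothetical, and the noise model is a plain reading of "normally distributed noise levels ranging from 1% to 20%"; the authors' exact definitions may differ.

```python
import numpy as np

def instant_regret_in_x(x_opt: np.ndarray, x_max: np.ndarray) -> float:
    """Euclidean distance between the model's best input X_opt and the ground-truth optimizer X_max."""
    return float(np.linalg.norm(np.asarray(x_opt) - np.asarray(x_max)))

def add_percentage_noise(y: np.ndarray, noise_level: float, y_max: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Add Gaussian noise whose standard deviation is a fraction of the ground-truth |y| maximum.

    noise_level = 0.01 .. 0.20 corresponds to the 1%-20% range discussed above.
    """
    return y + rng.normal(0.0, noise_level * abs(y_max), size=np.shape(y))

# Example: average the regret over repeated LHS initializations (99 in the abstract).
# x_opts = [...]  # best X from each run
# mean_regret = np.mean([instant_regret_in_x(x, x_max_true) for x in x_opts])
```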