Pedro Arrechea¹, Dmitry Zubarev¹, James Hedrick¹, Nathaniel Park¹, Tim Erdmann¹
¹IBM
Machine learning and artificial intelligence are currently experiencing a renaissance in computer science, driven in part by the aggregation of large datasets generated through society's routine use of the internet. In the chemical sciences, physics-based computer simulations are employed to generate large datasets that are then iteratively utilized by machine learning algorithms. Herein we disclose the development of a platform, an experimental system that can be used to generate large datasets from small-scale laboratory "chemical plants." The polymerization of L,L-lactide by a urea-based catalyst system was selected as a model system on the basis of anticipated reproducibility, a reasonably modest number of process-condition inputs, and complex property outputs. Over four hundred different process conditions were explored and afforded polymers in a single five-hour campaign. These samples were characterized by ¹H NMR, GPC, and MALDI. The process conditions spanned parameters where good control was observed, as characterized by low polydispersity, a well-defined molecular weight distribution, and good agreement between the "target" degree of polymerization (DP) and the DP measured by ¹H NMR end-group analysis. Several process conditions also afforded "poorly" controlled but reproducible polymers, characterized by a combination of high polydispersity, multimodal molecular weight distributions, and/or poor end-group fidelity. The platform meets the desired attributes for on-demand and at-scale data acquisition for data-centric materials discovery, as illustrated by the application of Bayesian optimization for experiment prioritization. The developed software platform can implement more complex polymerizations incorporating different catalyst compositions, comonomers, concentrations, and reactor configurations.
These processes can be used to generate large datasets for machine learning and artificial intelligence applications.
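To illustrate the kind of experiment-prioritization loop the abstract alludes to, the following is a minimal, generic Bayesian-optimization sketch. It is not the authors' implementation: the objective (a toy surrogate for dispersity as a function of one normalized process input, e.g. catalyst loading), the candidate grid, and all parameter names are assumptions for illustration only. It uses a Gaussian-process model with an expected-improvement acquisition function to pick the next "experiment."

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical objective: dispersity (D) versus one normalized process
# input. The real platform would measure this by GPC; this analytic
# surrogate exists only so the loop below is runnable.
def dispersity(x):
    return 1.1 + 0.5 * (x - 0.35) ** 2 + 0.02 * np.sin(12 * x)

# Discrete menu of candidate process conditions (assumed).
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

# Seed the model with a few "completed experiments."
X = rng.uniform(0.0, 1.0, size=(4, 1))
y = dispersity(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement, minimization form: prioritize conditions
    # predicted to lower dispersity, balanced against model uncertainty.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, dispersity(x_next).ravel())

print(round(float(y.min()), 3))  # lowest dispersity observed so far
```

In an autonomous-platform setting, the surrogate function would be replaced by submitting the selected condition to the reactor and reading back the measured property; the acquisition step is what converts a fixed screening campaign into sequential experiment prioritization.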