Pedro Arrechea¹, Dmitry Zubarev¹, James Hedrick¹, Nathaniel Park¹, Tim Erdmann¹
¹IBM
Machine learning and artificial intelligence are currently experiencing a renaissance in computer science, driven in part by the aggregation of large datasets generated through society's routine use of the internet. In the chemical sciences, physics-based computer simulations are employed to generate large datasets that are then iteratively utilized by machine learning algorithms. Herein we disclose the development of a platform, an experimental system that can be used to generate large datasets from small-scale laboratory "chemical plants." The polymerization of L,L-lactide by a urea-based catalyst system was selected as a model system on the basis of anticipated reproducibility, a reasonably modest number of process-condition inputs, and complex property outputs. Over four hundred different process conditions were explored and afforded polymers in a single five-hour campaign. These samples were characterized by ¹H NMR, GPC, and MALDI. The process conditions spanned parameters where good control was observed, as characterized by low polydispersity, a well-defined molecular weight distribution, and good agreement between the "target" degree of polymerization (DP) and the DP measured by ¹H NMR end-group analysis. Several process conditions also afforded "poorly" controlled but reproducible polymers, characterized by a combination of high polydispersity, multimodal molecular weight distributions, and/or poor end-group fidelity. The platform meets the desired attributes for on-demand and at-scale data acquisition for data-centric materials discovery, as illustrated by the application of Bayesian optimization for experiment prioritization. The developed software platform can implement more complex polymerizations incorporating different catalyst compositions, comonomers, concentrations, and reactor configurations.
These processes can be used to generate large datasets for machine learning and artificial intelligence applications.
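To illustrate the kind of experiment-prioritization loop the abstract alludes to, the following is a minimal, generic Bayesian-optimization sketch. It is not the authors' implementation: the objective (a toy surrogate for dispersity as a function of one normalized process input, e.g. catalyst loading), the candidate grid, and all parameter names are assumptions for illustration only. It uses a Gaussian-process model with an expected-improvement acquisition function to pick the next "experiment."

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical objective: dispersity (D) versus one normalized process
# input. The real platform would measure this by GPC; this analytic
# surrogate exists only so the loop below is runnable.
def dispersity(x):
    return 1.1 + 0.5 * (x - 0.35) ** 2 + 0.02 * np.sin(12 * x)

# Discrete menu of candidate process conditions (assumed).
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

# Seed the model with a few "completed experiments."
X = rng.uniform(0.0, 1.0, size=(4, 1))
y = dispersity(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement, minimization form: prioritize conditions
    # predicted to lower dispersity, balanced against model uncertainty.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, dispersity(x_next).ravel())

print(round(float(y.min()), 3))  # lowest dispersity observed so far
```

In an autonomous-platform setting, the surrogate function would be replaced by submitting the selected condition to the reactor and reading back the measured property; the acquisition step is what converts a fixed screening campaign into sequential experiment prioritization.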