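Peter Mastracco<sup>1</sup>, Alexander Gorovitz<sup>2</sup>, Anna Gonzalez Rosell<sup>1</sup>, Joshua Evans<sup>3</sup>, Petko Bogdanov<sup>2</sup>, Stacy Copp<sup>1</sup>
<sup>1</sup>University of California, Irvine; <sup>2</sup>State University of New York at Albany; <sup>3</sup>Chaffey College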
DNA-stabilized silver clusters (Ag<sub>N</sub>-DNAs) are promising nanomaterials for applications in sensing, nanophotonics, and bioimaging, due to their diverse sequence-encoded fluorescence colors, high quantum yields, and unique rod-like cluster structures. However, our poor understanding of how DNA templates Ag<sub>N</sub> growth, combined with the huge sequence space of possible DNA oligomers, hinders both the fundamental understanding and the rational design of Ag<sub>N</sub>-DNAs. To address this problem, we are employing machine learning to discern how DNA sequence selects the fluorescence colors of Ag<sub>N</sub>-DNAs. Building on our previous work using machine learning for Ag<sub>N</sub>-DNA design, and enabled by well-controlled high-throughput experimental methods for Ag<sub>N</sub>-DNA synthesis and characterization, we introduce a new physically informed machine learning model inspired by the recently resolved crystal structures of a few Ag<sub>N</sub>-DNAs. We begin with feature vectors that quantify the prevalence of DNA base patterns, specifically the relative arrangement of pairs of nucleobases within DNA template sequences. We then train an ensemble of support vector machines to assign the most probable fluorescence color class to an input DNA sequence. Using this model, we screen all 4<sup>10</sup> (1,048,576) possible 10-base DNA sequences to identify the most promising DNA templates for Ag<sub>N</sub>-DNAs of chosen colors. In addition, feature selection tools give us insight into the DNA ligand motifs that are most discriminative for the atomic sizes and fluorescence colors of Ag<sub>N</sub>-DNAs. This work provides a case study for combining machine learning with known physics and experimental training data to improve both the design process and the fundamental understanding of nanomaterial systems.
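To make the feature construction and the screening step more concrete, the following is a minimal Python sketch of one possible encoding. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the exact features, so the function names `pair_features` and `all_sequences`, the `max_gap` parameter, and the choice to count ordered base pairs at separations of 1 to 3 positions are all hypothetical. The classifier itself is omitted; the resulting vectors could be passed to any support vector machine implementation.

```python
from itertools import product

BASES = "ACGT"


def pair_features(seq, max_gap=3):
    """Return normalized counts of ordered base pairs at fixed separations.

    Hypothetical encoding: for each gap g in 1..max_gap and each ordered
    pair (x, y), count how often y occurs g positions after x, then divide
    by the number of position pairs available at that gap.
    """
    if len(seq) <= max_gap:
        raise ValueError("sequence must be longer than max_gap")
    keys = [(x, y, g)
            for g in range(1, max_gap + 1)
            for x in BASES
            for y in BASES]
    counts = dict.fromkeys(keys, 0)
    for g in range(1, max_gap + 1):
        for i in range(len(seq) - g):
            counts[(seq[i], seq[i + g], g)] += 1
    # Each gap contributes 16 features that sum to 1.
    return [counts[(x, y, g)] / (len(seq) - g) for (x, y, g) in keys]


def all_sequences(length=10):
    """Lazily enumerate every DNA sequence of the given length (4**length)."""
    for bases in product(BASES, repeat=length):
        yield "".join(bases)


if __name__ == "__main__":
    features = pair_features("ACGTACGTAC")
    print(len(features))   # 16 ordered pairs x 3 gaps = 48 features
    print(4 ** 10)         # size of the 10-base screening space
```

In a screening loop, each sequence yielded by `all_sequences(10)` would be converted with `pair_features` and scored by the trained ensemble. Enumerating lazily avoids holding all 1,048,576 sequences in memory at once.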