Christopher Kuenneth1,2, Rampi Ramprasad2
University of Bayreuth1, Georgia Institute of Technology2
Polymers play a crucial role in our daily lives and serve a wide range of applications. The vast polymer cosmos presents both exciting opportunities and significant challenges when it comes to identifying suitable candidates for specific applications. Here, we present an end-to-end polymer informatics pipeline that searches this space for suitable candidates with unprecedented speed and accuracy. The pipeline includes a large language model-based fingerprinting capability called polyBERT, which acts as a “chemical linguist” by treating the chemical structure of polymers as a chemical language. A multitask learning approach then maps the polyBERT fingerprints to a variety of polymer properties. Compared with manually designed fingerprinting schemes, the polyBERT pipeline is two orders of magnitude faster while maintaining accuracy, making it a promising candidate for deployment in scalable architectures, including cloud infrastructures.
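The two-stage structure described above (a polymer string is turned into a numerical fingerprint, and one shared fingerprint then feeds predictions for several properties) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the hashed character-trigram fingerprint is a simple stand-in for polyBERT, not the language model itself; the fingerprint length, the property names, and the randomly initialized linear head are hypothetical and untrained; and the input string `[*]CC[*]` is only an example of a polymer repeat unit written in a SMILES-like notation with `[*]` marking the chain ends.

```python
from __future__ import annotations

import hashlib
import math
import random

FP_DIM = 16  # hypothetical fingerprint length, chosen only for this sketch


def toy_fingerprint(polymer_string: str, dim: int = FP_DIM) -> list[float]:
    """Stand-in for a learned fingerprint: hash character trigrams of the
    polymer string into a fixed-length vector and L2-normalize it."""
    vec = [0.0] * dim
    padded = f"^{polymer_string}$"  # mark the start and end of the string
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def make_untrained_head(properties: list[str], dim: int = FP_DIM,
                        seed: int = 0) -> dict[str, list[float]]:
    """Hypothetical multitask head: one weight vector per property, all
    reading the same fingerprint. Weights are random placeholders."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for name in properties}


def predict_properties(fingerprint: list[float],
                       head: dict[str, list[float]]) -> dict[str, float]:
    """Map one shared fingerprint to every property with a dot product."""
    return {name: sum(w * x for w, x in zip(weights, fingerprint))
            for name, weights in head.items()}


# Placeholder property names; the outputs are meaningless without training.
head = make_untrained_head(["property_a", "property_b"])
fp = toy_fingerprint("[*]CC[*]")
predictions = predict_properties(fp, head)
```

The point of the sketch is the data flow rather than the numbers: the fingerprint is computed once per polymer and reused for every property, which is why the cost of the fingerprinting step dominates the throughput of such a pipeline.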