Apr 10, 2025
2:15pm - 2:30pm
Summit, Level 4, Room 422
Justin Garrigus1, Thomas Bouchard2, Angela Zhang3, Fatimah Habis1, Yuanxi Wang1
University of North Texas1, Austin Peay State University2, The University of Texas at Austin3
Point vacancies in crystals serve as functional defects in a wide array of applications ranging from quantum emitters to sensors, so it becomes necessary to efficiently screen the vast combinatorial space of vacancies and their host materials to identify ideal candidates for a given target application. A fundamental property describing the thermodynamic stability of a defect is its formation energy, which often requires significant machine and human time to calculate at the density functional theory (DFT) level. Prior work applying machine learning methods to predict defect formation energies was often limited by small (<1000) training datasets. Here, we propose to leverage large (>10000) datasets of pristine crystal formation energies, which are cheaper to obtain, and to apply transfer learning to improve the accuracy of a graph convolutional neural network (GNN) predictor of defect formation energies. Despite the small size of the defect formation energy training dataset, pretraining on pristine crystal formation energies significantly improves the accuracy of GNN predictions, approaching that of conventional DFT. We further relate the pretraining dataset size and the defect dataset size to the accuracy of the model to determine the critical points at which it becomes necessary to obtain more data of either type. With this, future machine learning models can use pretraining to support larger parameter counts, since more data relevant to the training task becomes usable.
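
Concretely, the proposed workflow amounts to pretraining a graph network on the large pristine-crystal formation energy dataset and then fine-tuning it on the small defect dataset. Below is a minimal Python sketch of that two-stage transfer-learning loop, assuming a PyTorch-style setup; the toy architecture, the choice of frozen layers, and the placeholder tensors are illustrative assumptions, not the authors' actual model or data.

    # Minimal sketch of the pretrain/fine-tune transfer-learning workflow
    # described in the abstract. Architecture, layer sizes, and datasets are
    # illustrative placeholders, not the authors' implementation.
    import torch
    import torch.nn as nn

    class SimpleGNN(nn.Module):
        """Toy graph network: one round of mean-aggregation message passing
        followed by a graph-level readout that predicts a formation energy."""
        def __init__(self, node_dim=16, hidden_dim=64):
            super().__init__()
            self.embed = nn.Linear(node_dim, hidden_dim)
            self.message = nn.Linear(hidden_dim, hidden_dim)
            self.readout = nn.Linear(hidden_dim, 1)

        def forward(self, x, adj):
            # x: (num_atoms, node_dim) node features
            # adj: (num_atoms, num_atoms) adjacency matrix
            h = torch.relu(self.embed(x))
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
            h = torch.relu(self.message(adj @ h / deg)) + h  # residual update
            return self.readout(h.mean(dim=0))  # pooled graph-level energy

    def train(model, graphs, targets, epochs=100, lr=1e-3):
        # Only optimize parameters that are not frozen.
        opt = torch.optim.Adam(
            (p for p in model.parameters() if p.requires_grad), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for (x, adj), y in zip(graphs, targets):
                opt.zero_grad()
                loss_fn(model(x, adj), y).backward()
                opt.step()

    model = SimpleGNN()

    # Stage 1: pretrain on the large (>10000) pristine-crystal dataset
    # (random placeholder graphs stand in for real crystal structures).
    pristine_graphs = [(torch.randn(8, 16), torch.ones(8, 8)) for _ in range(4)]
    pristine_targets = [torch.randn(1) for _ in range(4)]
    train(model, pristine_graphs, pristine_targets)

    # Stage 2: fine-tune on the small (<1000) defect dataset, freezing the
    # embedding layer so the pretrained representation is reused.
    model.embed.weight.requires_grad_(False)
    model.embed.bias.requires_grad_(False)
    defect_graphs = [(torch.randn(8, 16), torch.ones(8, 8)) for _ in range(2)]
    defect_targets = [torch.randn(1) for _ in range(2)]
    train(model, defect_graphs, defect_targets)

Freezing the early embedding layer during fine-tuning is one common transfer-learning choice; the abstract does not specify which layers, if any, are frozen, and fine-tuning all weights at a reduced learning rate would be an equally plausible variant.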