Janine George<sup>1,2</sup>
<sup>1</sup>Bundesanstalt für Materialforschung und -prüfung, <sup>2</sup>Friedrich-Schiller-Universität Jena
The underlying data is crucial for machine learning (ML) tasks.<sup>[1]</sup> <i>Ab initio</i> data is often used as a target (and occasionally as features<sup>[2]</sup>). High-throughput calculations and automation make it possible to generate such data efficiently and to uniform standards.<sup>[3,4]</sup> The high-throughput data of the Materials Project has recently been used to train new foundation interatomic potentials.<sup>[5]</sup><br/>This presentation shows further possibilities and challenges that arise when combining high-throughput computation and machine learning. The focus is on training ML interatomic potentials that are accurate enough to predict harmonic phonon properties of materials. Typical foundation models in this area now achieve acceptable results but are still far from routinely replacing density functional theory (DFT).<sup>[5,6]</sup> Building on the promising results of ref. <sup>[7]</sup>, we will present new, fully automated workflows for training and benchmarking ML interatomic potentials whose force predictions are accurate enough to compute harmonic phonons in very good agreement with DFT for different structures of the same composition. This is a first step towards foundation models that are accurate enough for phonon properties as well.<br/>More broadly, the arrival of foundation models in machine learning invites a rethinking of how automated <i>ab initio</i> computations and machine learning are combined. Workflows now also need to be designed for fine-tuning tasks. Such workflows have the potential to significantly accelerate the combined use of ML and high-throughput computation in materials design and materials searches.<br/><br/><br/><b>References</b><br/>[1] C. Ben Mahmoud, J. L. A. Gardner, V. L. Deringer, <i>Nat. Comput. Sci.</i> <b>2024</b>, DOI 10.1038/s43588-024-00636-1.<br/>[2] A. A. Naik, C. Ertural, N. Dhamrait, P. Benner, J. George, <i>Sci. Data</i> <b>2023</b>, <i>10</i>, 610.<br/>[3] A. S. 
Rosen, M. Gallant, J. George, J. Riebesell, H. Sahasrabuddhe, J.-X. Shen, M. Wen, M. L. Evans, G. Petretto, D. Waroquiers, G.-M. Rignanese, K. A. Persson, A. Jain, A. M. Ganose, <i>JOSS</i> <b>2024</b>, <i>9</i>, 5995.<br/>[4] A. M. Ganose, A. Bonkowski, X. Chen, Y. Chiang, O. A. Cohen, J. George, R. E. A. Goodall, R. D. Guha, A. D. Kaplan, R. S. Kingsbury, M. C. Kuner, X. Linn, M. J. McDermott, M. R. Srinivaas, A. N. Naik, G. Petretto, T. A. R. Purcell, M. Scheffler, A. Sobolev, F. Ricci, J. Riebesell, G.-M. Rignanese, A. S. Rosen, H. Sahasrabuddhe, J. Schmidt, J.-X. Shen, D. Waroquiers, D. Wang, M. Wen, Z. Zhu, A. Jain, <b>2024</b>, In preparation.<br/>[5] I. Batatia, P. Benner, Y. Chiang, A. M. Elena, D. P. Kovács, J. Riebesell, X. R. Advincula, M. Asta, W. J. Baldwin, N. Bernstein, A. Bhowmik, S. M. Blau, V. Cărare, J. P. Darby, S. De, F. Della Pia, V. L. Deringer, R. Elijošius, Z. El-Machachi, E. Fako, A. C. Ferrari, A. Genreith-Schriever, J. George, R. E. A. Goodall, C. P. Grey, S. Han, W. Handley, H. H. Heenen, K. Hermansson, C. Holm, J. Jaafar, S. Hofmann, K. S. Jakob, H. Jung, V. Kapil, A. D. Kaplan, N. Karimitari, N. Kroupa, J. Kullgren, M. C. Kuner, D. Kuryla, G. Liepuoniute, J. T. Margraf, I.-B. Magdău, A. Michaelides, J. H. Moore, A. A. Naik, S. P. Niblett, S. W. Norwood, N. O’Neill, C. Ortner, K. A. Persson, K. Reuter, A. S. Rosen, L. L. Schaaf, C. Schran, E. Sivonxay, T. K. Stenczel, V. Svahn, C. Sutton, C. van der Oord, E. Varga-Umbrich, T. Vegge, M. Vondrák, Y. Wang, W. C. Witt, F. Zills, G. Csányi, <b>2023</b>, DOI 10.48550/arXiv.2401.00096.<br/>[6] H. Yang, C. Hu, Y. Zhou, X. Liu, Y. Shi, J. Li, G. Li, Z. Chen, S. Chen, C. Zeni, M. Horton, R. Pinsler, A. Fowler, D. Zügner, T. Xie, J. Smith, L. Sun, Q. Wang, L. Kong, C. Liu, H. Hao, Z. Lu, <i>arXiv</i> <b>2024</b>, DOI http://arxiv.org/abs/2405.04967.<br/>[7] J. George, G. Hautier, A. P. Bartók, G. Csányi, V. L. Deringer, <i>J. Chem. Phys.</i> <b>2020</b>, <i>153</i>, 044104.