Apr 11, 2025
10:15am - 10:45am
Summit, Level 4, Room 422
Santiago Miret
Intel Corporation
Machine Learning (ML) methods that can process large amounts of heterogeneous data have tremendous potential to accelerate the end-to-end discovery, synthesis, and characterization of novel materials to address global-scale challenges like clean energy, sustainable semiconductor manufacturing, and drug discovery. In this talk, I will present an overview of Intel Labs' research and community engagement efforts on ML for materials discovery, along with technical deep dives focusing on two ambitious goals:
1. Machine Learning Interatomic Potentials (MLIPs): Accelerating scientific simulation by >100x using geometric deep learning and software tools to enable large-scale deployment of machine learning potentials for real-world simulations. Through the Open MatSci ML Toolkit [1], Intel Labs makes the training and deployment of MLIPs accessible by connecting relevant data sources with modern ML models and scalable deep learning training capabilities. In addition to the Open MatSci ML Toolkit, we have enabled the acceleration of equivariant deep learning methods through EquiTriton [2], an open-source implementation of Triton-based spherical harmonic kernels. The acceleration provided by EquiTriton has enabled us to train models with higher-order, more expressive spherical harmonic kernels and to study their relative importance when modeling materials properties.
2. Materials Science Language Models: Leveraging Large Language Models (LLMs) as scientific assistants to automate scientific tasks for materials discovery. While modern LLMs have made great progress in solving language-based tasks across a variety of fields, they still exhibit a limited understanding of the materials science domain. We have proposed new methods to alleviate this gap, including new benchmarks (MatSci-NLP [3]), multi-round instruction fine-tuning for the first billion-scale LLM for materials science (HoneyBee [4]), and a tool-augmented LLM that markedly improves the ability of diverse language models to perform materials science language tasks (HoneyComb [5]). Concurrently, we continue to highlight gaps and limitations of language models, such as property prediction that depends on geometry (MatText [6]), which require further research to enable important capabilities.
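To make the role of spherical harmonic kernels concrete, the sketch below evaluates the real degree-1 spherical harmonics of an interatomic direction vector in plain Python. This is an illustrative assumption of the kind of feature involved, not the Triton kernels from EquiTriton; equivariant MLIPs evaluate such features, typically up to much higher degree, for every edge in an atomic graph.

```python
import math

def real_sph_harm_l1(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Real spherical harmonics of degree l=1 for a direction (x, y, z).

    Returns (Y_{1,-1}, Y_{1,0}, Y_{1,1}), which are proportional to the
    normalized Cartesian components (y, z, x) of the input vector.
    """
    r = math.sqrt(x * x + y * y + z * z)  # length of the interatomic vector
    c = math.sqrt(3.0 / (4.0 * math.pi))  # standard l=1 normalization constant
    return (c * y / r, c * z / r, c * x / r)

# Direction along +z: only the m=0 component is non-zero.
y_m1, y_0, y_p1 = real_sph_harm_l1(0.0, 0.0, 1.0)
```

Higher degrees follow the same pattern but involve many more polynomial terms per edge, which is why fused GPU kernels such as those in EquiTriton pay off at scale.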
[1] Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Marcel Nassar, and Matthew Spellings. "The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science." Transactions on Machine Learning Research, 2023. https://openreview.net/forum?id=QBMyDZsPMd
[2] Kin Long Kelvin Lee, Mikhail Galkin, and Santiago Miret. "Scaling Computational Performance of Spherical Harmonics Kernels with Triton." AI for Accelerated Materials Design - Vienna 2024, 2024.
[3] Yu Song, Santiago Miret, and Bang Liu. "MatSci-NLP: Evaluating Scientific Language Models on Materials Science Language Tasks Using Text-to-Schema Modeling." Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3621–3639, Toronto, Canada, 2023.
[4] Yu Song, Santiago Miret, Huan Zhang, and Bang Liu. "HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science." Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5724–5739, Singapore, 2023.
[5] Huan Zhang et al. "HoneyComb: A Flexible LLM-Based Agent System for Materials Science." AI for Accelerated Materials Design - NeurIPS 2024, 2024.
[6] Nawaf Alampara, Santiago Miret, and Kevin Maik Jablonka. "MatText: Do Language Models Need More than Text & Scale for Materials Modeling?" AI for Accelerated Materials Design - Vienna 2024, 2024.