Dec 4, 2024
11:30am - 11:45am
Sheraton, Second Floor, Constitution B
Seiji Takeda1, Indra Priyadarsini S1, Lisa Hamada1, Hajime Shinohara1, Onur Boyar1, Emilio Vital Brazil2, Eduardo Almeida Soares2, Flaviu Cipcigan3, David Braines3
IBM Research - Tokyo1, IBM Research - Brazil2, IBM Research - UK3
Short Summary:
In this talk, we present the latest status of our multi-modal foundation model (FM) for materials discovery, along with our open innovation efforts in model development and community building. Our FM integrates over five modalities, including SMILES and SELFIES, and provides two key functions: (1) robust feature representations for high-accuracy downstream prediction tasks, and (2) cross-modal inference. Additionally, we are fostering an open community within the framework of the AI Alliance, bringing together industry and academia to collaboratively advance model development.

Introduction:
Artificial intelligence (AI) has been playing a critical role in materials discovery; however, current applications are limited and fragmented. Existing AI models are uni-modal, focusing on specific tasks such as property prediction or molecule generation. These models are often constrained by small parameter sizes (typically under 100 million) and limited datasets. Furthermore, they primarily rely on single-modal data, resulting in suboptimal performance. Redundant development efforts further hinder progress, as many models are built in isolation without leveraging potential synergies.
To address these challenges, we are developing a multi-modal foundation model. This model significantly enhances AI capabilities, supporting over a billion parameters and utilizing data from different modalities. By merging these data sources, our model generates richer feature representations, resulting in enhanced prediction accuracy, higher fidelity in material generation, and integrated knowledge across domains.

Model and Experiments:
Rather than constructing a large monolithic model, we adopted a flexible and extensible architecture that late-fuses modality-specific models, each of which is independently pre-trained. Each uni-modal model, built on a transformer architecture, was pre-trained in a self-supervised manner on data from a distinct modality, such as SMILES, SELFIES, molecular graphs, and 3D atomic structures, extracted from public datasets including PubChem and ZINC. The latent spaces of these independently pre-trained models were subsequently fused using several approaches, including naive concatenation, Mixture-of-Experts, and attention-based fusion, to create a multi-modal foundation model (see the sketches after this abstract).
We evaluated the performance of these models using well-established benchmarks such as MoleculeNet and QM9, as well as domain-specific datasets including chromophore molecules. Our experiments demonstrate that the fused multi-modal model consistently outperforms existing models in classification and prediction accuracy across these benchmarks.

Community Building:
In parallel with our technical development, we are building an open innovation community aimed at fostering collaboration between industry and academia through the AI Alliance, an open consortium. This community brings together AI and chemistry experts to advance foundation model development in an open, collaborative environment. Parts of the foundation model have been released as open source, and to date more than ten companies and academic institutions have adopted these models. We will expand this community globally, creating the first large-scale open consortium for AI-driven materials science.
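Two of the string modalities named above, SMILES and SELFIES, are interchangeable textual encodings of the same molecule. As a minimal illustration (using the open-source selfies package, which is assumed tooling and not necessarily part of the authors' pipeline), a molecule can be converted between the two representations before being fed to the corresponding uni-modal encoder:

```python
# Minimal sketch: SMILES and SELFIES are two string views of one molecule.
# Assumes the open-source `selfies` package (pip install selfies); this is
# illustrative tooling, not the authors' released code.
import selfies as sf

smiles = "C1=CC=CC=C1"                 # benzene, SMILES (kekulized)
selfies_str = sf.encoder(smiles)       # SMILES -> SELFIES
roundtrip = sf.decoder(selfies_str)    # SELFIES -> SMILES

print(selfies_str)   # e.g. a string of SELFIES tokens like [C][=C]...
print(roundtrip)     # a SMILES string equivalent to the input molecule
```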
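To make the late-fusion idea in "Model and Experiments" concrete, the following is a minimal sketch of the naive-concatenation variant: two independently pre-trained, frozen uni-modal encoders produce embeddings that are concatenated and passed to a small task head trained on the downstream property. The class name, encoder placeholders, and dimensions are hypothetical assumptions for illustration, not the released models; the actual FM also explores Mixture-of-Experts and attention-based fusion.

```python
# Concatenation-based late fusion (PyTorch sketch). Encoder modules stand in
# for independently pre-trained uni-modal transformers (e.g. SMILES, graph);
# all names and dimensions are hypothetical.
import torch
import torch.nn as nn

class LateFusionRegressor(nn.Module):
    def __init__(self, smiles_encoder: nn.Module, graph_encoder: nn.Module,
                 smiles_dim: int, graph_dim: int, hidden: int = 256):
        super().__init__()
        self.smiles_encoder = smiles_encoder   # frozen, pre-trained
        self.graph_encoder = graph_encoder     # frozen, pre-trained
        for p in list(smiles_encoder.parameters()) + list(graph_encoder.parameters()):
            p.requires_grad = False
        # Only the fusion head is trained on the downstream task.
        self.head = nn.Sequential(
            nn.Linear(smiles_dim + graph_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, smiles_batch, graph_batch):
        with torch.no_grad():
            z_smiles = self.smiles_encoder(smiles_batch)   # [B, smiles_dim]
            z_graph = self.graph_encoder(graph_batch)      # [B, graph_dim]
        z = torch.cat([z_smiles, z_graph], dim=-1)         # naive concatenation
        return self.head(z)                                # predicted property

# Toy usage with stand-in encoders that emit fixed-size embeddings.
enc_a = nn.Linear(64, 128)   # stand-in "SMILES" encoder
enc_b = nn.Linear(32, 96)    # stand-in "graph" encoder
model = LateFusionRegressor(enc_a, enc_b, smiles_dim=128, graph_dim=96)
pred = model(torch.randn(4, 64), torch.randn(4, 32))       # -> shape [4, 1]
```

Because the encoders stay frozen, only the fusion head is fitted per downstream task; swapping the concatenation for a gating network or cross-attention layer yields the Mixture-of-Experts and attention-based variants mentioned in the abstract.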