
MD02.09.03 2023 MRS Spring Meeting

Unifying Molecular and Textual Representations via Multi-task Language Modelling

When and Where

Apr 25, 2023
8:35am - 8:50am

MD02-virtual

Presenter / Co-Author(s)

Matteo Manica1, Dimitrios Christofidellis1, Giorgio Giannone1, Jannis Born1, Teodoro Laino1

1IBM Research Europe

Abstract

Neural language models have achieved impressive results in various natural language understanding and generation tasks. Recently, advances in language models have been successfully transferred to the chemical domain, proposing generative modeling solutions to classical problems from molecular design to synthesis planning. These new methods have shown potential for optimizing chemical laboratory operations, initiating a new era of data-driven automation in scientific discovery. Despite these recent successes, however, specialized models are typically needed for each chemical task, requiring problem-specific fine-tuning and neglecting dependencies between tasks. Moreover, the lack of a unified representation between information expressed in natural language and chemical representations is the main factor limiting interaction between humans and these models. Inspired by recent advances in generative transfer learning, we explore a multi-task language model that can tackle a large variety of tasks in the chemical and natural language domains. We rely on mono-domain, frozen encoder models and jointly fine-tune a decoder on multiple domains. In doing so, we relieve cross-domain training of computationally expensive, data-hungry pretraining, leveraging the power of language models trained on unstructured data. Furthermore, we apply multi-task learning to increase model expressivity and information sharing between modalities. In this way, our model handles chemical and natural language concurrently and can solve numerous chemical and natural language-based tasks using a single set of weights. We quantitatively evaluate our method against state-of-the-art baselines, exploring different strategies to adapt and fine-tune cross-domain language models. Our work paves the way for robust and efficient language models accelerating discovery in the physical sciences.
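
The frozen-encoder, shared-decoder pattern described in the abstract can be sketched in a few lines of PyTorch. The sketch below is a hypothetical illustration only: one frozen pretrained encoder per domain (text, chemistry) feeds a single decoder whose weights alone are updated across a mixed multi-task batch. All module names, dimensions, the toy vocabulary, and the proxy loss are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of multi-domain training with frozen encoders and one
# shared, trainable decoder. Hypothetical; not the authors' code.
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Stand-in for a pretrained mono-domain encoder (e.g. text or SMILES)."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.parameters():      # freeze: encoder gets no updates
            p.requires_grad = False

    def forward(self, tokens):
        return self.encoder(self.embed(tokens))

class SharedDecoder(nn.Module):
    """Single decoder fine-tuned jointly across domains and tasks."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tgt_tokens, memory):
        h = self.decoder(self.embed(tgt_tokens), memory)
        return self.lm_head(h)           # (batch, tgt_len, vocab)

# One frozen encoder per domain; one decoder shared by all tasks.
VOCAB = 1000
text_enc, chem_enc = FrozenEncoder(VOCAB), FrozenEncoder(VOCAB)
decoder = SharedDecoder(VOCAB)
opt = torch.optim.AdamW(decoder.parameters(), lr=1e-4)  # decoder weights only

# Toy multi-task batch: (encoder, source, target) triples from both domains.
batch = [
    (text_enc, torch.randint(0, VOCAB, (2, 16)), torch.randint(0, VOCAB, (2, 12))),
    (chem_enc, torch.randint(0, VOCAB, (2, 20)), torch.randint(0, VOCAB, (2, 12))),
]
loss_fn = nn.CrossEntropyLoss()
opt.zero_grad()
# Proxy reconstruction loss for brevity; a real seq2seq objective would
# shift the target tokens by one position and apply a causal mask.
loss = sum(
    loss_fn(decoder(tgt, enc(src)).transpose(1, 2), tgt)
    for enc, src, tgt in batch
)
loss.backward()   # gradients flow only into the shared decoder
opt.step()
```

Because the encoders are frozen, the per-domain representations stay fixed and only the shared decoder adapts, which is what lets the cross-domain model skip expensive joint pretraining while still pooling supervision across tasks.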

Keywords

chemical reaction

Symposium Organizers

Soumendu Bagchi, Los Alamos National Laboratory
Huck Beng Chew, The University of Illinois at Urbana-Champaign
Haoran Wang, Utah State University
Jiaxin Zhang, Oak Ridge National Laboratory

Symposium Support

Bronze
Patterns and Matter, Cell Press

Publishing Alliance

MRS publishes with Springer Nature