DS02.13.02 2022 MRS Fall Meeting

Disconnection Aware Steering of Retrosynthesis Transformer to Facilitate Materials Design

When and Where

Dec 6, 2022
8:15am - 8:30am

DS02-virtual

Presenter

Co-Author(s)

Amol Thakkar¹, Andrea Byekwaso¹, Alain Vaucher¹, Philippe Schwaller¹, Alessandra Toniato¹, Teodoro Laino¹

¹IBM Research-Zurich

Abstract

Retrosynthetic analysis is the task of breaking down a target molecule into its constituent precursors until a set of commercially available building blocks is reached. At each single step in the sequence, the bonds to be changed and/or the functional group interconversions are identified, and the molecule is broken into hypothetical precursors. Several deep-learning-based approaches to single-step retrosynthesis [4-7] treat the prediction of possible disconnections as a translation task, relying on the Transformer architecture [1] and the simplified molecular-input line-entry system (SMILES) notation [2,3]. Given a target molecule, these approaches suggest the best set of precursors (i.e., reactants and possibly other reagents) as the outcome of the translation, with the possibility of generating multiple such sets.

However, in their current form, retrosynthetic prediction systems offer the chemist little control over the site at which disconnections are made: previous models provide no opportunity for steering and remain limited in the disconnections they propose. This work enables user-defined disconnections for single-step retrosynthetic analysis, allowing transformer models for retrosynthetic prediction to be steered, and thus paves the way for a ‘human-in-the-loop’ component that harnesses both expert knowledge and deep learning. To this end, we have investigated methods to enhance user interaction, ranging from tagging input molecules to dataset augmentation. We additionally introduce several metrics beyond top-N accuracy and use them to examine the predictions, which helps build an understanding of how well the predictions align with those expected by chemists. Thus, we take a step towards improving decision-making strategies that statistical and machine learning algorithms cannot yet encode due to a lack of relevant training data. Ultimately, this serves to enhance a chemist’s experience by facilitating user engagement.

[1] Vaswani, A. et al.; Advances in Neural Information Processing Systems 2017, 5998–6008.
[2] Weininger, D.; J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
[3] Weininger, D.; Weininger, A.; Weininger, J. L.; J. Chem. Inf. Comput. Sci. 1989, 29, 97–101.
[4] Yang, Q. et al.; Chem. Commun. 2019, 55, 12152–12155.
[5] Karpov, P.; Godin, G.; Tetko, I. V.; International Conference on Artificial Neural Networks 2019, 817–830.
[6] Duan, H.; Wang, L.; Zhang, C.; Guo, L.; Li, J.; RSC Adv. 2020, 10, 1371–1378.
[7] Schwaller, P. et al.; Chem. Sci. 2020, 11, 3316–3325.
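
As an illustration of the top-N accuracy metric mentioned in the abstract, the following minimal Python sketch counts a target as correct when its reference precursor set appears among the first N ranked predictions. The function names, the toy SMILES strings, and the order-independent comparison of dot-separated precursors are assumptions made for this example and are not taken from the authors' implementation. The sketch compares strings literally, so in practice the SMILES would first need to be canonicalized with a cheminformatics toolkit.

```python
from typing import Sequence


def precursor_set(smiles: str) -> frozenset:
    """Split a dot-separated SMILES string into an order-independent set of molecules."""
    return frozenset(part for part in smiles.split(".") if part)


def top_n_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    n: int,
) -> float:
    """Fraction of targets whose reference precursors appear among the first n predictions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for ranked, truth in zip(predictions, references):
        truth_set = precursor_set(truth)
        # A hit requires an exact match of the whole precursor set, ignoring order.
        if any(precursor_set(p) == truth_set for p in ranked[:n]):
            hits += 1
    return hits / len(references)


# Hypothetical toy data: ranked precursor sets for two targets.
predictions = [
    ["CC(=O)O.CCO", "CC(=O)Cl.CCO"],
    ["c1ccccc1I.OB(O)c1ccccc1", "c1ccccc1Br.OB(O)c1ccccc1"],
]
# Hypothetical reference precursors for the same two targets.
references = ["CCO.CC(=O)Cl", "c1ccccc1I.OB(O)c1ccccc1"]

top1 = top_n_accuracy(predictions, references, n=1)
top2 = top_n_accuracy(predictions, references, n=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

In this toy case the first target is only matched by its second-ranked prediction, so it counts towards top-2 but not top-1. A metric of this kind says nothing about whether the proposed disconnection occurs at the site the chemist intended, which is the gap the additional metrics in the abstract are meant to address.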

Keywords

chemical reaction | chemical synthesis

Symposium Organizers

N M Anoop Krishnan, Indian Institute of Technology Delhi
Mathieu Bauchy, University of California, Los Angeles
Ekin Dogus Cubuk, Google
Grace Gu, University of California, Berkeley

Symposium Support

Bronze
Patterns, Cell Press
