DS01.15.04 2022 MRS Spring Meeting

Regression Transformer—Blending Numerical and Textual Tokens for Concurrent Property Prediction and Conditional Generation

When and Where

May 23, 2022
11:30am - 11:45am

DS01-Virtual

Presenter

Co-Author(s)

Jannis Born1,2, Matteo Manica1

IBM Research Europe1, ETH Zürich2

Abstract

Transformer-based models lack an intrinsic way of representing numerals as tokens. Hence, the benefits of large-scale self-supervised pretraining do not yet extend to text datasets with quantitative numerical labels. However, efficiently encoding continuous properties jointly with sentences would open the door to "Swiss army knife" autoregressive Transformers that concurrently perform property prediction and conditional generation, depending on the mask location. To that end, we present the Regression Transformer (RT), an XLNet-based language model that can be trained on numerically labeled text datasets. We introduce a scheme to convert floats of arbitrary precision into a sequence of tokens and then devise numerical encodings that preserve distances of digits in the embedding space. Focusing on chemical languages, we propose an alternating training scheme to concurrently optimize property prediction (PP) and text generation, and we extend the XLNet objective with a self-consistency loss. Our results on several synthetic and realistic molecular PP datasets demonstrate that the generality of self-supervised pretraining extends to numerically labeled datasets. In particular, the performance of traditional regression models can be surpassed by encoding numerals as tokens and training with cross-entropy loss. Importantly, priming the same model with continuous properties encoded as tokens naturally yields a conditional generative model that is useful for property-driven, local exploration of the chemical space.
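The float-to-token conversion mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the token format, so the `_digit_place_` layout, the `_._` separator token, and the function names `float_to_tokens` and `tokens_to_float` are assumptions made for illustration only, not the authors' implementation. The sketch handles non-negative decimal strings and shows how each digit token can carry its decimal place, so that the numeric value stays recoverable from the token sequence.

```python
def float_to_tokens(value: str) -> list[str]:
    """Split a non-negative decimal string into digit tokens tagged with their decimal place.

    Hypothetical token layout (an assumption, not taken from the abstract):
    "_<digit>_<place>_", where place 0 is the ones position and -1 the tenths.
    """
    integer, _, fraction = value.partition(".")
    tokens = []
    # Integer digits: the leftmost digit has the highest decimal place.
    for i, digit in enumerate(integer):
        place = len(integer) - 1 - i
        tokens.append(f"_{digit}_{place}_")
    if fraction:
        tokens.append("_._")  # separator token between integer and fraction
    # Fractional digits: places count down from -1.
    for i, digit in enumerate(fraction, start=1):
        tokens.append(f"_{digit}_{-i}_")
    return tokens


def tokens_to_float(tokens: list[str]) -> float:
    """Recover the numeric value by summing digit * 10**place over all digit tokens."""
    total = 0.0
    for token in tokens:
        if token == "_._":
            continue  # the separator carries no numeric value
        digit, place = token.strip("_").split("_")
        total += int(digit) * 10 ** int(place)
    return total


tokens = float_to_tokens("12.34")
print(tokens)
print(tokens_to_float(tokens))
```

Because every token encodes both a digit and its place, a per-token scalar such as digit * 10**place is available as one possible basis for an encoding in which numerically close tokens stay close. How the RT actually builds its numerical encodings is not described in the abstract.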

Symposium Organizers

Mathieu Bauchy, University of California, Los Angeles
Mathew Cherukara, Argonne National Laboratory
Grace Gu, University of California, Berkeley
Badri Narayanan, University of Louisville
