Juwon Na¹, Seungchul Lee¹
¹Pohang University of Science and Technology
Natural phenomena can often be described by concise mathematical expressions. A central challenge in the natural sciences and engineering therefore lies in symbolic regression: discovering a simple yet accurate symbolic expression that fits a given dataset. The space of candidate expressions grows combinatorially with expression length, which makes the search difficult. In this work, we present a mathematical language model that leverages the representational capacity of natural language processing (NLP) models for symbolic regression. Our framework has three main stages: (1) representing mathematical expressions as a language, (2) mathematical language modeling, and (3) bridging mathematical language modeling with reinforcement learning. Through extensive experiments on several symbolic regression benchmarks, we demonstrate that our framework improves the recovery of mathematical expressions from data in terms of (1) accuracy, (2) noise tolerance, and (3) robustness to dummy input variables. Our main contribution is a framework that recasts symbolic regression as a natural language understanding task, allowing symbolic regression researchers to leverage recent breakthroughs in language modeling.
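To make the first and third stages concrete, the following is a minimal sketch of how an expression might be written as a token sequence and scored against data. The abstract does not specify the encoding or the reward, so everything here is an assumption for illustration: the prefix-notation vocabulary `OPERATORS`, the helper `evaluate_prefix`, and the reward `1 / (1 + RMSE)` are hypothetical choices, not the authors' implementation.

```python
import math

# Hypothetical vocabulary: token -> (arity, function). Assumed for illustration.
OPERATORS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
}


def evaluate_prefix(tokens, variables):
    """Evaluate a prefix-notation token sequence at a single input point."""

    def parse(pos):
        tok = tokens[pos]
        if tok in OPERATORS:
            arity, fn = OPERATORS[tok]
            pos += 1
            args = []
            for _ in range(arity):
                value, pos = parse(pos)
                args.append(value)
            return fn(*args), pos
        if tok in variables:
            return variables[tok], pos + 1
        return float(tok), pos + 1  # anything else is read as a numeric constant

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("token sequence has trailing tokens")
    return value


def reward(tokens, xs, ys):
    """Assumed fitness signal for a reinforcement learner: 1 / (1 + RMSE)."""
    preds = [evaluate_prefix(tokens, {"x": x}) for x in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    return 1.0 / (1.0 + math.sqrt(mse))


# The expression x*x + sin(x) written as a "sentence" of tokens.
tokens = ["add", "mul", "x", "x", "sin", "x"]
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [x * x + math.sin(x) for x in xs]
exact_reward = reward(tokens, xs, ys)
```

Under these assumptions, a sequence model that emits tokens one at a time can be trained with the reward as its learning signal, since a candidate that reproduces the data exactly attains the maximum reward of 1.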