AI Physicist—Data-Driven Discovery of Mathematical Expressions via Natural Language Processing

When and Where

May 13, 2022
2:00pm - 2:15pm

Hawai'i Convention Center, Level 3, Lili'u Theater, 310

Presenter

Juwon Na

Seungchul Lee

Co-Author(s)

Juwon Na¹, Seungchul Lee¹

Pohang University of Science and Technology¹

Abstract

Natural phenomena can be described by concise mathematical expressions. A central challenge in the natural sciences and engineering, therefore, lies in symbolic regression: discovering a simple but accurate symbolic expression that fits a given dataset. However, the combinatorial nature of symbolic regression makes the task difficult. In this work, we present a mathematical language model that leverages the representational capacity of natural language processing (NLP) models for symbolic regression. Specifically, our framework involves three main stages: (1) representing mathematical expressions as language, (2) mathematical language modeling, and (3) bridging mathematical language modeling with reinforcement learning. Through extensive experiments on several symbolic regression benchmarks, we demonstrate that our framework improves the ability to recover mathematical expressions from data in terms of (1) accuracy, (2) noise tolerance, and (3) robustness to the inclusion of dummy input variables. Our contribution includes a framework that recasts symbolic regression as a natural language understanding task, allowing symbolic regression researchers to leverage recent breakthroughs in language modeling.
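The first stage, treating a mathematical expression as language, can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption made for illustration: the expression tree is flattened into a prefix-notation token sequence, the token names (`add`, `mul`, `sin`, and so on) are a hypothetical vocabulary, and the mean squared error is only one plausible way to score a candidate sequence against data.

```python
import math

# Hypothetical token vocabulary (an assumption, not taken from the abstract).
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
UNARY = {"sin": math.sin, "cos": math.cos}


def to_prefix(tree):
    """Flatten a nested-tuple expression tree into a prefix token list."""
    if isinstance(tree, tuple):
        op, *args = tree
        tokens = [op]
        for arg in args:
            tokens.extend(to_prefix(arg))
        return tokens
    return [tree]


def evaluate_prefix(tokens, variables):
    """Evaluate a prefix token sequence for the given variable values."""

    def parse(pos):
        tok = tokens[pos]
        if tok in BINARY:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return BINARY[tok](left, right), pos
        if tok in UNARY:
            arg, pos = parse(pos + 1)
            return UNARY[tok](arg), pos
        if tok in variables:
            return variables[tok], pos + 1
        return float(tok), pos + 1  # anything else is read as a numeric constant

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("token sequence has trailing tokens")
    return value


def mean_squared_error(tokens, xs, ys):
    """Score a candidate token sequence against (x, y) samples."""
    errors = [(evaluate_prefix(tokens, {"x": x}) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# The expression x*x + sin(x) written as a tree, then as a token "sentence".
tree = ("add", ("mul", "x", "x"), ("sin", "x"))
tokens = to_prefix(tree)  # ["add", "mul", "x", "x", "sin", "x"]
```

Under these assumptions, a sequence model would emit tokens such as `tokens` one at a time, and a score like `mean_squared_error` could serve as a signal for judging how well a generated sequence fits the dataset.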