Accelerating Advanced Data Visualization with RAG-Based In-Context Learning—A Novel Assistant for Scientific Workflows

When and Where

Dec 4, 2024
8:00pm - 10:00pm

Hynes, Level 1, Hall A

Presenter(s)

Holt Bui

Brandi Ransom

Stefan Zecevic

Tim Erdmann

Co-Author(s)

Tim Erdmann¹, Holt Bui¹, Brandi Ransom¹, Stefan Zecevic¹

IBM Research¹

Abstract

In the era of big data, the ability to quickly interpret and visualize complex datasets is paramount for advancing scientific discovery, particularly in materials science. Although widely used, traditional tools such as Excel and Origin often struggle to produce sophisticated visualizations quickly and on demand from new datasets. To address this limitation, we have developed a visualization assistant that leverages large language models (LLMs) and the Vega-Lite grammar to produce a diverse array of data visualizations on demand within seconds. The assistant not only accelerates the visualization process but also enables complex, interactive visualizations that are challenging to construct with conventional tools or with Matplotlib, which is frequently used in data science. Initially, we explored fine-tuning LLMs to specialize them for our visualization tasks. However, this approach proved difficult and ineffective because of several drawbacks: high computational cost, lengthy training times, the level of expertise required, and the considerable overhead of adapting to new visualization types over time.
In our talk, we will present how we overcame these challenges by employing Retrieval-Augmented Generation (RAG)-based in-context learning. We will delve into dataset creation, the architecture and workflow of our visualization assistant, and its current capabilities, including creating various chart types, incorporating aggregations, and adding interactive elements. All visualizations can be created from simple natural language queries, and because the actual data is never sent directly to the LLMs, the data remains confidential. Furthermore, we will present recent advancements in transitioning to agentic workflows.
This methodology streamlines the visualization process and addresses data security concerns, making it highly suitable for sensitive research environments. Additionally, we believe that our approach democratizes access to advanced on-demand visualizations and serves as a template for developing RAG-based in-context learning systems for applications in materials science. We aim to inspire interdisciplinary collaboration and drive innovation in AI-catalyzed scientific workflows.
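The abstract does not describe the implementation, so the following is only a minimal sketch of how RAG-based in-context learning for Vega-Lite generation could be arranged. Everything in it is an assumption made for illustration: the example store `EXAMPLES`, the helper names, the prompt wording, and the token-overlap retriever, which stands in for whatever retrieval method the authors actually use. The sketch shows the two ideas named in the abstract: similar query/spec pairs are retrieved and placed in the prompt, and only column names and types (never cell values) are included, with the data attached locally afterwards.

```python
import json
import re

# Hypothetical example store: natural-language requests paired with Vega-Lite specs.
EXAMPLES = [
    {
        "query": "bar chart of mean yield per catalyst",
        "spec": {
            "mark": "bar",
            "encoding": {
                "x": {"field": "catalyst", "type": "nominal"},
                "y": {"aggregate": "mean", "field": "yield", "type": "quantitative"},
            },
        },
    },
    {
        "query": "scatter plot of temperature versus conductivity",
        "spec": {
            "mark": "point",
            "encoding": {
                "x": {"field": "temperature", "type": "quantitative"},
                "y": {"field": "conductivity", "type": "quantitative"},
            },
        },
    },
]


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, examples, k=1):
    """Rank examples by Jaccard token overlap (a stand-in for embedding search)."""
    q = tokens(query)

    def score(example):
        e = tokens(example["query"])
        return len(q & e) / len(q | e)

    return sorted(examples, key=score, reverse=True)[:k]


def describe_schema(rows):
    """Map column names to Vega-Lite types, inferred from the first row only."""
    schema = {}
    for name, value in rows[0].items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        schema[name] = "quantitative" if is_number else "nominal"
    return schema


def build_prompt(query, rows, examples, k=1):
    """Assemble a few-shot prompt from retrieved examples and the column schema."""
    parts = ["Write a Vega-Lite spec (JSON) for the request. Do not include data."]
    for example in retrieve(query, examples, k):
        parts.append("Request: " + example["query"])
        parts.append("Spec: " + json.dumps(example["spec"]))
    parts.append("Columns: " + json.dumps(describe_schema(rows)))
    parts.append("Request: " + query)
    parts.append("Spec:")
    return "\n".join(parts)


def attach_data(spec, rows):
    """Add the rows locally, after a model has returned a data-free spec."""
    return {**spec, "data": {"values": rows}}


# Made-up rows for illustration; the values never appear in the prompt.
rows = [
    {"polymer": "PLA-7731", "tensile_strength": 58.25},
    {"polymer": "PCL-9042", "tensile_strength": 23.75},
]
prompt = build_prompt("bar chart of mean tensile_strength per polymer", rows, EXAMPLES)
```

In this arrangement, supporting a new chart type would mean adding an entry to the example store rather than retraining a model, which is the kind of overhead the abstract attributes to fine-tuning. The call to the LLM itself is deliberately left out, since the abstract does not name a model or API.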

Symposium Organizers

Kjell Jorner, ETH Zurich

Jian Lin, University of Missouri-Columbia

Daniel Tabor, Texas A&M University

Dmitry Zubarev, IBM

Session Chairs

Kjell Jorner

Jian Lin

Dmitry Zubarev

Symposium Supporters

2024 MRS Fall Meeting & Exhibit