2024 MRS Fall Meeting & Exhibit
MT04.09.37

Accelerating Advanced Data Visualization with RAG-Based In-Context Learning—A Novel Assistant for Scientific Workflows

When and Where

Dec 4, 2024
8:00pm - 10:00pm
Hynes, Level 1, Hall A

Co-Author(s)

Tim Erdmann¹, Holt Bui¹, Brandi Ransom¹, Stefan Zecevic¹

¹IBM Research

Abstract

In the era of big data, the ability to quickly interpret and visualize complex datasets is paramount for advancing scientific discovery, particularly in materials science. Although widely used, traditional tools such as Excel and Origin often struggle to create sophisticated visualizations quickly and on demand from new datasets. To address this limitation, we have developed a visualization assistant that leverages large language models (LLMs) and the Vega-Lite grammar to produce a diverse array of data visualizations on demand within seconds. The assistant not only accelerates the visualization process but also enables complex, interactive visualizations that are challenging to construct with conventional tools or with Matplotlib, which is frequently used in data science. Initially, we explored fine-tuning LLMs to specialize them for our visualization tasks. However, this approach proved difficult and ineffective because of several drawbacks: high computational costs, lengthy training times, the required skill level, and the extreme overhead of adapting to new visualization types over time.

In our talk, we will present how we overcame these challenges by employing Retrieval-Augmented Generation (RAG)-based in-context learning. We will delve into dataset creation, the architecture and workflow of our visualization assistant, and its current capabilities, including creating various chart types, incorporating aggregations, and adding interactive elements. All visualizations can be crafted from simple natural-language queries, and because the actual data is never sent directly to the LLMs, confidentiality is ensured. Furthermore, we will present recent advancements in transitioning to agentic workflows.

This methodology streamlines the visualization process and addresses data security concerns, making it highly suitable for sensitive research environments. Additionally, we believe that our approach democratizes access to advanced on-demand visualizations and serves as a template for developing RAG-based in-context learning systems for applications in materials science. We hope it will inspire interdisciplinary collaboration and drive innovation in AI-catalyzed scientific workflows.
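The abstract does not describe the assistant's implementation. As a rough illustration of how RAG-based in-context learning can keep data out of the prompt, the Python sketch below retrieves the stored (query, Vega-Lite spec) examples most similar to a new request, builds a few-shot prompt that carries only column names and types, and attaches the actual data locally to the data-free spec an LLM would return. Everything here is hypothetical: the example store, the helper names (`retrieve`, `infer_schema`, `build_prompt`, `attach_data`), and the bag-of-words cosine similarity, which stands in for the embedding-based retrieval a production RAG system would more likely use. No LLM is called; the prompt is only assembled.

```python
import json
import math
import re
from collections import Counter

# Hypothetical example store of (natural-language query, Vega-Lite spec) pairs.
# The real assistant's dataset and retrieval method are not given in the abstract.
EXAMPLES = [
    ("bar chart of average yield per catalyst",
     {"mark": "bar",
      "encoding": {"x": {"field": "catalyst", "type": "nominal"},
                   "y": {"aggregate": "mean", "field": "yield", "type": "quantitative"}}}),
    ("scatter plot of temperature versus conversion with tooltips",
     {"mark": {"type": "point", "tooltip": True},
      "encoding": {"x": {"field": "temperature", "type": "quantitative"},
                   "y": {"field": "conversion", "type": "quantitative"}}}),
    ("line chart of molecular weight over time",
     {"mark": "line",
      "encoding": {"x": {"field": "time", "type": "temporal"},
                   "y": {"field": "mw", "type": "quantitative"}}}),
]


def _vectorize(text):
    """Bag-of-words term counts (a simple stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k stored examples whose queries are most similar to `query`."""
    q = _vectorize(query)
    ranked = sorted(EXAMPLES, key=lambda ex: _cosine(q, _vectorize(ex[0])), reverse=True)
    return ranked[:k]


def infer_schema(rows):
    """Map column names to Vega-Lite types, looking only at the first row.

    Only names and types are returned; no data values leave this function.
    """
    return {col: "quantitative" if isinstance(val, (int, float)) else "nominal"
            for col, val in rows[0].items()}


def build_prompt(query, rows, k=2):
    """Assemble a few-shot prompt from retrieved examples plus the schema only."""
    parts = ["Write a Vega-Lite spec (JSON) for the request. Do not include data."]
    for ex_query, ex_spec in retrieve(query, k):
        parts.append(f"Request: {ex_query}\nSpec: {json.dumps(ex_spec)}")
    parts.append(f"Columns: {json.dumps(infer_schema(rows))}")
    parts.append(f"Request: {query}\nSpec:")
    return "\n\n".join(parts)


def attach_data(spec, rows):
    """Insert the data locally into a data-free spec returned by the model."""
    return {"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
            "data": {"values": rows}, **spec}
```

For rows such as `[{"catalyst": "Pd-A", "yield": 0.82}, ...]`, `build_prompt("average yield per catalyst as a bar chart", rows)` places only `{"catalyst": "nominal", "yield": "quantitative"}` in the prompt; the values themselves enter the chart only in `attach_data`, which runs outside the LLM call. Adding a new visualization type in this setup means adding an example to the store rather than retraining a model, which is the trade-off against fine-tuning that the abstract describes.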

Symposium Organizers

Kjell Jorner, ETH Zurich
Jian Lin, University of Missouri-Columbia
Daniel Tabor, Texas A&M University
Dmitry Zubarev, IBM

Session Chairs

Kjell Jorner
Jian Lin
Dmitry Zubarev