Dec 2, 2024
10:30am - 10:45am
Hynes, Level 2, Room 210
Dong Hyeon Mok1, Seoin Back1
Sogang University1
Recent advancements have shown that autoregressive language models can effectively generate inorganic crystal structures. Motivated by these advancements, we explored the potential of a language model both as a generative model for inorganic catalyst structures, which include surface and adsorbate atoms, and as a discovery tool for novel and promising electrocatalysts. We trained a language model based on the GPT architecture on 2M catalyst structures sourced from the Open Catalyst (OC) database released by Meta, enabling the generation of string representations of catalyst structures. Since the validity metrics used for crystal structure generative models are not fully applicable to catalysts, we developed new validity metrics specialized for catalyst structures. The trained model struggles to avoid generating invalid structures that contain overlapping atoms, although the validly generated structures are of high quality. We addressed this issue by introducing a simple method to bypass overlapping atoms, which effectively prevented structurally invalid generations without compromising generation quality. Furthermore, we fine-tuned the model on our own data for the discovery of two-electron oxygen reduction reaction (2e-ORR) catalysts. Despite the relatively small size of the dataset (about 1,500 data points), the fine-tuned model successfully learned the intrinsic rules of the dataset and maintained high-quality catalyst generation. From the fine-tuned model, we discovered five novel and promising 2e-ORR catalyst candidates. In conclusion, our work highlights not only the autoregressive language model's robust catalyst generative performance but also its practical applicability to electrocatalyst design.
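To make the string-representation idea concrete: the abstract does not specify the exact tokenization scheme, so the sketch below assumes a hypothetical line-based, CIF-like layout in which a composition line is followed by the lattice vectors and then one line per site, each carrying an element symbol, fractional coordinates, and a tag that distinguishes surface atoms from adsorbate atoms. The function name and format are illustrative assumptions, not the authors' published scheme.

```python
def structure_to_string(formula, lattice, sites):
    """Serialize a catalyst structure into a line-based string.

    `lattice` is a 3x3 list of cell vectors (Angstrom); `sites` is a list
    of (element, fractional_xyz, tag) tuples, where the tag marks an atom
    as e.g. "surface" or "adsorbate". This layout is a hypothetical
    CIF-like sketch for illustration only.
    """
    lines = [formula]
    for vec in lattice:
        lines.append(" ".join(f"{x:.2f}" for x in vec))
    for elem, frac, tag in sites:
        coords = " ".join(f"{x:.4f}" for x in frac)
        lines.append(f"{elem} {coords} {tag}")
    return "\n".join(lines)


example = structure_to_string(
    "Pt4O",
    [[3.92, 0.0, 0.0], [0.0, 3.92, 0.0], [0.0, 0.0, 3.92]],
    [("Pt", (0.0, 0.0, 0.0), "surface"),
     ("O", (0.5, 0.5, 0.7), "adsorbate")],
)
```

A GPT-style model would then be trained to emit such strings token by token, with the tag vocabulary letting it place adsorbates relative to the slab.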
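The overlapping-atom problem mentioned above amounts to a geometric validity test. A minimal sketch of such a check, assuming a pairwise-distance criterion against tabulated covalent radii (the radii values, tolerance, and function name are illustrative assumptions; a production check would also scan periodic images of neighboring cells):

```python
import itertools
import numpy as np

# Approximate covalent radii in Angstrom (illustrative values only).
COVALENT_RADII = {"Pt": 1.36, "O": 0.66, "H": 0.31, "Cu": 1.32}

def has_overlap(elements, cart_coords, tol=0.5):
    """Return True if any pair of atoms sits closer than `tol` times the
    sum of their covalent radii. Periodic images are ignored here for
    brevity."""
    coords = np.asarray(cart_coords, dtype=float)
    for i, j in itertools.combinations(range(len(elements)), 2):
        cutoff = tol * (COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]])
        if np.linalg.norm(coords[i] - coords[j]) < cutoff:
            return True
    return False
```

Under a generate-and-filter or constrained-decoding scheme, a check like this can flag or bypass structures with overlapping atoms before they are counted as valid generations.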