Apr 8, 2025
5:00pm - 7:00pm
Summit, Level 2, Flex Hall C
Younsoo Kim¹, Maciej Polak¹, Dane Morgan¹
¹University of Wisconsin-Madison
Solid-state electrolytes could enable several promising emerging Li-ion and Na-ion battery technologies. One of the most important properties of a solid-state electrolyte is the ionic conductivity of Li or Na in the material. In this work we developed databases and machine learning models for Li and Na ionic conductivity. We used a large language model (LLM) to automate data extraction from the literature, and scripts to clean the data for machine learning analysis. We then used elemental properties to featurize the compounds and built machine learning models for Li and Na ionic conductivity at room temperature. We compared the performance of multiple algorithms and found that random forest approaches worked best, achieving mean absolute errors (MAE) of 0.877 and 0.708 log units for the Li and Na conductivities, respectively. We tested whether models trained on both Li and Na data outperform models trained on only one of the two, but found little advantage in combining the data sets. A comparison of Li and Na conductivities in similar hosts showed very limited correlation and no particular bias for one element to diffuse faster. Additionally, we applied SHapley Additive exPlanations (SHAP) to identify the most important features for describing Li and Na ion conductivities and found that the thermal expansion coefficient is the most important for both ions. We hypothesize that this correlation emerges because the highly flexible structural features that give rise to fast diffusion also give rise to large thermal expansion. Finally, we created a classifier to categorize materials as good or bad ionic conductors based on whether their ionic conductivity exceeds 10⁻⁴ S/cm. For Li, our random forest model gave an area under the precision-recall curve of 0.89 and a maximum F1 score of 0.82; for Na, the corresponding values were 0.96 and 0.91, suggesting that the models can effectively identify high-conductivity materials. This work illustrates the power of integrating LLM data extraction with machine learning, and we anticipate that these approaches will be useful for many types of materials modeling and discovery.
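The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "log units" means base-10 logarithms of conductivity in S/cm and that the maximum F1 score is taken over all decision thresholds on a classifier score. The function names, conductivity values, and classifier scores below are hypothetical and chosen only for illustration.

```python
import math

# Cutoff between "good" and "bad" conductors, as stated in the abstract (S/cm).
THRESHOLD_S_PER_CM = 1e-4


def log_mae(true_sigma, pred_sigma):
    """Mean absolute error between log10 conductivities, in log units."""
    errors = [
        abs(math.log10(t) - math.log10(p))
        for t, p in zip(true_sigma, pred_sigma)
    ]
    return sum(errors) / len(errors)


def is_good_conductor(sigma):
    """Label a material by whether its conductivity exceeds the cutoff."""
    return sigma > THRESHOLD_S_PER_CM


def max_f1(labels, scores):
    """Largest F1 obtained by using each distinct score as the decision cutoff."""
    best = 0.0
    for cut in sorted(set(scores)):
        predicted = [s >= cut for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        fn = sum((not p) and y for p, y in zip(predicted, labels))
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical measured and predicted conductivities in S/cm (not real data).
true_sigma = [1e-3, 1e-5, 1e-6, 1e-2]
pred_sigma = [1e-4, 1e-5, 1e-4, 1e-2]

# Hypothetical classifier scores for the same four materials.
labels = [is_good_conductor(s) for s in true_sigma]
scores = [0.9, 0.2, 0.85, 0.8]

mae_log_units = log_mae(true_sigma, pred_sigma)
best_f1 = max_f1(labels, scores)
```

Working in log space means that a prediction off by a factor of ten contributes one log unit to the error regardless of the absolute conductivity, which is a natural choice for a quantity that spans many orders of magnitude.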