Dec 3, 2024
4:30pm - 4:45pm
Hynes, Level 2, Room 210
Tomasz Galica¹, Matilda Sipilä¹, Milica Todorović¹
¹University of Turku
Visualizing and mapping the relationships between materials is challenging because materials data are both large in volume and complex. Extracting materials information from text with large language models (LLMs) adds a further complication, because the text must be converted into numerical form. This conversion enables comparisons based on similarity measures such as Euclidean distance, but the resulting high-dimensional vectors are not intuitively interpretable. Unsupervised dimensionality-reduction techniques such as PCA and t-SNE can address the dimensionality problem, but they often produce less interpretable data distributions in which data points overlap or scatter too widely. To resolve these visualization issues, alternative methods based on artificial forces and pseudo-physical simulations can be employed: overlaps between data points can be minimized with pseudo-electrostatic or pseudo-spring forces, and large empty regions can be reduced with pseudo-gravity. In this study, we aim to map the material-property space of data extracted from the scientific literature with LLMs, to enable guided decision-making and a new approach to materials design [1].

We explored the ForceAtlas2 (FA2) graph-layout algorithm [2] to represent the chemical formulas of materials as graphs. We evaluated three attribute descriptors for the vector representation and found that MEGNet [3] retains the most chemical information: on a small scale, it clusters similar compounds together, and on a large scale, it forms distinct groups for different compound classes. We studied FA2 parameters such as gravity strength, scale factor, and number of iterations to obtain the most interpretable graphs. Our results indicate that force-based graphs can improve the placement of data points compared with t-SNE, preventing overlap between points and minimizing empty space. This research demonstrates how alternative force-based methods can address the challenges of visualizing materials and textual data.
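The pseudo-forces described above can be illustrated with a small NumPy sketch. This is a simplified ForceAtlas2-style layout written under stated assumptions, not the authors' pipeline. The force terms follow the FA2 model of Jacomy et al. [2]: repulsion proportional to (deg + 1)(deg + 1)/d between every pair of nodes, linear spring attraction along edges, and a constant pull toward the centre scaled by (deg + 1). Everything else is an assumption made for illustration: building the graph by linking each descriptor vector to its k nearest neighbours by Euclidean distance, initializing positions with PCA, using a fixed step size with a displacement cap (real FA2 adapts the speed of each node), the synthetic random vectors that stand in for MEGNet descriptors, and the hypothetical function names `knn_adjacency`, `pca_2d`, and `fa2_style_layout`. The `gravity`, `scaling`, and `iterations` arguments play the roles of the gravity strength, scale factor, and number of iterations mentioned in the abstract.

```python
import numpy as np


def knn_adjacency(vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical graph construction: an undirected 0/1 adjacency matrix
    linking each point to its k nearest neighbours by Euclidean distance."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    n = len(vectors)
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project vectors onto their first two principal components (used here
    only as a starting layout, an assumed design choice)."""
    centred = vectors - vectors.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :2] * s[:2]


def fa2_style_layout(adj, init, gravity=1.0, scaling=2.0,
                     iterations=300, step=0.05, max_disp=1.0):
    """Simplified ForceAtlas2-style layout with a fixed, capped step size."""
    pos = np.array(init, dtype=float)
    mass = adj.sum(axis=1) + 1.0  # FA2 weights each node by degree + 1
    for _ in range(iterations):
        delta = pos[:, None, :] - pos[None, :, :]  # delta[i, j] = pos_i - pos_j
        dist = np.linalg.norm(delta, axis=-1) + 1e-9
        unit = delta / dist[:, :, None]
        # Pseudo-electrostatic repulsion: scaling * m_i * m_j / d, away from j
        rep = scaling * np.outer(mass, mass) / dist
        np.fill_diagonal(rep, 0.0)
        force = (rep[:, :, None] * unit).sum(axis=1)
        # Pseudo-spring attraction along edges, proportional to d, toward j
        force -= (adj[:, :, None] * delta).sum(axis=1)
        # Pseudo-gravity: constant pull of gravity * m_i toward the origin
        radius = np.linalg.norm(pos, axis=1, keepdims=True) + 1e-9
        force -= gravity * mass[:, None] * pos / radius
        # Move each node, capping the step length to keep the update stable
        disp = step * force
        length = np.linalg.norm(disp, axis=1, keepdims=True)
        pos += disp * np.minimum(1.0, max_disp / (length + 1e-12))
    return pos


# Synthetic stand-in for descriptor vectors: two well-separated groups of 10
rng = np.random.default_rng(0)
vectors = np.vstack([rng.normal(0.0, 1.0, (10, 8)),
                     rng.normal(10.0, 1.0, (10, 8))])
adj = knn_adjacency(vectors, k=3)
layout = fa2_style_layout(adj, init=pca_2d(vectors))
```

In this toy setting, raising `scaling` strengthens the pairwise repulsion and spreads nodes apart, which counters overlap, while raising `gravity` pulls disconnected groups toward the centre and shrinks empty space between them. These are the same kinds of trade-offs the abstract describes tuning, although the values that work for real descriptor data would have to be found empirically.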
Our approach offers better interpretability of complex relationships between materials, supporting big-data analysis and helping to guide decision-making and materials design.

References:
[1] Sipilä, M., et al. arXiv:2405.15290 (2024). https://doi.org/10.48550/arXiv.2405.15290
[2] Jacomy, M., et al. PLoS ONE 9, e98679 (2014). https://doi.org/10.1371/journal.pone.0098679
[3] Chen, C., et al. Chem. Mater. 31(9), 3564–3572 (2019). https://doi.org/10.1021/acs.chemmater.9b01294